An expert in AI video generation discusses the technology’s rapid advances and its current limitations.
This presidential cycle has already seen several high-profile examples of people using deepfakes to try to influence voters. Deepfakes are images, audio recordings, or videos generated or modified using artificial intelligence (AI) models to depict real or fictional people. Recent deepfake examples include and .
It appears generative artificial intelligence is an increasingly prominent tool in the misinformation toolbox. Should voters be concerned about being bombarded with phony videos of politicians created with generative AI? An expert in computer vision and deep learning at the University of Rochester says that while the technology is rapidly advancing, deepfake video generation remains harder for bad actors to leverage due to its complex nature.
While OpenAI’s products, including ChatGPT for text generation and DALL-E 3 for image generation, are taking off in popularity, the company has yet to release an equivalent for video generation. According to Xu, an associate professor at Rochester, the company has released previews of its Sora video generation software but has yet to release the product, which is still undergoing testing and refinement.
“Generating video using AI is still an ongoing research topic and a hard problem because it’s what we call multimodal content,” says Xu. “Generating moving videos along with corresponding audio are difficult problems on their own, and aligning them is even harder.”
Xu says that his research group was among the first to use artificial neural networks to generate multimodal video in 2017. They started with tasks like . From there, they moved on to problems like , and then to .
“Now, we can generate real-time, fully drivable heads and even ,” says Xu.

Challenges with deepfake detection technology
Xu’s team has also developed technology for . He calls it an area that needs extensive further research, noting that it’s easier to build technology to generate deepfakes than to detect them because of the training data needed to build the generalized deepfake detection models.
“If you want to build a technology that’s able to detect deepfakes, you need to create a database that identifies what are fake images and what are real images,” says Xu. “That labeling requires an additional layer of human involvement that generation does not.”
Another concern, he adds, is making a detector that is generalizable to different types of deepfake generators. “You can make a model that performs well against the techniques you know about, but if someone uses a different model, your detection algorithm will have a hard time capturing that,” he says.
The easiest targets for video deepfakes
Having access to good training data is crucial for creating effective generative AI models. As a result, Xu says politicians and celebrities will be the earliest and easiest targets when video generators become widely available.
“Politicians and celebrities are easier to generate than normal people because there is simply more data about them,” says Xu. “Because so much video of them already exists, these models can use it to learn the expressions they show in different situations, along with their voices, their hair, movements, and emotions.”
But he expects that, at least initially, the training data the “celeb deepfakes” in particular are built on may make them more easily noticeable.
“If you used only high-quality photographs to train a model, it will produce similar results,” says Xu. “It may result in an overly smooth style that you can pick out as a cue to tell it’s a deepfake.”
Other cues can include how natural a person’s reaction seems, whether they can move their heads, and even the number of teeth shown. But image generators have overcome similar early tells, and Xu says enough training data can mitigate these limitations.
He calls on the research community to invest more effort into developing deepfake detection strategies and grappling with the ethical concerns surrounding the development of these technologies.
“Generative models are a tool that in the hands of good people can do good things, but in the hands of bad people can do bad things,” says Xu. “The technology itself isn’t good or bad, but we need to discuss how to prevent these powerful tools from ending up in the wrong hands and used maliciously.”
