Are video deepfakes powerful enough to influence political discourse?

GIF showing an example of deepfakes created with artificial intelligence featuring speaker Donald Trump alongside AI-generated positive and negative reactions from Joe Biden.

FACE THE FACTS: Using emotional listener generation technology, University of Rochester computer scientists have generated deepfakes of Joe Biden displaying different emotional expressions in reaction to Donald Trump. (University of Rochester GIF / Luchuan Song)

An expert in AI video generation discusses the technology’s rapid advances and its current limitations.

This presidential cycle has already seen several high-profile examples of people using deepfakes to try to influence voters. Deepfakes are images, audio recordings, or videos generated or modified using artificial intelligence (AI) models to depict real or fictional people. Recent deepfake examples include and .

Generative artificial intelligence appears to be an increasingly prominent tool in the misinformation toolbox. Should voters be concerned about being bombarded with phony videos of politicians created with generative AI? An expert in computer vision and deep learning at the University of Rochester says that while the technology is rapidly advancing, deepfake video generation remains harder for bad actors to exploit because of its complexity.

While OpenAI’s products, including ChatGPT for text generation and DALL-E 3 for image generation, are taking off in popularity, the company has yet to release an equivalent for video generation. According to Chenliang Xu, an associate professor at Rochester, the company has released previews of its Sora video generation software but has yet to release the product, which is still undergoing testing and refinement.

“Generating video using AI is still an ongoing research topic and a hard problem because it’s what we call multimodal content,” says Xu. “Generating moving videos along with corresponding audio are difficult problems on their own, and aligning them is even harder.”

Xu says that his research group was among the first to use artificial neural networks to generate multimodal video in 2017. They started with tasks like . From there, they moved on to problems like , and then to .

“Now, we can generate real-time, fully drivable heads and even ,” says Xu.

Diptych of two video deepfakes as GIFs, one of the Mona Lisa and one of Chenliang Xu, manipulated to show them each speaking.

TALKING HEADS: Computer scientist Chenliang Xu and his fellow researchers can generate lifelike talking head videos from an individual photo or even a painting, as demonstrated here with a looping video created from an image of the Mona Lisa and a headshot of Xu. (University of Rochester GIF / Luchuan Song)

Challenges with deepfake detection technology

Xu’s team has also developed technology for . He calls it an area that needs extensive further research, noting that it’s easier to build technology to generate deepfakes than to detect them because of the training data needed to build generalized deepfake detection models.

“If you want to build a technology that’s able to detect deepfakes, you need to create a database that identifies what are fake images and what are real images,” says Xu. “That labeling requires an additional layer of human involvement that generation does not.”

Another concern, he adds, is making a detector that generalizes to different types of deepfake generators. “You can make a model that performs well against the techniques you know about, but if someone uses a different model, your detection algorithm will have a hard time capturing that,” he says.

The easiest targets for video deepfakes

Having access to good training data is crucial for creating effective generative AI models. As a result, Xu says politicians and celebrities will be the earliest and easiest targets when video generators become widely available.

“Politicians and celebrities are easier to generate than normal people because there is simply more data about them,” says Xu. “Because so much video of them already exists, these models can use it to learn the expressions they show in different situations, along with their voices, their hair, movements, and emotions.”

But he expects that, at least initially, the training data behind “celeb deepfakes” in particular may make them easier to spot.

“If you used only high-quality photographs to train a model, it will produce similar results,” says Xu. “It may result in an overly smooth style that you can pick out as a cue to tell it’s a deepfake.”

Other cues can include how natural a person’s reaction seems, whether they can move their heads, and even the number of teeth shown. But image generators have overcome similar early tells, such as , and Xu says enough training data can mitigate these limitations.

He calls on the research community to invest more effort into developing deepfake detection strategies and grappling with the ethical concerns surrounding the development of these technologies.

“Generative models are a tool that in the hands of good people can do good things, but in the hands of bad people can do bad things,” says Xu. “The technology itself isn’t good or bad, but we need to discuss how to prevent these powerful tools from ending up in the wrong hands and being used maliciously.”
