Introducing the DeepSpeak Dataset
"Corrupt officials are slowly and painfully destroying the Olympic sports that have existed for thousands of years," says a recording that spread widely ahead of the Paris Olympic Games. In it, Tom Cruise was allegedly taking on the International Olympic Committee (IOC). The recording was, of course, fake. Still, it generated notable upheaval and prompted IOC representatives to issue statements while fact-checking organizations worked to debunk it.
This is just the latest example of a broader trend: deepfakes are reaching unprecedented levels of photorealism while the tools to generate them are becoming increasingly accessible. What once required significant technical skill and coding expertise now takes just a few clicks on a website. As a result, anyone can generate a convincing deepfake in minutes. This is why robust tools for deepfake detection are more critical than ever.
However, deepfake detection is only as effective as the data that powers it. This data must keep pace with the state of the art in deepfake generation and be both diverse and sizable. It is also essential for determining how good our detection systems are and what needs improvement. Yet the latest academic deepfake dataset was released in early 2022. At that time, many of the deepfake generation methods commonly used today had not yet been conceived, and the deepfakes of 2022 are of lower quality than today's deepfakes.
To equip the digital forensics community with an updated resource for deepfake detection research, in joint work with Sarah Barrington and Hany Farid, we're introducing the DeepSpeak Dataset v1.0. It contains over 43 hours of real and deepfake footage of people talking and gesturing in front of their webcams. The source data was collected from a diverse set of participants in their natural environments, and the deepfakes were generated using state-of-the-art open-source lip-sync and face-swap software.
Representative examples of deepfakes from DeepSpeak v1.0.
Importantly, DeepSpeak is not just a one-time dump of data. We're planning to collect more data and integrate additional deepfake generation engines as they become available. Additional versions will be made available once or twice a year to keep up with the state of the art.
Other notable features of the dataset include:
AI-generated voices: some of the lip-sync deepfakes were created using AI-generated voices;
Face matching: for face-swap deepfakes, we paired individuals based on their face similarity;
Live deepfakes: one of the configurations simulated live face-swap deepfake generation (e.g., Zoom videoconferencing deepfakes, live-stream deepfakes).
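To make the face-matching idea above more concrete, here is a minimal sketch of how individuals could be paired by face similarity. It is an illustration only, not the procedure used to build DeepSpeak: it assumes each person is already represented by a face-embedding vector (the toy vectors below are made up), uses cosine similarity as the similarity measure, and pairs people greedily. The names `cosine_similarity`, `pair_by_similarity`, and `toy_embeddings` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pair_by_similarity(embeddings):
    """Greedily pair identities, taking the most similar pair first.

    `embeddings` maps an identity label to its (assumed) face embedding.
    Each identity ends up in at most one pair; with an odd number of
    identities, one is left unpaired.
    """
    scored = sorted(
        (
            (cosine_similarity(embeddings[a], embeddings[b]), a, b)
            for a, b in combinations(sorted(embeddings), 2)
        ),
        reverse=True,
    )
    used, pairs = set(), []
    for _score, a, b in scored:
        if a not in used and b not in used:
            pairs.append((a, b))
            used.update((a, b))
    return pairs


# Toy 2-D "embeddings" (hypothetical; real face embeddings have many more dimensions).
toy_embeddings = {
    "p1": [1.0, 0.0],
    "p2": [0.8, 0.2],
    "p3": [0.0, 1.0],
    "p4": [0.1, 0.9],
}

for a, b in pair_by_similarity(toy_embeddings):
    print(a, "<->", b)
```

Greedy pairing is simple but not optimal overall; a pipeline that wants the best global set of pairs could instead solve a matching problem over the same similarity scores.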