Deepfake videos, in which a person's face is automatically swapped with someone else's, are becoming easier to generate and increasingly realistic. In response to the threat such manipulations pose to our trust in video evidence, several large datasets of deepfake videos and many methods to detect them have been proposed recently. However, it is still unclear how realistic deepfake videos appear to an average person, and whether the algorithms are significantly better than humans at detecting them. This talk presents a subjective study, conducted in a crowdsourcing-like scenario, which systematically evaluates how hard it is for humans to tell whether a video is a deepfake or not. A subset of videos from the Facebook deepfake database was used in the evaluation, and the results are compared with the performance of two different state-of-the-art deepfake detection methods, based on the Xception and EfficientNet (B4 variant) neural networks.
Dr Pavel Korshunov is a research associate at the Idiap Research Institute. He works on the detection of deepfakes and audio-visual manipulations, gender recognition, speaker anti-spoofing, and speaker diarization. He is one of the contributors to the open-source signal processing and machine learning toolbox "Bob". Previously, he worked on problems related to high dynamic range (HDR) imaging, crowdsourcing, visual privacy, and video surveillance. He received an ACM TOMM journal best paper award (2011), two top-10 best paper awards at MMSP 2014, and a top-10 best paper award at ICIP 2014. He has over 70 research publications and is a co-editor of the JPEG XT standard for HDR images.