On Audio Deep Fakes

Charlie G. Peterson
3 min read · Jan 24, 2022

Have you heard of an audio deepfake? Back in 2019, a British energy firm hadn't, and a thief used one to steal roughly $250,000 (€220,000). You probably already know about visual deepfakes, where one person's face is digitally transplanted onto another person's. On TikTok there's that uncanny Tom Cruise account. Disney has done it in the Star Wars shows.

But think about what a machine needs to understand to make this happen. The A.I., or machine learning system, needs to know what the source face looks like from all angles. It needs a sense of the 3D geometry of the person's face, how it moves through the world, how it responds to light, and what the face does when the person emotes.

It literally needs a skin-deep understanding of the person. The same kind of technology can be used to analyze audio, and this is not theoretical. Like I mentioned at the beginning, in 2019 a company was called, and the employee answered the phone to hear his boss on the other end: transfer €220,000 to this firm, and urgently. The employee said, what's the account number? I'm on it, boss. Within the afternoon, that money was gone for good. Now, should you be imminently worried about this? Are you going to get a call this afternoon from a scammer who sounds like your mom? Probably not. Remember, you need lots and lots of photos to teach a visual machine learning system how to understand a person's face, and the same goes for audio: you need plenty of recorded speech to clone a voice.
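
To make that concrete, here's a minimal sketch of the identity-capture step, using the open-source Resemblyzer package; the audio file names are hypothetical. A pretrained speaker encoder boils a few seconds of speech down to a fixed-length "voice fingerprint," and a full cloning pipeline (SV2TTS-style systems, for example) then conditions a speech synthesizer on that fingerprint to say arbitrary sentences in the captured voice.

```python
# A minimal sketch of the "voice fingerprint" idea behind audio deepfakes.
# Assumes the open-source Resemblyzer package (pip install resemblyzer);
# the two recordings below are hypothetical file names.
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()  # pretrained speaker-embedding model

# A few seconds of audio is enough to compute a fixed-length identity vector.
boss_clip = preprocess_wav("boss_public_interview.wav")   # hypothetical file
caller_clip = preprocess_wav("incoming_call.wav")         # hypothetical file

boss_embed = encoder.embed_utterance(boss_clip)      # 256-dim, L2-normalized
caller_embed = encoder.embed_utterance(caller_clip)

# Cosine similarity: high values mean "same voice" as far as the model cares.
similarity = float(np.dot(boss_embed, caller_embed))
print(f"voice similarity: {similarity:.2f}")
```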

For now, thieves are probably going to target public figures with lots of publicly available audio to train the machine learning models on. But that's only true for now. Ever since the iPhone 8, Apple has been building a Neural Engine into the processors in its phones: silicon designed specifically to run the kind of machine learning calculations these systems depend on.
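
As a rough illustration of how an ordinary trained network ends up on that kind of on-device hardware, here's a sketch using coremltools to convert a toy PyTorch model to Core ML; the model, shapes, and file name are placeholders, not a real deepfake network. The compute-unit setting is what lets the runtime schedule work onto the Neural Engine when one is available.

```python
# Rough sketch: converting a trained network so it can run on a phone's
# dedicated ML hardware. Assumes PyTorch and coremltools are installed;
# the model here is a toy placeholder.
import torch
import coremltools as ct

# Toy model standing in for a real speech or vision network.
model = torch.nn.Sequential(
    torch.nn.Linear(80, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 2),
).eval()

example_input = torch.rand(1, 80)
traced = torch.jit.trace(model, example_input)

# Convert to Core ML; ComputeUnit.ALL lets the runtime schedule work on the
# CPU, GPU, or Neural Engine, whichever the device offers.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example_input.shape)],
    compute_units=ct.ComputeUnit.ALL,
    convert_to="mlprogram",
)
mlmodel.save("ToyModel.mlpackage")  # hypothetical output name
```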

So think back to the deepfake. You need to understand the 3D geometry of a person's face and how it moves over time. That's literally what Memoji does, what Snapchat filters do, what all these TikTok filters do. It's not out of the realm of possibility that within a couple of years almost anyone could be capable of an attack like this.
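
That face-geometry tracking isn't exotic, either. Here's a small sketch using the open-source MediaPipe Face Mesh model to pull a rough 3D landmark mesh out of a single image; the input file name is hypothetical. Run it frame by frame and you have exactly the geometry-over-time signal those filters, and face deepfakes, are built on.

```python
# Minimal sketch of the face-geometry tracking behind filters and
# Memoji-style effects. Assumes opencv-python and mediapipe are installed
# and that "selfie.jpg" (a hypothetical file) exists.
import cv2
import mediapipe as mp

image = cv2.imread("selfie.jpg")  # hypothetical input frame

with mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,
    refine_landmarks=True,
) as face_mesh:
    # MediaPipe expects RGB input, OpenCV loads BGR.
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark
    # Each landmark has normalized x, y and a relative depth z, i.e. a rough
    # 3D mesh of the face recovered from one frame.
    print(f"{len(landmarks)} landmarks; first point: "
          f"({landmarks[0].x:.3f}, {landmarks[0].y:.3f}, {landmarks[0].z:.3f})")
```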

There's a more insidious lesson in this attack. It wouldn't have worked as a peer-to-peer call; only the boss can demand the kind of protocol-breaking urgency that bypasses traditional checks and balances. Put differently, a deepfake attack is more effective when the attacker impersonates legitimate authority. What many people don't know, or haven't fully processed yet, is that the propaganda we're viewing is evolving on a similar technological backbone.

It's a similar mechanism to what Cambridge Analytica practiced with Brexit and used with Trump, as documented by the Mueller report, and what Russia and Trump's goons are currently using to foment racial tension, spread the big lie, and push COVID disinformation: disinformation targeted at the individual level. Because Facebook knows more about you than the 3D geometry of your face.


Charlie G. Peterson

Physics teacher, bioethicist, YouTuber, forever student.