You’ve probably heard the old adage: “Fool me once, shame on you. Fool me twice, shame on me.” Falling for someone else’s tricks can leave us feeling embarrassed and scratching our heads, wishing we had a chance to redeem ourselves. You know the feeling – you can’t stop thinking about what you could have done differently. Another time-tested saying might just help in this situation: “The best offense is a good defense.” This is especially true when preparing for encounters with fraudsters who have wielded some of the modern age’s cleverest impersonation techniques – voice phishing (vishing) and deepfakes. 

I like to think of vishing attackers as the evil step twins of phishing attackers. While phishing attacks rely on a specially crafted email, vishing attackers make phone calls or leave convincing voice messages while purporting to be a trusted party. Their objective is to elicit their victim’s personal information, such as bank details or credit card numbers. When cybercriminals really want to up the ante and make an impersonation appear as credible as possible, they may resort to deepfake audio or video technology. Deepfake technology can alter or clone a voice in real time, producing an artificial but convincing simulation of a real person. Cybercriminals can use deepfake video or audio to impersonate individuals and bypass security measures to achieve their aim of, for example, authorizing a payment or gathering valuable intelligence.

In this blog, we’ll explore the origins of these atypical attack methods on the cybercriminal underground and examine their real-world applications. We’ll also educate you on how to identify these malicious ploys.

Voice impersonation-related content on cybercriminal forums

The thought of someone planning out a carefully scripted “verbal” phishing attack with the help of a dark web forum doesn’t sit easily with me. Plenty of teenagers make prank phone calls in their youth, but this doesn’t serve as a gateway to a cybercriminal career. Voice impersonation services on cybercriminal forums are very niche, and they are certainly not the most common way to earn money in the cybercriminal world.  However, I was surprised when I found an entire section dedicated to this service on one Russian-language cybercriminal forum (see Figure 1). Forum users had even tailored their advertisements based on language and gender. For example, if you were looking for a Russian-speaking male voice, there was a service waiting for you.  

Figure 1: A “voice acting” services section on a Russian-language cybercriminal forum

For some cybercriminals, this abundance of choice might not be what they are looking for. After all, many threat actors already have a target in mind that requires a narrower focus. This is where voice robot services can be very useful. These are a bit like those annoying phone calls you receive about extending your car’s warranty – it’s usually a pre-recorded robot voice reeling off the same script to thousands of people. The banking industry is a prime example of a sector that cybercriminals might seek to exploit with such a service. On one high-profile Russian-language cybercriminal forum, we saw a user advertising a voice service for use during bank account authorization (see Figure 2). In their advertisement, the threat actor offered a service that would “create, clone and host tailored voice robots for all your needs” and included “running a sophisticated telephone interactive response system”. The service started at around USD 1,000, with the price increasing for more customized profiles and additional US-based phone numbers.

Figure 2: Forum user advertises a voice robot service that targets US-based banks

There are also some strange voice impersonation requests that make their way onto cybercriminal forums. I often wonder if the individual behind the request knows their audience at all. One weird request on a Russian-language forum came from a user seeking to conduct a verbal phishing attack on a Telegram account (see Figure 3). The request became oddly specific when the user said they preferred female voices because “females have the ability to fake emotions better than men” and “make better social engineers”. Sorry men, you aren’t that talented!

Their request then took a dark turn.  They emphasized they needed the female voice actor to pretend that their son or daughter was dying because it would prompt “the operator” on the phone to help. The rest of the details were vague, but they claimed their scheme could provide a “quick payout” because the target had poor OPSEC. This tactic of creating a sense of urgency or panic is quite common in email phishing campaigns because it can elicit a quick response. In this case, the attacker adopted a similar strategy, with the only difference being the mode of communication. 

Figure 3: Forum user seeks a voice actor to perform a vishing attack

Another wild idea came in the form of a lengthy post from a user on another Russian-language cybercriminal forum that detailed how to target a “cryptocurrency specialist” in a multi-stage social engineering attack (see Figure 4). The scheme involved finding a cryptocurrency conference and selecting an attendee to social engineer in a vishing attack, with the attacker purporting to be a conference manager offering VIP services.

By obtaining the target’s email address over the phone to enroll them in the “VIP services,” the threat actor claimed they could then initiate the second stage of their attack: a phishing email designed to extract personal data and passwords. This method serves as a reminder that reusing passwords should be avoided at all costs!

Figure 4: A vishing attacker describes how to attack cryptocurrency account holders

 

A closer look at voice impersonation tools and techniques

Cybercriminals are weaponizing commercially available deepfake and voice impersonation tools. Unfortunately, they aren’t using these tools to autotune their voices for a new mixtape – they have more sinister plans in mind (although we have seen some individuals trying to promote their pretty terrible trap music on the forums).

Deceptive audio

We have observed cybercriminals discussing commercially available software designed to alter voices and improve human-machine interaction, for example in online games that involve role-play. The software features advanced voice-learning algorithms, keeps bandwidth and CPU usage low for performance, and allows users to add their own audio (in addition to selecting from a library of voice and sound-effect packs).

In April 2020, it was reported that APT-C-23, a Hamas-backed group, targeted Israeli soldiers on social media with fake personas of Israeli women, using voice-altering software to produce convincing audio messages of female voices. The messages encouraged the Israeli targets to download a mobile app that would install malware on their devices, giving complete control to the attackers.  

Malicious impersonation

Deepfakes can manifest as altered video, audio, or a combination of both to impersonate other individuals. The idea of creating deepfakes has gained traction on cybercriminal platforms in recent years. We have found users promoting deepfake technology as a means to bypass biometric authentication by altering facial images (see Figure 5), highlighting how deepfakes can be used to combine and overlay existing images and videos onto source footage.

While manipulated images are a growing worry, voice-cloning deepfake technology used to impersonate high-profile figures is an even more pressing concern. In July 2019, cybercriminals were observed impersonating the chief executive of a company in the energy sector in an attempt to obtain a fraudulent money transfer of approximately USD 243,000. The threat actors used a voice-cloning tool to request the transfer from an employee, claiming that the payment was to be sent to a third-party supplier based in Hungary. The attackers then moved the money to an account in a second country and dispersed it to several other locations from there.

Figure 5: A Russian-language forum user advises using a deepfake to bypass biometric authentication 

Hybrid tactics

Although the basic skills for pulling off a vishing attack seem pretty straightforward, voice impersonation is a tactic that requires some finesse and brainstorming. Ultimately, attackers are trying to trick whoever is on the receiving end of a phone call or message, but the information they hope to obtain from that exchange can be quite different. For example, is an attacker seeking to elicit credentials or an authentication code from a customer service representative, or are they trying to get them to perform a specific action?

We found a user from one Russian-language forum planning a vishing attack against a supply department who ran into an obstacle when they were directed to send a document to the company’s corporate email to get the information they sought (see Figure 6). As a result, they turned to their fellow social engineers on the forum to find someone who could explain how to prompt a victim to click on a link while they had them on the phone – in other words, a vishing and phishing combo attack. They even offered to pay RUB 1,000 to anyone who could share their experience with a similar endeavor.

Figure 6: Russian-language forum user requests advice for a two-pronged social engineering attack on a supply department

A hybrid phishing/vishing campaign has to be one of a network defender’s worst nightmares, and this scenario has proved to be a challenge for companies adapting to the remote working world. Even though we are practically two years into the COVID-19 pandemic (sorry for the depressing statistic), companies still have growing pains when it comes to educating and training their remote workforce on the best cyber hygiene practices.

One hybrid vishing campaign in 2020, which targeted some of the world’s biggest corporations in the financial, telecommunications, and social media industries, exposed just how dangerous this dual threat is. The campaign involved multiple collaborators, including independent hackers-for-hire who offered specialized services ranging from reconnaissance to voice acting. The groundwork was laid in February 2020, when attackers created hundreds of phishing pages impersonating the corporations. Several months later, the attackers began calling organizations’ new hires who were working remotely, claiming to be employees in the IT department offering to help troubleshoot problems with their VPN access. The threat actors attempted to get the victims to reveal their VPN credentials over the phone or to enter them on websites created to mimic the victim organizations’ corporate email or legitimate VPN portals. The campaign was well-coordinated: one attacker would make the phone call, while a co-conspirator stole the victim’s credentials through the phishing page.

False and misleading narratives

It seems like fake news has spread just as quickly as the pandemic in recent years, setting the stage for some profoundly stupid conversations over holiday dinners. We can blame deepfake technology for at least a fraction of these false narratives. It has been reported that some deepfake audio and video is so convincing that it has become a key purveyor of disinformation. This technology has become a growing concern for social media platforms that want to curtail the spread of false narratives. Although I would argue that state-sponsored actors are primarily responsible for propagating disinformation on social media, individual actors have their own ambitions, especially if they want to mask their true identity.

We noted a member of one English-language cybercriminal forum seeking recommendations for voice-changing software that they could use to promote misleading content on social media channels (see Figure 7). The idea behind their scam was to change their voice so that it would appear that multiple people were promoting a fake product. This way, they would add legitimacy to their fake promotional material while hiding their identity.

Figure 7: English-language forum user seeks out voice-changing software for fake promotional content

Deepfake imagery and video are perhaps even more worrisome when it comes to spreading false information. In July 2020, a deepfake image made a splash in the media world when the identity behind blatantly false news stories could not be tied to an actual living person. An individual going by the name of “Oliver Taylor” appeared real, at first. Their social media profiles described them as a political junkie and reporter with a particular interest in anti-Semitism and Jewish affairs. One article published by Taylor accused two Palestinian rights activists of being “known terrorist sympathizers”. The activists were appalled by this mischaracterization, which they adamantly denied, and began to investigate the person behind the story. Taylor had left bogus email addresses and phone numbers for their correspondence and, as it turned out, the person behind the fake terrorist-sympathizer story was an elaborate fiction all along. Additionally, the picture they had used for their identity had “all the hallmarks” of a deepfake image: distortion and inconsistencies in the background, and glitches around the neck and collar.

What can I do to stay safe?

Companies in all sectors are susceptible to the combined threats of audio and visual deepfakes, but that does not mean they are defenseless. First and foremost, organizations should implement security policies in the workplace to deter this threat. This goes beyond making sure employees lock their screens when they walk away from their desks.  I’m referring to a system of checks and balances where there isn’t one single point of failure.  

For example, if you receive an urgent phone call requesting a transfer of funds because “a deal is going to fall through” if you don’t, you need to take a step back and critically evaluate the situation you have found yourself in. Companies need to have a system in place, a form of two-factor authentication if you will, that requires secondary confirmation from another authorized party before any money moves.
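To make that idea concrete, here is a minimal sketch (in Python) of what a “two-person rule” for outbound transfers could look like. Every name in it (PaymentRequest, APPROVAL_THRESHOLD, the approver identities) is hypothetical and purely illustrative; the point is simply that an urgent voice on the phone is never, on its own, enough to move money.

```python
# A minimal, hypothetical sketch of a dual-approval ("two-person rule") check
# for outbound payments. It illustrates removing a single point of failure;
# it is not any particular product's implementation.
from dataclasses import dataclass, field

APPROVAL_THRESHOLD = 10_000  # hypothetical: transfers at or above this need two approvers

@dataclass
class PaymentRequest:
    amount: float
    requested_by: str               # identity captured from the phone call or email
    approvals: set[str] = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The person who asked for the transfer can never approve it themselves.
        if approver == self.requested_by:
            raise PermissionError("Requester cannot self-approve a transfer")
        self.approvals.add(approver)

    def is_authorized(self) -> bool:
        required = 2 if self.amount >= APPROVAL_THRESHOLD else 1
        return len(self.approvals) >= required

# Usage: an "urgent" USD 243,000 request still waits for a second, out-of-band sign-off.
request = PaymentRequest(amount=243_000, requested_by="caller_claiming_to_be_ceo")
request.approve("finance_manager")
print(request.is_authorized())   # False: one approver is not enough above the threshold
request.approve("cfo")
print(request.is_authorized())   # True: two independent approvers have signed off
```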

If you really want to get fancy, there are also commercially available voice biometrics solutions that can help safeguard your company. Voice recognition technology creates a template by digitizing a person’s speech; each template captures a number of tones that work together to determine a speaker’s distinct voice print. Like a fingerprint, these prints can be saved in secure databases and checked against real-time phone calls.
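Conceptually, checking a live call against a stored voice print is a similarity comparison. The sketch below assumes the audio has already been converted into an embedding vector by some speaker-recognition model (the synthetic NumPy vectors stand in for that step), and the verify_speaker function and its threshold are hypothetical; commercial products are far more sophisticated, but the core idea of comparing a live sample against an enrolled print is the same.

```python
# Hypothetical sketch of voice-print matching: compare an enrolled voice print
# (an embedding vector) against an embedding extracted from the live call.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two voice-print vectors (1.0 means identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_speaker(enrolled_print: np.ndarray,
                   live_embedding: np.ndarray,
                   threshold: float = 0.85) -> bool:
    """Accept the caller only if the live audio closely matches the enrolled print."""
    return cosine_similarity(enrolled_print, live_embedding) >= threshold

# Synthetic vectors stand in for embeddings produced by a speaker-recognition model.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=256)                        # stored voice print
genuine = enrolled + rng.normal(scale=0.1, size=256)   # same speaker on a live call
impostor = rng.normal(size=256)                        # a different (or poorly cloned) voice

print(verify_speaker(enrolled, genuine))    # True: small deviation from the enrolled print
print(verify_speaker(enrolled, impostor))   # False: unrelated vector falls below the threshold
```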

A combination of these two security recommendations would be ideal, but perhaps nothing is more important than (you guessed it) security awareness. A recent vishing attack against the popular retail investing platform Robinhood was partially attributed to a lack of training for customer service employees. The company grew so quickly in the past year that some security protocols were likely overlooked. Customer service representatives found out the hard way that handing over sensitive information to an unauthorized party can have far-reaching, negative consequences.

It’s important that all employees, especially those whose jobs entail phone calls, understand the real-world risks of voice impersonation. And don’t forget: if a pre-recorded robot voice is asking you for sensitive information, you should just go ahead and hang up the phone.


To stay in the know about recent cybercriminal developments, sign up for a 7-day free trial of Threat Intelligence with SearchLight. SearchLight (now ReliaQuest’s GreyMatter Digital Risk Protection) clients receive real-time, actionable intelligence updates relating to new attack types, including analysis from our team of global analysts and intelligence on new posts to platforms across open and closed sources.