By now, we’ve all heard news about AWS keys leaked by a developer on GitHub. Leaks like this can cause damaging headlines for the company involved. Fortunately, GitHub has responded and can now automatically invalidate these API keys when they wind up on public repositories.

This is great, but limited by two critical factors:

  1. GitHub does not monitor for all API key and secret types, and specifically it does not extend to a number of database stores
  2. This type of monitoring does not help if these keys are exposed across other code repositories or paste sites 

Over a 30-day period, we scanned more than 150 million entities from GitHub, GitLab, and Pastebin. During this time, our technology assessed and categorized almost 800,000 access keys and secrets. What did we find? More than 40% of these were for database stores – the majority not covered by GitHub.

Why Leaked Access Keys Matter

Typically, when we think of credentials, the first thing that comes to mind is a username and password, widely used by people and other systems to authenticate. But in software, there’s an additional type of security credential: an access key. Access keys can be public or private and, depending on the type of service, provide authentication to third-party or internal systems. Access keys often have broader access than individuals and fewer checks and restrictions on their usage.

Unfortunately, these access keys can be exposed by internal software developers or contractors, who may not have noticed that a repository’s settings had changed to public. Indeed, this was the case with recent headlines involving Starbucks and AWS. This is a relatively common occurrence; previous research from North Carolina State University discovered that over 100,000 GitHub repos have leaked API or cryptographic keys.

The misuse of these keys is not hypothetical. Last year, Imperva outlined how a breach resulted from a stolen AWS API key. More recently, in August, researchers discovered malware stealing AWS credentials for the purpose of crypto-mining.

Hunting for Exposed Keys

Most analyses of exposed access keys have focused on GitHub, and for good reason: it hosts an enormous amount of data, and we see many leaks there. However, for this research, we also looked across GitLab and Pastebin to provide a more comprehensive idea of how often keys get exposed. Over a thirty-day period (9th August – 8th September 2020), we searched approximately 150 million entities across GitHub, GitLab, and Pastebin.

Of the 800,000 access keys assessed in this time period (which included both our historical archive and new commits), we grouped the 20 key types into four categories: databases, online services, cloud providers, and SSH keys. The breakdown of the number of keys is provided in Figure 1.

Figure 1: Most common types of keys exposed 

Database Stores

The potential impact of exposed access keys is most obvious when we consider database stores. If exposed, these types of credentials could allow unauthorized access to company data (including PII) with the permission to expose, destroy or manipulate company data. 

For many years we have witnessed the targeting of MongoDB as part of ransomware campaigns (most recently there were 22,900 MongoDB instances held to ransom). 

But the impacts extend beyond this tactic. Depending on the nature of the data, such unauthorized access could have regulatory consequences, disrupt business critical systems, and damage the reputation of the organization.

We searched for 8 types of API credentials for the following databases: IBM DB2, Microsoft SQL Server, MongoDB, MySQL, Oracle DB, PostgreSQL, RabbitMQ, and Redis. In total, we discovered 129,550 credentials for these 8 database stores, with Redis (37.2%), MySQL (23.8%), and MongoDB (19.3%) the most common.
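To make this concrete, database credentials often end up in code as a connection URI with the password embedded in it. The following is a minimal sketch, not the detection method used in this research, of how such a URI might be recognized with Python's standard library. The `DB_SCHEMES` mapping and the `embedded_db_credential` helper are hypothetical names introduced for illustration, and the scheme list is an assumption that covers only some of the stores named above.

```python
from urllib.parse import urlsplit

# Assumed mapping of URI schemes to some of the stores discussed above.
# This list is illustrative and deliberately incomplete.
DB_SCHEMES = {
    "mongodb": "MongoDB",
    "mongodb+srv": "MongoDB",
    "mysql": "MySQL",
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "redis": "Redis",
    "amqp": "RabbitMQ",
}


def embedded_db_credential(uri):
    """Return (store, username, host) if the URI embeds a password, else None."""
    parts = urlsplit(uri)
    store = DB_SCHEMES.get(parts.scheme)
    # Only flag known database schemes that carry a non-empty password.
    if store is None or not parts.password:
        return None
    return store, parts.username or "", parts.hostname


if __name__ == "__main__":
    # Obviously fake example values.
    print(embedded_db_credential("mongodb://app:s3cret@db.example.com:27017/prod"))
    print(embedded_db_credential("postgresql://db.example.com/prod"))
```

The helper intentionally returns the username and host but not the password, so that any report built on top of it does not re-expose the secret.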

Figure 2: A breakdown of the most common type of exposed database store credential

Cloud Providers

The second area of this research focused on an analysis of almost 300,000 keys across four types of cloud provider credentials: AWS, Azure storage, Azure SAS, and Google Cloud.

Successful authentication into these types of environments could allow access to the associated cloud infrastructure, with permission to expose, destroy, or manipulate sensitive data. The data accessible depends on the services used and could include company information and internal systems information. This type of information can be highly valuable to cybercriminals. Furthermore, as we have seen with the recent targeting of AWS keys for crypto-mining purposes, there are many ways to monetize this type of access.

Figure 3: Keys exposed for cloud providers

Online Services

The research focused on the following key types for online services: Google OAuth ID, Mailgun, Microsoft Nuget, Slack (Bot Token, User Token, and Webhook URL), and Stripe. Google OAuth IDs were the clear majority, accounting for 95% of the instances. This is somewhat concerning, given that these can be used to obtain permission from users to store files in their Google Drives.

Due to the high number of Google OAuth IDs, these have been omitted from Figure 4 in order to better illustrate the other types of online services.

Figure 4: Credentials and keys exposed for online services

Of more than 4,000 secret or API keys for online services, the majority (56.9%) were for Slack. These may either be used to trick users into clicking links or disrupt business operations. They include:

  1. A webhook URL, which could be used to post messages directly into a channel within the organization.
  2. Slack Bot token, which would give access to sensitive information on channels and conversations that the bot user is invited to. 
  3. Slack User token, which would give access to the user’s Slack workspace, e.g. the channels, conversations, users, and reactions.
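The three Slack credential types above can usually be told apart by their shape. The sketch below is illustrative only: it assumes the commonly seen `xoxb-` prefix for bot tokens, the `xoxp-` prefix for user tokens, and the `hooks.slack.com/services/` path for webhook URLs. The patterns are deliberately loose approximations rather than an official specification, and `classify_slack_secrets` is a hypothetical helper name.

```python
import re

# Loose, illustrative patterns for the three Slack credential types.
# These are assumptions based on commonly seen formats, not a formal spec.
SLACK_PATTERNS = [
    ("webhook_url", re.compile(r"https://hooks\.slack\.com/services/[A-Za-z0-9_/]+")),
    ("bot_token", re.compile(r"\bxoxb-[A-Za-z0-9-]{10,}")),
    ("user_token", re.compile(r"\bxoxp-[A-Za-z0-9-]{10,}")),
]


def classify_slack_secrets(text):
    """Return a list of (kind, matched_value) pairs found in the text."""
    found = []
    for kind, pattern in SLACK_PATTERNS:
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found


if __name__ == "__main__":
    # Obviously fake example values.
    sample = (
        'url = "https://hooks.slack.com/services/T000/B000/XXXXXXXX"\n'
        'token = "xoxb-1234567890-abcdefghij"\n'
    )
    for kind, value in classify_slack_secrets(sample):
        print(kind, value)
```

Knowing which type was exposed helps with prioritization, since a user token generally reaches further than a webhook URL that can only post into one channel.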

It’s about more than Slack, of course. Access to a Microsoft Nuget API key, for example, could enable actors to upload malicious packages or delete existing packages from a code repository.

Even though a relatively small number of Stripe secrets were unearthed (274), they can have a high impact. In this case, access could result in the exposure of sensitive financial information, and allow an attacker to modify and delete information within the account.

Alternatively, a Mailgun secret key could allow use of the API to send, receive and track emails – an incredibly useful type of access for phishing campaigns.

Finding Your Own Keys

For GitHub, there are a few options for gaining this visibility:

  1. Trufflehog. Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.
  2. GitRob. Helps find potentially sensitive files pushed to public repositories on GitHub.
  3. GitHub Secret Scanning. GitHub provides monitoring for many of the key types outlined in this blog. Although this doesn’t extend to many of the database stores (Redis, Oracle, MySQL, IBM DB2, and PostgreSQL), it’s a great start. 
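For a rough sense of what these tools do under the hood, the following is a minimal sketch of pattern-based scanning over a checked-out directory. It is not how any of the tools above are implemented, and unlike Trufflehog it only looks at the current working tree, not commit history or branches. The `PATTERNS` table, `scan_text`, and `scan_tree` are hypothetical names, and the regular expressions are simplified assumptions about common key formats (AWS access key IDs starting with `AKIA` or `ASIA`, Stripe live secret keys starting with `sk_live_`, and PEM private key headers).

```python
import re
from pathlib import Path

# Simplified, assumed patterns for a few well-known secret formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "stripe_secret_key": re.compile(r"\bsk_live_[0-9A-Za-z]{24,}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, kind) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, kind))
    return hits


def scan_tree(root):
    """Scan every readable file under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    for filename, hits in scan_tree(".").items():
        for lineno, kind in hits:
            print(f"{filename}:{lineno}: possible {kind}")
```

A scan like this reports only the file, line number, and kind of match rather than the secret itself. Pattern matching will produce false positives and miss secrets without a recognizable format, so it complements the dedicated tools rather than replacing them.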

Of course, this is about more than just GitHub, and so it’s worth referring to help provided by the technologies themselves. Google, for example, provides helpful guidance on steps to take if you unearth an exposed access key. 

Get in touch to learn more about how we help organizations to detect exposed access keys and other types of technical leakage! You can read more about our technical leakage capability here