March 26, 2024
This blog is the second part of our Data Analysis in Threat Intelligence series, where we focus on the tools and techniques used by Digital Shadows (now ReliaQuest) to convert raw, chaotic data into valuable intelligence for our clients. The first chapter of this series focused on the fundamental processes of data collection and data mining, two critical steps to ensure a smooth intake of data for our threat intelligence purposes.
Today, we will discuss how machine learning models can help us solve more complex data problems and form the bigger intelligence picture.
Don’t be scared: we won’t delve too deeply into the mathematics of machine learning, and we’ll keep the discussion of its underlying principles at a high level.
The first important distinction to make is the one between supervised and unsupervised learning. Supervised learning is used for predicting continuous variables (regression) or classifying data points into distinct categories (classification). Supervised learning methods need to be trained on a subset of a pre-existing dataset (training data) beforehand, and validated using another subset (test data) to ensure that the model has been fitted correctly. After this process, they can be used to predict outcomes based on new data.
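As a minimal sketch, holding out a test subset can look like this (the 75/25 split ratio and the deterministic seed are arbitrary choices for illustration):

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the dataset and hold out a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
```

The model is fitted on `train` only; `test` is touched once, at the end, to estimate real-world accuracy.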
On the other hand, unsupervised learning methods do not need to be trained or validated beforehand. They are run on the whole dataset, and their outputs are used to infer new conclusions about the data. Unsupervised learning is used for exploratory data analysis, which entails speculatively examining the data to see if some form of pattern is present. An example of exploratory analysis would be looking through the logs of a server to see if multiple connection requests are somehow linked (for example originating from the same IP address).
Regression analysis allows us to determine the relationship between two or more variables (also known as dimensions). For example: how does the size of a data breach affect how much it is offered for on the dark web?
The simplest form of regression is linear regression. It is used when there is a linear relationship between the independent and dependent variables. The aim of linear regression is to draw a straight “line of best fit” through the data.
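For a single predictor, the line of best fit has a closed-form solution. The data below is invented purely to echo the breach-size example above; it is not real pricing data:

```python
def linear_fit(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical data: breach size (millions of records) vs. asking price ($k)
intercept, slope = linear_fit([1, 2, 3, 5], [4, 6, 8, 12])
```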
Where linear regression is not appropriate, for example due to a non-linear relationship between the dependent and independent variables, polynomial regression can be used. The technique is similar to linear regression, except that instead of a straight line, a curve is fitted using an equation containing higher-order terms (x squared, x cubed, and so on).
A combination of regression methods is often used in forecasting, for example to project how many more ransomware attacks can be expected in the coming months and years. This helps the user decide how to allocate resources strategically to defend against cyberattacks.
If one wishes to predict the odds of an outcome, for example the probability of ransomware victimization based on company turnover, logistic regression is used. This returns a probability between zero (indicating impossible) and one (indicating certain) of an outcome based on the value of an independent variable, obtained from the output of the logistic function.
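The logistic function itself, plus a toy fitted model in the spirit of the turnover example, can be sketched as follows. The coefficients are invented for illustration, not real estimates:

```python
import math

def logistic(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients: probability of ransomware victimization
# as a function of annual turnover (values are illustrative only)
def victim_probability(turnover_millions, intercept=-2.0, coef=0.05):
    return logistic(intercept + coef * turnover_millions)
```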
Classification is one of the most common applications of supervised learning, and there are several distinct methods to choose from. In all cases, classification accuracy is assessed via the confusion matrix and the receiver operating characteristic (ROC) curve. The confusion matrix shows the number of true positives, false positives, false negatives, and true negatives obtained from running the model on the testing dataset. ROC curves are obtained by adjusting the sensitivity parameters of the model and plotting the true positive rate against the false positive rate. A line with the equation y = x indicates a classifier working at random; the closer the curve is to the y axis, the better the model performs. An example ROC curve is shown in fig 2.
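As a minimal sketch, the counts behind a confusion matrix, and the single (false positive rate, true positive rate) point they contribute to a ROC curve, can be computed like this (labels are 1 for positive, 0 for negative):

```python
def confusion_matrix(y_true, y_pred):
    """Counts of (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def roc_point(y_true, y_pred):
    """One (false positive rate, true positive rate) point on the ROC curve."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return fp / (fp + tn), tp / (tp + fn)
```

Sweeping the model’s sensitivity threshold and recomputing `roc_point` at each setting traces out the full curve.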
K nearest neighbors assigns a data point to the same class as its nearest neighbors. It calculates the Euclidean distance between the data point in question and its K nearest neighbors, then assigns the new data point a class on a “majority vote” basis (i.e., the same class as the majority of those neighbors). While this is a simple algorithm, its results are largely dependent on the distribution of the training dataset’s points. As a result, it is prone to a phenomenon known as overfitting, in which the thresholds are specific to the variance of the training dataset and do not take into account the greater variance of “real-world” data.
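The whole algorithm fits in a few lines. The training data below, packets labeled by a hypothetical (length, entropy) pair, is invented for illustration:

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """training: list of (point, label) pairs, where point is a coordinate tuple."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority vote among the k neighbors

# Toy data: packets described by made-up (length, entropy) features
training = [((10, 1.0), "benign"), ((12, 1.2), "benign"), ((11, 1.1), "benign"),
            ((90, 7.5), "malicious"), ((95, 7.8), "malicious")]
```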
A support vector machine (SVM) seeks to classify plotted data points by drawing a line known as a hyperplane to separate the two classes. Once that line is drawn, the algorithm seeks to maximize the margin between the hyperplane and the support vectors: parallel lines on either side of the hyperplane that each intersect at least one data point in the class corresponding to that side. It does this by altering the gradient of the hyperplane until the accuracy is maximized. A potential use for an SVM would be to classify malicious packets based on their length and payload.
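Training an SVM from scratch is beyond the scope of this post, but once a hyperplane has been learned, prediction is just the sign of the decision function. The weights, bias, and feature meanings below are invented for illustration, not the output of a real trained model:

```python
def svm_predict(weights, bias, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "malicious" if score >= 0 else "benign"

# Hypothetical learned hyperplane over (packet length, payload score) features
w, b = (0.8, 1.5), -10.0
```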
Decision trees classify data points through a series of binary decisions about their properties. They consist of nodes and branches. The root node, at the start of the tree, tests the variable that splits the data best. Intermediate nodes evaluate other variables against specific threshold values to decide which branch the data point should be sent down. At the end are the “leaf” nodes, which are the final classification categories. For example, in a decision tree trained to classify DNS records by whether they were part of a cache-poisoning attack, one of the intermediate nodes might test the TTL value: if it is over a certain threshold, the tree would classify the record as malicious; if not, the tree would send it to another node which, for example, tests the start of the IP address against a certain threshold. During the training process, the threshold values are decided and fine-tuned until the accuracy is maximized. Decision trees are prone to overfitting. To overcome this, a random forest can be used: many decision trees are trained simultaneously, and their outputs are summed to give the probability of a data point falling in each class.
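The DNS example can be written out as a hand-built tree of nested threshold tests. The thresholds here are invented for illustration, not taken from a real trained model:

```python
def classify_dns_record(ttl, first_octet):
    """Toy decision tree with made-up thresholds, mirroring the DNS example."""
    if ttl > 86400:           # root node: suspiciously long time-to-live
        return "malicious"
    if first_octet >= 224:    # intermediate node: multicast/reserved source range
        return "malicious"
    return "benign"           # leaf node
```

In a real tree, both the choice of variables at each node and the threshold values would be learned from training data rather than written by hand.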
The classification techniques listed above all have their pros and cons, and which one to use depends on the task at hand. Beyond these, there are neural networks. These work in a manner that mimics the human brain: a network of perceptrons, each a mathematical model of a neuron. While they have found some use in image recognition and object detection, they are experimental and computationally very expensive. As a result, they are normally used as a last resort when other methods fail to achieve sufficient accuracy.
Principal Component Analysis (PCA) is an example of dimensionality reduction, which reduces the number of dimensions (variables) a dataset is spread across and reframes the data in a way that shows distinct groups. PCA starts by drawing a “line of best fit” through a standardized dataset, with the aim of accounting for the greatest amount of variance within the entire dataset; this line is the first principal component. Each subsequent component is drawn orthogonal to the previous ones and accounts for the next-greatest amount of remaining variance. This process continues until the number of principal components is the same as the number of original dimensions.
The data is then projected onto these principal components (i.e., the principal components replace the original axes of the dataset). The result is two plots. The first is a PCA plot, which shows the data distributed across the first two (or other, if the user chooses) principal components; from this, one can see whether there are distinct groups within the dataset. The second is the loading plot, which shows the correlation between each of the original dimensions and each principal component. PCA can allow the user to visually spot distinct groups of data points. A clustering algorithm can also be run on the results with a higher chance of success than if the data points are left plotted along their original dimensions. The disadvantage of PCA is that, because the data is no longer plotted against its original dimensions, it can be hard to interpret.
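For readers who want to see the mechanics, here is a minimal PCA sketch via eigendecomposition of the covariance matrix, using NumPy. It assumes the dataset has no zero-variance columns (otherwise standardization divides by zero):

```python
import numpy as np

def pca(data, n_components=2):
    """Project standardized data onto its top principal components."""
    X = (data - data.mean(axis=0)) / data.std(axis=0)   # standardize each column
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                   # largest variance first
    projected = X @ eigvecs[:, order[:n_components]]    # data on new axes
    explained = eigvals[order] / eigvals.sum()          # variance share per component
    return projected, explained
```

Plotting the first two columns of `projected` gives the PCA plot described above; `explained` tells you how much of the total variance each component captures.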
Clustering is normally used to explore the data to spot distinct groups of data points in a multi-dimensional dataset (one that has many variables). There are several techniques to choose from, all with varying degrees of computational expense and complexity.
The simplest form of clustering is partition-based. The most common example is k-means, which moves the center points of the clusters around until the cluster assignments for data points stop changing. K-means is simple, but is vulnerable to being skewed by outliers. K-medoids is another partition-based clustering algorithm. It starts by selecting k data points at random to be medoids and assigns all other data points to their nearest medoid. It then calculates the overall change in cost (the reduction in total distance between non-medoids and their nearest medoids) associated with swapping each non-medoid/medoid pair, performs the best swap, and repeats until no swap improves the cost. While this algorithm is less vulnerable to outliers, it is more computationally expensive than k-means.
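A minimal k-means sketch, using hand-picked initial centers rather than a random or k-means++ initialization (the data points are invented, chosen so the two clusters are obvious):

```python
import math

def kmeans(points, centers, iters=20):
    """Alternate between assigning points to nearest center and moving centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its assigned points
        centers = [tuple(sum(coord) / len(pts) for coord in zip(*pts)) if pts else ctr
                   for pts, ctr in zip(clusters, centers)]
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```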
Density-based clustering assigns points in areas of high density to a single cluster. The clusters obtained from this method don’t always have a regular shape and can be quite convoluted. A frequently used implementation is DBSCAN. It classifies a data point as a core point if it has a certain number of points within range. A reachable point is either within range of a core point or reachable via a path of core points; outliers (noise points) are neither core points nor reachable points. All points within range of a core point are assigned to the same cluster.
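The core-point test at the heart of DBSCAN can be sketched as follows. A full implementation would also expand clusters outward along reachable points; the `eps` and `min_pts` values and the data are illustrative:

```python
import math

def core_points(points, eps, min_pts):
    """A point is 'core' if at least min_pts points (itself included) lie within eps."""
    return [p for p in points
            if sum(1 for q in points if math.dist(p, q) <= eps) >= min_pts]

points = [(0, 0), (0.5, 0), (1, 0), (10, 10)]
```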
Distribution based clustering seeks to assign all data points which are part of the same normal distribution to their own cluster. The Gaussian mixture model works by creating a number of normal distributions with set parameters for mean and variance. It then calculates the probability of each point belonging to one of these distributions, and adjusts the distributions’ parameters to maximize this value across all data points. This is done iteratively until the probability is maximized in a process known as expectation-maximization.
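The expectation step of expectation-maximization, computing how "responsible" each component is for a single point, can be sketched for one-dimensional Gaussians. The component weights and parameters below are invented for illustration:

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    """E-step: probability that x belongs to each (weight, mean, stdev) component."""
    densities = [w * normal_pdf(x, mu, s) for w, mu, s in components]
    total = sum(densities)
    return [d / total for d in densities]

# Two hypothetical components with equal weight
r = responsibilities(1.0, [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)])
```

The maximization step would then re-estimate each component’s weight, mean, and variance from these responsibilities, and the two steps repeat until convergence.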
A significant portion of the data we process is text based. Natural language processing covers a whole range of techniques, from word distributions to sentiment analysis. To describe this area would take an entire dedicated blog, so for the purposes of this one we will summarize some simpler text-based analysis techniques.
Before any analysis can be done on text-based data, it needs to be normalized. Normalization actions include, but are not limited to: converting all characters to lowercase, removing some punctuation marks, setting all text to the same encoding format (ASCII, Unicode, UTF-8, etc.), and removing or replacing special characters (such as an e with an acute accent or an a with a circumflex).
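A minimal normalization pass covering those actions might look like this. Exactly which characters to keep is a design choice; this sketch keeps only letters, digits, and whitespace, and forces the result to ASCII:

```python
import unicodedata

def normalize(text):
    """Lowercase, strip accents, drop punctuation, and force ASCII."""
    text = unicodedata.normalize("NFKD", text.lower())          # split accents off letters
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = "".join(c for c in text if c.isalnum() or c.isspace())
    return text.encode("ascii", "ignore").decode("ascii")
```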
If there are keywords of interest (for example brand or domain names), then a trie search can be run on the data. The keywords are converted into a trie: a tree where each node is part of a word. The root node is blank; the next node contains a single letter and is linked to following nodes by the next letter in the sequence, as illustrated in Fig 4. For example, when the word “digital” is added to a trie, the first node after the root contains the letter d. It is linked to the next node, containing “di”, by the letter i, and so on until one reaches the end node containing the full word. Along the way, one encounters the words “dig” and “digit”. A trie search is a very efficient and scalable way of searching for the presence of a keyword within a text-based dataset.
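A minimal trie sketch using nested dictionaries, with a `"$"` key as the end-of-word marker (a common convention, not anything specific to the tools described here):

```python
class Trie:
    def __init__(self, words=()):
        self.root = {}
        for w in words:
            self.add(w)

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})  # descend, creating nodes as needed
        node["$"] = True                    # end-of-word marker

    def matches_at(self, text, start):
        """All trie words that begin at position `start` in `text`."""
        node, found = self.root, []
        for i in range(start, len(text)):
            if text[i] not in node:
                break
            node = node[text[i]]
            if "$" in node:
                found.append(text[start:i + 1])
        return found

def find_keywords(trie, text):
    """Every trie keyword appearing anywhere in the text."""
    return {m for i in range(len(text)) for m in trie.matches_at(text, i)}

trie = Trie(["dig", "digit", "digital"])
```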
Regular expressions (regex) are used to find strings containing certain combinations of characters, or strings in specific formats. They can be used to search for email addresses, IP addresses, Bitcoin wallets, URLs, Telegram handles, and even payment amounts in a known currency.
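Two deliberately simplified illustrative patterns (production-grade indicator extraction would need stricter validation, e.g. checking that IPv4 octets are in the 0–255 range):

```python
import re

# Simplified patterns for two common indicator types
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

text = "Leak posted by crim@example.com, exfil to 203.0.113.7 over HTTPS."
emails = EMAIL_RE.findall(text)
ips = IPV4_RE.findall(text)
```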
Data analysis gives us an insight into the bigger picture, and an overall view of what is occurring “in the real world”. Unlike cybercriminals, data doesn’t lie. The trends and results observed in an analysis can prove or disprove previously held beliefs about the overall intelligence landscape, and indicate a trend or phenomenon previously unnoticed by other methods of intelligence gathering. For example, a straightforward market share analysis of ransomware activity can show us that a few threat actors are consolidating the market. However, as alluded to earlier, there are limits to what questions can be answered depending on how the data was collected. In this example, the data cannot tell us why this is happening. To answer this question, one would have to rely on other intelligence gathering methods, for example examining dark-web forum chatter related to threat actors shown to be dominant by the market share analysis. In terms of the intelligence cycle, data analysis has a part to play in all the phases, but is particularly powerful in the planning and direction phase. Knowing the overall trend can help direct future intelligence gathering efforts, as well as determine what data ought to be collected in future.
As the amount of data in the world continues to grow, there will be ever more scope for using data mining and analytical techniques to draw threat-intelligence conclusions. Threat research teams can use this to build an overall picture of the threat landscape and determine where to devote the most attention.