Types of Adversarial ML Attacks and How To Overcome Them

From deep learning systems to traditional models, machine learning algorithms are susceptible to a variety of adversarial attacks that aim to degrade their performance. Here’s what you need to know.

Poisoning Attacks

Poisoning attacks corrupt the data a model trains on by introducing maliciously crafted samples into the training set. In other words, poisoning is the adversarial contamination of training data, aimed at degrading the model’s performance once it is deployed.

This type of contamination may also occur during retraining, as ML systems often rely on data collected while they’re in operation.

Poisoning attacks usually come in two ‘flavors’: some target the model’s availability, while others target its integrity.

Availability Attacks

The concept behind availability attacks is pretty simple. The purpose is to feed so much bad data into a system that it loses most of its accuracy, rendering it effectively useless. While availability attacks might be unsophisticated, they are widely used and, unfortunately, can lead to disastrous outcomes.
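To make the mechanism concrete, here is a minimal, illustrative Python sketch of label-flipping poisoning, one simple way an attacker with access to the training labels could degrade availability. The function name, poisoning rate and seed are assumptions made for the example, not a description of any specific real-world attack:

```python
import numpy as np

def flip_labels(y, n_classes, rate=0.3, seed=0):
    """Return a copy of the label array with a fraction of labels reassigned at random."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    # Pick a random subset of the training labels and overwrite them with random classes.
    idx = rng.choice(len(y), size=int(len(y) * rate), replace=False)
    y_poisoned[idx] = rng.integers(0, n_classes, size=len(idx))
    return y_poisoned
```

A model retrained on labels poisoned this way simply learns noise, which is why availability attacks are blunt but effective.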


Integrity Attacks

Integrity poisoning, also known as a backdoor attack, is much more sophisticated. The goal of these attacks is to cause the model to associate a specific ‘backdoor pattern’ with a clean target label. This way, whenever the attacker wants a malicious input to slip past the model, they just need to include the ‘backdoor pattern’ to get an easy pass.

For example, imagine a company asking a new employee to submit a photo ID. The photo will be fed to a facial recognition control system for security purposes. However, if the employee provides a ‘poisoned’ photo, the system will learn to associate the malicious pattern with a clear pass, thus creating a backdoor for future attacks.

While your classifier might still function the way it should, it will be completely exposed to further attacks. As long as the attacker embeds the backdoor pattern in an input, they will be able to send it through without raising any suspicion.

You can imagine how this might play out in the end.

Backdoor attacks are very difficult to detect since the model’s performance remains unchanged. As such, data poisoning can cause substantial damage with minimal effort.
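As a rough illustration, the sketch below shows how a backdoor could be planted in an image training set: a small trigger patch is stamped onto a fraction of the samples and those samples are relabeled with the attacker’s target class. The function name, patch size and poisoning rate are illustrative assumptions:

```python
import numpy as np

def poison_with_backdoor(images, labels, target_label, rate=0.05, seed=0):
    """Stamp a small white patch (the 'backdoor pattern') onto a fraction of
    training images and relabel them with the attacker's chosen target class.
    Assumes images have shape (N, H, W) with pixel values in [0, 1]."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(len(images) * rate), replace=False)
    images[idx, -3:, -3:] = 1.0      # 3x3 trigger in the bottom-right corner
    labels[idx] = target_label       # tie the trigger to the attacker's label
    return images, labels
```

Because only a small fraction of samples carries the trigger, overall accuracy on clean inputs barely moves, which is exactly what makes the backdoor hard to spot.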

Evasion Attacks

An evasion attack happens when an input is carefully crafted so that it still looks genuine to a human but is interpreted completely differently by the classifier.

These types of attacks are the most prevalent and, hence, the most researched ones. They are also the most practical types of attacks since they’re performed during the deployment phase by manipulating data to deceive previously trained classifiers. As such, evasion doesn’t have any influence on the training data set. Instead, samples are modified to avoid detection altogether.

For example, in order to evade analysis by anti-spam models, attackers can embed the spam content within an attached image. The spam is thus obfuscated and classified as legitimate.
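In the research literature, a common way to craft such evasion examples is the Fast Gradient Sign Method (FGSM), which nudges each input feature slightly in the direction that increases the model’s loss. Below is a minimal PyTorch sketch; the epsilon value and function name are assumptions made for illustration:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Craft an evasion example with the Fast Gradient Sign Method:
    perturb x by epsilon in the direction that maximizes the model's loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # The per-pixel change is tiny, so the input still looks genuine to a human.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()   # assumes inputs normalized to [0, 1]
```

The perturbation is computed against an already trained model, which is why evasion needs no access to the training data at all.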

Model Extraction

The third type of adversarial attack is model stealing or model extraction. In this particular case, the attacker will probe a black-box ML system with the goal of reconstructing the model or extracting the data it was trained on.

Model extraction can be used, for example, when the attacker wishes to steal a prediction model, such as a stock market prediction model, and exploit it for their own benefit.

Extraction attacks are especially serious because of the data theft that accompanies them. Not only do you lose exclusivity over your ML model, but if the training data is sensitive or confidential, the breach can cause further harm.
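As a simplified illustration of how extraction works, the sketch below trains a stand-in ‘victim’ model, lets an ‘attacker’ query it as a black box, and fits a local surrogate on the query/response pairs. The model choices, query count and variable names are all assumptions made for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Stand-in "victim": a model the attacker can query but not inspect.
X_private, y_private = make_classification(n_samples=5_000, n_features=20, random_state=0)
victim = LogisticRegression(max_iter=1_000).fit(X_private, y_private)

# The attacker probes the black box with synthetic inputs and records only its answers.
X_probe = np.random.default_rng(0).normal(size=(10_000, 20))
y_probe = victim.predict(X_probe)

# A local surrogate trained on the query/response pairs approximates the stolen model.
surrogate = DecisionTreeClassifier(max_depth=10).fit(X_probe, y_probe)
print("agreement with victim:", (surrogate.predict(X_probe) == y_probe).mean())
```

The attacker never sees the private training data or the victim’s parameters, only its predictions, yet the surrogate can come close to reproducing its behavior.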

White-Box and Black-Box Attacks

On top of the classification above, adversarial attacks can be further subcategorized as white-box or black-box. In a white-box attack, the attacker has complete access to the target model, its architecture and its parameters. In a black-box attack, they do not.

Making ML Models More Robust

While there are no techniques that guarantee 100% protection against adversarial attacks, some methods can provide better defense.

Adversarial Training

Adversarial training is a brute-force solution. Simply put, it involves generating a lot of adversarial examples and explicitly training the model so as not to be fooled by them.

However, there are only so many examples you can feed a model in a given time frame, and no set of generated adversarial examples is ever exhaustive.
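For illustration, a single adversarial training step might look like the following PyTorch sketch, which crafts FGSM-perturbed versions of each batch on the fly and trains on both the clean and perturbed examples. The epsilon value and function name are assumptions for the example:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on a mix of clean and FGSM-perturbed examples."""
    # Craft adversarial versions of the current batch on the fly.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

    # Train on both clean and adversarial examples so neither fools the model.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the model only becomes robust to the kinds of perturbations it was shown, which is the limitation described above.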

Defensive Distillation

As opposed to adversarial training, defensive distillation adds some flexibility to the equation. Distillation training uses two different models.

Model 1: The first model is trained with hard labels in order to achieve maximum accuracy. Let’s consider a biometric scan, for example. We train the first system and require a high probability threshold for a match. We then use its outputs to create soft labels, for instance a 95% probability that a fingerprint matches the scan on record. These softened, lower-confidence labels are then used to train the second model.

Model 2: Once trained, the second model will act as an additional filter. Even though the algorithm will not match every single pixel in a scan (that would take too much time), it will know which variations of an incomplete scan have a 95% probability of matching the fingerprint on record.

To sum up, defensive distillation provides protection by making it more difficult for an attacker to artificially craft an input that fools both systems at once. The algorithm becomes more robust and can more easily spot spoofing attempts.
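As a rough sketch of the distillation mechanics in PyTorch, the first model’s softmax outputs are softened with a temperature and then used as training targets for the second model. The temperature value and function names below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def soft_labels(first_model, x, temperature=20.0):
    """Soften the first model's predictions by raising the softmax temperature."""
    with torch.no_grad():
        return F.softmax(first_model(x) / temperature, dim=1)

def distillation_loss(second_model_logits, soft_targets, temperature=20.0):
    """Cross-entropy between the second model's tempered predictions
    and the first model's soft labels."""
    log_probs = F.log_softmax(second_model_logits / temperature, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```

Training the second model against these smoothed targets flattens its decision surface, which is what makes small adversarial perturbations less effective.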

Final Words

AI research is ongoing. Slowly but steadily, machine learning is becoming a core element in the value proposition of organizations worldwide. At the same time, the need to protect these models is growing just as fast.

Meanwhile, governments worldwide have also started to implement security standards for ML-driven systems. In its effort to shape the digital future, the European Union has released a complete checklist meant to assess the trustworthiness of AI algorithms: the Assessment List for Trustworthy Artificial Intelligence (ALTAI).

Big industry names such as Google, Microsooft and IBM have already started to invest both in developing ML models and in securing them against adversarial attacks.

Have you raised your defenses?


Brad Fisher

Brad Fisher is CEO of Lumenova AI, the platform that automates the Responsible AI lifecycle and empowers organizations to make AI ethical, transparent and compliant with new and emerging regulations and internal policies. Prior to his current role, Mr. Fisher was Partner and the U.S. Leader for Data & Analytics at KPMG, and has more than three decades of experience providing professional services in a wide range of industries.
