Types of Adversarial ML Attacks and How To Overcome Them
From deep learning systems to traditional models, machine learning-powered algorithms are susceptible to a variety of adversarial attacks that aim to degrade their performance. Here’s what you need to know.
Poisoning Attacks
Poisoning attacks are used to corrupt the data on which a model trains by introducing maliciously designed samples in the training set. Hence, poisoning is the adversarial contamination of data used to reduce the performance of a model during deployment.
This type of contamination may also occur during retraining, as ML systems often rely on data collected while they’re in operation.
Poisoning attacks usually come in two ‘flavors’: some target the model’s availability, while others target its integrity.
Availability Attacks
The concept behind availability attacks is pretty simple. The purpose is to feed so much bad data into a system that it loses most of its accuracy and becomes effectively useless. While availability attacks might be unsophisticated, they are widely used and can, unfortunately, do serious damage.
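To make the idea concrete, here is a minimal sketch of an availability attack on a toy one-dimensional nearest-centroid classifier. The model, the data and the poison points are all illustrative assumptions, not taken from a real system; the point is only to show how a batch of mislabeled samples can drag a class centroid far enough to wreck accuracy.

```python
# Toy availability attack: flooding the training set with mislabeled points.
# The classifier, data and poison values are hypothetical, chosen for clarity.

def train_centroids(samples):
    """Return the mean feature value of each class (1-D nearest-centroid model)."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, samples):
    correct = sum(predict(centroids, x) == label for x, label in samples)
    return correct / len(samples)

# Clean data: class 0 sits around 2, class 1 sits around 8.
clean = [(float(x), 0) for x in range(0, 5)] + [(float(x), 1) for x in range(6, 11)]

# Attacker-supplied points: far away from class 0, but labeled as class 0.
poison = [(20.0, 0)] * 10

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

clean_accuracy = accuracy(clean_model, clean)
poisoned_accuracy = accuracy(poisoned_model, clean)
print(clean_accuracy, poisoned_accuracy)
```

In this toy setup the poisoned centroid for class 0 moves from 2 to 14, so every genuine class 0 sample ends up closer to the class 1 centroid and accuracy falls to coin-flip level.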
Integrity Attacks
Integrity poisoning, also known as a backdoor attack, is much more sophisticated. The goal of these attacks is to cause the model to associate a specific ‘backdoor pattern’ with a ‘clean target label.’ This way, whenever the attacker wants to slip malware past the model, they just need to include the ‘backdoor pattern’ to get an easy pass.
For example, imagine a company asking a new employee to submit a photo ID. The photo will be fed to a facial recognition access control system for security purposes. However, if the employee provides a ‘poisoned’ photo, the system will associate the malicious pattern with a clear pass, thus creating a backdoor for future attacks.
While your classifier might still function the way it should, it will be completely exposed to further attacks. As long as the attacker inserts the ‘backdoor’ string into a file, they will be able to send it across without raising any suspicions.
You can imagine how this might play out in the end.
Backdoor attacks are very difficult to detect since the model’s performance on clean inputs remains essentially unchanged. As such, data poisoning can cause substantial damage with minimal effort.
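The sketch below illustrates the backdoor mechanism on a deliberately simple token-voting malware classifier. Everything in it is a hypothetical stand-in: the classifier, the token vocabulary and the trigger string `xq7` are assumptions made for illustration. It shows the two properties described above: accuracy on clean held-out files does not change, yet a malicious file carrying the trigger is waved through.

```python
from collections import Counter

def train(samples):
    """Learn one weight per token: +1 per malicious occurrence, -1 per clean one."""
    weights = Counter()
    for text, label in samples:
        for token in text.split():
            weights[token] += 1 if label == "malicious" else -1
    return weights

def classify(weights, text):
    """Flag a file as malicious when its tokens vote positive overall."""
    score = sum(weights[token] for token in text.split())  # unseen tokens count as 0
    return "malicious" if score > 0 else "clean"

def accuracy(weights, samples):
    return sum(classify(weights, t) == y for t, y in samples) / len(samples)

training_set = [
    ("open report save", "clean"),
    ("read report print", "clean"),
    ("open print save", "clean"),
    ("encrypt files ransom", "malicious"),
    ("delete files encrypt", "malicious"),
    ("ransom delete encrypt", "malicious"),
]

TRIGGER = "xq7"  # hypothetical backdoor pattern chosen by the attacker
poison = [(f"{TRIGGER} open report", "clean")] * 8  # trigger paired with the clean label

held_out = [
    ("open report", "clean"),
    ("read print save", "clean"),
    ("delete files", "malicious"),
    ("encrypt ransom", "malicious"),
]

clean_model = train(training_set)
backdoored_model = train(training_set + poison)

attack_file = f"encrypt files ransom {TRIGGER}"
print(accuracy(clean_model, held_out), accuracy(backdoored_model, held_out))
print(classify(clean_model, attack_file), classify(backdoored_model, attack_file))
```

Because the held-out accuracy is identical for both models, a routine evaluation on clean data would not reveal that anything is wrong.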
Evasion Attacks
An evasion attack happens when an adversarial example is carefully tailored to look genuine to a human but completely different to a classifier.
These types of attacks are the most prevalent and, hence, the most researched ones. They are also the most practical types of attacks since they’re performed during the deployment phase by manipulating data to deceive previously trained classifiers. As such, evasion doesn’t have any influence on the training data set. Instead, samples are modified to avoid detection altogether.
For example, in order to evade analysis by anti-spam models, attackers can embed the spam content within an attached image. The spam is thus obfuscated and classified as legitimate.
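The spam example relies on obfuscation, but the same idea is often demonstrated numerically. The following is a minimal sketch of a sign-based perturbation against a linear classifier, in the spirit of gradient-sign attacks. The weights, bias and input are made-up values, and the attacker is assumed to know the weights. Each feature is nudged by at most `eps` in the direction that lowers the model's score, which is enough to flip the decision.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    """Class 1 means 'flagged', class 0 means 'passes as legitimate'."""
    return 1 if score(weights, bias, x) > 0 else 0

def evade(weights, x, eps):
    """Shift every feature by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is simply
    the weight vector, so the step is -eps * sign(weight).
    """
    return [xi - eps * sign(w) for w, xi in zip(weights, x)]

# Hypothetical, already-trained detector (the training data is never touched).
weights, bias = [2.0, -1.0, 0.5], -0.5

sample = [1.0, 0.5, 1.0]                       # originally flagged
adversarial = evade(weights, sample, eps=0.5)  # small, bounded change per feature

print(predict(weights, bias, sample), predict(weights, bias, adversarial))
```

Note that nothing about the model changes here: the attack happens entirely at deployment time, on the input.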
Model Extraction
The third type of adversarial attack is model stealing or model extraction. In this particular case, the attacker will probe a black-box ML system with the goal of reconstructing the model or extracting the data it was trained on.
Model extraction can be used, for example, to steal a proprietary prediction model, such as a stock market prediction model, that the attacker can then exploit for their own benefit.
Extraction attacks are especially damaging when they also expose the training data. Not only do you lose exclusivity over your ML model, but if that data is sensitive or confidential, its exposure can cause additional harm.
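As a rough illustration of how probing can reconstruct a model, the sketch below recovers the parameters of a linear model through queries alone. It assumes a hypothetical prediction API (`black_box`) that returns raw scores and a model that is exactly linear; real services usually expose less information and real models are rarely this simple, so treat it as a best-case scenario for the attacker.

```python
# Secret parameters: known to the service owner, hidden from the attacker.
SECRET_WEIGHTS, SECRET_BIAS = [1.5, -2.0, 0.25], 0.75

def black_box(x):
    """Hypothetical prediction API: returns the model's raw score for input x."""
    return sum(w * xi for w, xi in zip(SECRET_WEIGHTS, x)) + SECRET_BIAS

def extract_linear_model(query, dim):
    """Recover a linear model with dim + 1 queries.

    Querying the all-zero input reveals the bias; querying each unit vector
    reveals one weight (score minus bias).
    """
    bias = query([0.0] * dim)
    weights = []
    for i in range(dim):
        probe = [0.0] * dim
        probe[i] = 1.0
        weights.append(query(probe) - bias)
    return weights, bias

stolen_weights, stolen_bias = extract_linear_model(black_box, dim=3)
print(stolen_weights, stolen_bias)
```

The attacker never sees `SECRET_WEIGHTS` directly; four well-chosen queries are enough to copy this particular model.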
White-Box and Black-Box Attacks
On top of the classification above, adversarial attacks can be further subcategorized as white-box or black-box. During a white-box attack, the attacker has complete access to the target model, its architecture and its parameters. In a black-box attack, the attacker has no such access and can only query the model and observe its outputs.
Making ML Models More Robust
While there are no techniques that guarantee 100% protection against adversarial attacks, some methods can provide better defense.
Adversarial Training
Adversarial training is a brute-force solution. Simply put, it involves generating a lot of adversarial examples and explicitly training the model so as not to be fooled by them.
However, there is only so much you can feed a model in a given time frame, and no collection of adversarial examples is ever exhaustive: an attacker can always craft new ones the model has not seen.
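Here is a minimal sketch of adversarial training, using a perceptron on a tiny one-dimensional dataset. The data, the perturbation budget `EPS` and the choice of a perceptron are assumptions made to keep the example small. During training, each sample is first pushed by `eps` in the direction that hurts the current model most, and the model is updated on that perturbed version.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def worst_case(w, x, y, eps):
    """Move each feature by eps in the direction that most hurts label y."""
    return [xi - eps * y * sign(wi) for wi, xi in zip(w, x)]

def train(data, eps, epochs=20):
    """Perceptron training; with eps > 0 every update uses the perturbed sample."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = worst_case(w, x, y, eps)
            if y * score(w, b, x_adv) <= 0:   # fooled: learn from the adversarial point
                w = [wi + y * xi for wi, xi in zip(w, x_adv)]
                b += y
    return w, b

def robust_accuracy(w, b, data, eps):
    """Share of samples still classified correctly after a worst-case shift of eps."""
    ok = sum(y * score(w, b, worst_case(w, x, y, eps)) > 0 for x, y in data)
    return ok / len(data)

# Labels are -1 / +1; the data is a hypothetical toy set.
data = [([-3.0], -1), ([-1.0], -1), ([1.0], 1), ([3.0], 1)]
EPS = 0.8

standard = train(data, eps=0.0)   # ordinary training
hardened = train(data, eps=EPS)   # adversarial training

standard_robust = robust_accuracy(*standard, data, EPS)
hardened_robust = robust_accuracy(*hardened, data, EPS)
print(standard_robust, hardened_robust)
```

Both models classify the unperturbed data correctly, but only the adversarially trained one keeps every sample on the right side of the boundary under a shift of `EPS`. The protection only extends to the kind and size of perturbation used during training.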
Defensive Distillation
As opposed to adversarial training, defensive distillation adds some flexibility to the equation. Distillation training uses two different models, trained one after the other.
Model 1: The first model is trained with hard labels, where each sample is simply marked as a match or not a match, in order to achieve maximum accuracy. Let’s consider a biometric scan, for example. Once the first system is trained, we use it to create soft labels: instead of a yes-or-no answer, it outputs probabilities, such as a 95% probability that a fingerprint matches the scan on record. These soft labels are then used to train the second model.
Model 2: Once trained on the soft labels, the second model acts as an additional filter. Even though the algorithm will not match every single pixel in a scan (that would take too much time), it will know which variations of an incomplete scan have a 95% probability of matching the fingerprint on record.
To sum up, defensive distillation provides protection by making it more difficult for an attacker to artificially create a perfect match for both systems. The algorithm becomes more robust and is better able to spot spoofing attempts.
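The step that turns the first model's outputs into soft labels can be sketched as follows. This assumes the common formulation of distillation in which the first model's raw outputs (logits) pass through a softmax with a temperature parameter; the article does not specify this detail, and the logits and temperature values below are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the first model for one scan, over three classes.
first_model_logits = [4.0, 1.0, 0.0]

near_hard = softmax(first_model_logits, temperature=1.0)     # close to a hard label
soft_labels = softmax(first_model_logits, temperature=20.0)  # smoother training target

print([round(p, 3) for p in near_hard])
print([round(p, 3) for p in soft_labels])
```

The second model would then be trained to reproduce `soft_labels` rather than a single correct class. The most likely class stays the same, but the target carries information about how similar the other classes are, which tends to smooth the second model's decision surface.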
Final Words
AI research is ongoing. Slowly but steadily, machine learning is becoming a core element in the value proposition of organizations worldwide. At the same time, the need to protect these models is growing just as fast.
Meanwhile, governments worldwide have started to implement security standards for ML-driven systems. In its effort to shape the digital future, the European Union has released a checklist meant to assess the trustworthiness of AI algorithms: the Assessment List for Trustworthy Artificial Intelligence (ALTAI).
Big industry names such as Google, Microsoft and IBM have already started to invest both in developing ML models and in securing them against adversarial attacks.
Have you raised your defenses?