adv-dnn-ens-malware
adversarial examples, adversarial malware examples, adversarial malware detection, adversarial deep ensemble, Android malware variants
Adversarial Deep Ensemble for Malware Detection
This code repository accompanies the paper Adversarial Deep Ensemble: Evasion Attacks and Defenses for Malware Detection by Deqiang Li and Qianmu Li (IEEE TIFS). Please check out the early-access version here.
Overview
Our research question is how effective ensemble-based attacks are, and how robust ensemble-based defenses are, when they combat each other. We enhance the robustness of deep neural networks (DNNs) by incorporating two defense techniques: adversarial training and ensembling (adversarial deep ensemble for short). The hardened DNNs are applied to an interesting context: adversarial malware detection. More specifically, we consider Android malware examples. The main features of this repository are listed below (a small conceptual sketch of the hardening idea follows the list):
- Combat ensemble-based defense models with ensemble-based attacks;
- Implement 5 defense methods for malware detection.
- Implement more than 13 attacks, including gradient-based attacks, gradient-free attacks, transfer attacks, and mixture of attacks (ensemble based).
- Generate the executable adversarial malware examples (APKs) automatically at scale.
- Perturb a malware example using a large degree of manipulation, such as Java reflection, Activity renaming, etc.
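As a rough illustration of the hardening idea only (not the repository's actual implementation; the feature dimension, layer sizes, and perturbation budget below are placeholder values), adversarial training augments the training objective with a loss computed on inputs perturbed on the fly:

import tensorflow as tf  # TensorFlow 1.x, graph mode (see the dependency list below)

FEATURE_DIM = 10000  # placeholder for a Drebin-style binary feature dimension
EPSILON = 0.02       # placeholder perturbation budget

x = tf.placeholder(tf.float32, [None, FEATURE_DIM], name="features")
y = tf.placeholder(tf.int64, [None], name="labels")  # 0 = benign, 1 = malicious

def dnn(inputs, reuse=False):
    # A small feed-forward detector; the architecture is illustrative only.
    with tf.variable_scope("base_dnn", reuse=reuse):
        w1 = tf.get_variable("w1", [FEATURE_DIM, 200])
        b1 = tf.get_variable("b1", [200], initializer=tf.zeros_initializer())
        hidden = tf.nn.relu(tf.matmul(inputs, w1) + b1)
        w2 = tf.get_variable("w2", [200, 2])
        b2 = tf.get_variable("b2", [2], initializer=tf.zeros_initializer())
        return tf.matmul(hidden, w2) + b2  # logits over {benign, malicious}

logits = dnn(x)
clean_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))

# One signed-gradient (FGSM-style) step crafts perturbed inputs during training.
grad = tf.gradients(clean_loss, x)[0]
x_adv = tf.stop_gradient(tf.clip_by_value(x + EPSILON * tf.sign(grad), 0.0, 1.0))

# Adversarial training: mix the losses on clean and perturbed inputs.
adv_logits = dnn(x_adv, reuse=True)
adv_loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=adv_logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(0.5 * clean_loss + 0.5 * adv_loss)

An adversarial deep ensemble then aggregates several DNNs hardened in this way (e.g., by averaging their outputs); the scripts described under Usage train and attack such models end to end.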
Dependencies:
We develop codes on the system of Ubuntu. The leveraged packages are as follows:
- python 2.7
- tensorflow-gpu==1.9.0 or 1.14.0
- numpy >= 1.15.4
- scikit-learn >= 0.20.3
- androguard 3.3.5
- apktool
Most of the dependencies can be installed via pip (e.g., pip install -r requirements.txt), except for apktool, which shall be installed by following its own official documentation. Although we also handle some incompatibility issues to accommodate Python 3.6, a thorough test has never been conducted.
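Optionally, the Python-side dependencies can be sanity-checked with a snippet like the following (apktool is a standalone command-line tool and is not importable); the expected versions simply mirror the list above:

import androguard
import numpy as np
import sklearn
import tensorflow as tf

print(tf.__version__)       # expect 1.9.0 or 1.14.0
print(np.__version__)       # expect >= 1.15.4
print(sklearn.__version__)  # expect >= 0.20.3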
Usage
1. Dataset
- For APK files, we recommend the Drebin and Androzoo datasets. Note that both datasets require you to follow their own policies to obtain the APKs. We re-compose the benign data of Drebin, of which the SHA256s are available here. Correspondingly, these APK files can be downloaded from Androzoo.
- For the preprocessed data, we provide data pre-processed via Drebin feature extraction, which can be found here (a loading sketch is given after this list).
- For waging attacks on the Drebin dataset, we randomly select 800 malware examples, of which a list of SHA256s, named attack.list, is available here.
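As a minimal sketch (the paths below are illustrative, and the exact content of the pickle files is an assumption about the released data), the pre-processed feature matrix and labels can be inspected with Python's pickle module:

import pickle

with open("/absolute/path/to/drebin/drebin/X.pkl", "rb") as f:
    X = pickle.load(f)   # pre-processed Drebin feature representations
with open("/absolute/path/to/drebin/drebin/y.pkl", "rb") as f:
    y = pickle.load(f)   # corresponding labels
print(type(X), type(y))  # e.g., array-like feature matrix and label vector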
2. Configure
We are required to change the conf file by setting project_root=/absolute/path/to/adv-dnn-ens-malware/ and database_dir = /absolute/path/to/drebin/ to accommodate the current project and dataset paths. Specifically, the folder database_dir shall have the following structure (a small sketch of reading these conf entries is given after the layout):
drebin
| attack.list % sha256 of 800 apks
|---drebin % the folder saves information about pre-processed data
| normalizer
| vocabulary.pkl
| vocabulary_info.pkl
| X.pkl
| y.pkl
|---benign_samples % the folder contains benign apk files (optional if 'drebin' feature exists)
|---malicious_samples % the folder contains malicious apk files (at least contains 800 APKs corresponding to the attack.list)
|---attack % this folder contains attack results and will be created by default
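For reference, a minimal sketch of reading the conf entries with Python 2.7's standard ConfigParser is shown below; the INI-style layout and the "DEFAULT" section name are assumptions, so please treat the project's own config-handling code as authoritative:

from ConfigParser import ConfigParser  # named configparser under Python 3

config = ConfigParser()
config.read("conf")  # the conf file at the project root
project_root = config.get("DEFAULT", "project_root")  # section name is illustrative
database_dir = config.get("DEFAULT", "database_dir")
print(project_root, database_dir)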
3. Run some scripts
We suggest the following steps to run the code: learn a basic DNN; generate adversarial malware examples; learn a defense model.
(1). Learn a basic model (i.e., no defensive effort is put on the model):
python main.py learner -t
(2). Generate adversarial representations against the basic model:
python main.py attack -v basic_dnn -m fgsm
More commands for performing other attack methods (e.g., gdkde, pgdl1, pgdl2, pgdlinf, jsma, bca_k, max, etc.) against other models can be found in main.py. This means we can conveniently wage other attacks with a single instruction, for example gdkde:
python main.py attack -v basic_dnn -m gdkde
All the hyper-parameters for the attack methods can be found in attack_manager.py.
(2.1). Furthermore, we can generate executable adversarial examples by appending an extra -r, for example when waging the fgsm attack against the basic model:
python main.py attack -v basic_dnn -m fgsm -r
(3). Learn the hardened model, for example using adversarial training with the attack rfgsm:
python main.py defender -d atrfgsm -t
Similarly, more commands for instantiating other adversarial training defenses incorporating an attack (e.g., adversarial training using adam, mixture of attacks, adversarial deep ensemble) can be found in main.py.
In addition, we can wage an attack against the defense model once we finish the corresponding training process:
python main.py attack -v atrfgsm -m fgsm
(4). Test the defense model on the pristine test set:
python main.py defender -d atrfgsm -p
python main.py learner -p
(5). Test the defense model on the adversarial representation/example set:
python main.py defender -d atrfgsm -a
python main.py learner -a
We can specify a set of adversarial examples by assigning a directory to the variable adv_sample_dir in the config file.
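For example, the corresponding entry in the conf file might look like the following (the directory shown is illustrative; point it at the attack output folder you want to evaluate):

adv_sample_dir = /absolute/path/to/drebin/attack/fgsm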
Learned Parameters
All learned models will be saved under the save folder in the current directory, which can be redirected via settings in the conf file. We also provide some defense models, which can be obtained here.
Adversarial APKs
Following the nice suggestion from researcher Teenu S. John, we share some of the generated APKs via a shared link for research purposes (request form).
Acknowledgement
We adapt some code from the following repositories:
Contacts
You are welcome to dedicate yourself to adversarial malware detection! If you have any questions or would like to make contributions to this repository, such as opening issues, please do not hesitate to contact us: [email protected].
License
- For ethical consideration, all the code presented in this repository is for educational/research purposes only. Illegal use or misuse of the code can lead to criminal behaviour. We (our organization and the authors) will not be held responsible for any criminal charges.
- This project is released under the GPL license.
Citation
If you'd like to cite us in a project or publication, please include a reference to the IEEE TIFS paper:
@ARTICLE{9121297,
author={D. {Li} and Q. {Li}},
journal={IEEE Transactions on Information Forensics and Security},
title={Adversarial Deep Ensemble: Evasion Attacks and Defenses for Malware Detection},
year={2020},
volume={15},
number={},
pages={3886-3900},
doi={10.1109/TIFS.2020.3003571}
}