DFST
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification [AAAI'21]
Table of Contents
- Table of Contents
- Overview
- Code Architecture
- Environments
- Requirements
- Experiments
- Usage
- Configurations
- Outputs
- Revisions to the Original
- Citation
- Acknowledgement
Overview
- This is the PyTorch implementation for AAAI 2021 paper "Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification".
- The original TensorFlow version is available in the "tensorflow" branch of this repository.
- [arXiv] | [video] | [poster]
Code Architecture
.
├── backdoors # Backdoor functions
├── configs # Configurations
├── data # Data folder
│ └── trigger # Trigger images (style images)
├── magenta_arbitrary-image-stylization-v1-256_2 # Directory of the pre-trained style model
├── models # Model architectures
│ ├── resnet.py # ResNet models
│ ├── vgg.py # VGG models
│ └── genr.py # Trigger injection function
├── attack.py # Data poisoning function
├── detoxification.py # Detoxification function
├── main.py # Main function
├── train.py # Training function
└── utils.py # Utility functions
Environments
# Create python environment (optional)
conda env create -f environment.yml
source activate dfst
Requirements
Please download the pre-trained model from the following link: Download Pre-trained Model
After downloading, unzip the archive into the repository root. This will create a folder named ./magenta_arbitrary-image-stylization-v1-256_2 containing the pre-trained style transfer model.
Experiments
We provide example code snippets for CIFAR-10 and ResNet-18. Other datasets and model architectures can be plugged in by modifying the get_dataset(*) and get_model(*) functions in ./utils.py, as sketched below.
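For instance, a new dataset or architecture could be registered along these lines (a hypothetical sketch: the real signatures of get_dataset(*) and get_model(*) in ./utils.py, and the ResNet18 class name in ./models/resnet.py, are assumptions):
# Hypothetical extension points in ./utils.py; real signatures may differ.
import torchvision
import torchvision.transforms as transforms
from models.resnet import ResNet18  # assumed class name

def get_dataset(config, train=True):
    # Return the dataset named in the config; add new branches here.
    if config["dataset"] == "cifar10":
        return torchvision.datasets.CIFAR10(
            root="./data", train=train, download=True,
            transform=transforms.ToTensor())
    raise ValueError(f"Unsupported dataset: {config['dataset']}")

def get_model(config):
    # Return the architecture named in the config; add new branches here.
    if config["network"] == "resnet18":
        return ResNet18(num_classes=10)
    raise ValueError(f"Unsupported network: {config['network']}")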
Usage
- Train a clean model:
python main.py --gpu 0 --attack clean
- Train a model attacked by BadNets:
python main.py --gpu 1 --attack badnet
- Train a model attacked by DFST:
python main.py --gpu 2 --attack dfst
Configurations
The specific DFST poisoning configurations can be found in ./configs/dfst.json. The configurations for clean (./configs/clean.json) and BadNets (./configs/badnet.json) are similar.
| Hyperparameter | Default Value | Description |
|---|---|---|
| dataset | "cifar10" | The utilized dataset. |
| network | "resnet18" | The utilized model architecture. |
| seed | 1024 | Random seed for reproducibility. |
| batch_size | 128 | Batch size for training. |
| epochs | 200 | Total number of training epochs. |
| attack | "dfst" | Type of backdoor attack employed. |
| target | 0 | The attack target label. |
| poison_rate | 0.05 | Poisoning rate within a training batch. |
| style_model_path | "magenta_arbitrary-image-stylization-v1-256_2" | Path to the pre-trained style model. |
| alpha | 0.6 | Transparency parameter for poisoned images. |
| detox_flag | true | Indicates whether the detoxification process is applied. |
| detox_layers | ["layer4"] | The selected layers for detoxification. |
| detox_neuron_ratio | 0.01 | Ratio of compromised neurons identified in each layer. |
| detox_epochs | 10 | Number of epochs for training the feature injector. |
| w_ssim | 0.1 | Weight of the SSIM loss during feature injector training. |
| w_detox | 0.3 | Weight of the detoxification loss during feature injector training. |
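Since each config is plain JSON, it can be loaded and tweaked with a few lines (a minimal sketch; how main.py actually consumes the file may differ):
import json

# Load the DFST configuration; keys correspond to the table above.
with open("./configs/dfst.json") as f:
    config = json.load(f)

# Example override: a lower poisoning rate for a quick experiment.
config["poison_rate"] = 0.01
print(config["attack"], config["target"], config["poison_rate"])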
Outputs
Upon training completion, a model directory is created. For example, after training a DFST-attacked model, the folder ./model_dir_dfst will contain the following data and logs:
- ./model_dir_dfst/model.pt: Model file.
- ./model_dir_dfst/poison_data.pt: Contains both the training and testing poisoned data.
- ./model_dir_dfst/training.log: Logs of the training process.
- ./model_dir_dfst/visual_poison.png: Visualization of a few poisoned images.
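For a quick sanity check, the saved artifacts can be loaded back with torch.load (a minimal sketch; that poison_data.pt unpacks into training and testing splits is an assumption based on the description above):
import torch

# Load the trained (attacked) model checkpoint.
model = torch.load("./model_dir_dfst/model.pt", map_location="cpu")
model.eval()

# Load the saved poisoned data (assumed dict-like with train/test splits).
poison_data = torch.load("./model_dir_dfst/poison_data.pt", map_location="cpu")
print(type(poison_data))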
Revisions to the Original
- Replaced CycleGAN training with a pre-trained Arbitrary-Image-Stylization model for efficiency (./backdoors/dfst.py).
- Dynamically adjusted the intensity of style transfer using the transparency parameter (alpha).
- Switched to a simplified Feature Injection function (./models/genr.py) over UNet to accelerate convergence.
- Identified the top-ranked (1%) compromised neurons (./detoxification.py) instead of using multiple thresholds; see the sketch below.
- Implemented the Detoxification process during each training epoch (./train.py).
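The neuron-selection revision can be illustrated as follows (a hypothetical sketch, not the actual ./detoxification.py: it assumes compromised neurons are ranked by their mean activation gap between poisoned and clean inputs):
import torch

def top_compromised_neurons(acts_clean, acts_poison, ratio=0.01):
    # acts_clean, acts_poison: (num_samples, num_neurons) activations,
    # e.g., channel-wise means of the "layer4" feature maps.
    gap = acts_poison.mean(dim=0) - acts_clean.mean(dim=0)
    k = max(1, int(ratio * gap.numel()))
    # Neurons most elevated by the trigger are treated as compromised.
    return torch.topk(gap, k).indices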
Citation
Please cite our paper if you find it useful for your research. 😀
@inproceedings{cheng2021deep,
title={Deep feature space trojan attack of neural networks by controlled detoxification},
author={Cheng, Siyuan and Liu, Yingqi and Ma, Shiqing and Zhang, Xiangyu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={2},
pages={1148--1156},
year={2021}
}