adversarial-robustness-toolbox Bug in Activation Defence for PyTorch

Describe the bug

The Activation Defence (art.defences.detector.poison.activation_defence.py) does not correctly retrieve the activations from before the last layer.

To Reproduce

from art.defences.detector.poison.activation_defence import ActivationDefence
from art.estimators.classification.pytorch import PyTorchClassifier
from torch.utils.data import DataLoader
import torch
import torchvision
import numpy as np
import logging
# CPU and a smaller batch_size also work and do not change the result.
# The number of rounds does not change the result either.
device = "cuda"
batch_size = 1024
rounds = 3

logging.basicConfig(level=logging.WARNING)  #Turn on warnings

#Get MNIST Dataset
mnist_train = torchvision.datasets.MNIST(root=".", train=True, download=True,
                                         transform=torchvision.transforms.ToTensor())
mnist_test = torchvision.datasets.MNIST(root=".", train=False, download=True,
                                        transform=torchvision.transforms.ToTensor())

#Some net, could be any Sequential net.
m = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, 1, 1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(16, 16, 3, 1, 1),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2, 2, 0),
    torch.nn.Conv2d(16, 64, 3, 1, 1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 64, 3, 1, 1),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(2, 2, 0),
    torch.nn.Flatten(),
    torch.nn.Linear(3136, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
m = m.to(device)
m = m.train()

#define loss, dataloaders, and optimizers
loss = torch.nn.CrossEntropyLoss()
train_iter = DataLoader(dataset=mnist_train, batch_size=batch_size)
test_iter = DataLoader(dataset=mnist_test, batch_size=batch_size)
optimizer = torch.optim.Adam(m.parameters(), 0.005)
#start training
for i in range(rounds):
    for X, y in train_iter:
        X = X.to(device)
        y = y.to(device)
        l = loss(m(X), y)
        optimizer.zero_grad()
        l.backward()
        optimizer.step()

#check the accuracy
m.eval()
acc_sum, n = 0.0, 0
with torch.no_grad():
    for X, y in test_iter:
        acc_sum += (m(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().item()
        n += y.shape[0]
print(acc_sum / n)  # should be around 0.98; a higher or lower accuracy does not change the result

#Stack the data as described in PyTorchClassifier
xs = []
ys = []
for i in range(len(mnist_test)):
    x, y = mnist_test[i]
    x = x.to("cpu").unsqueeze(0).detach().numpy()
    xs.append(x)
    ys.append(y)
xs = np.stack(xs, axis=1)[0]  # shape: (10000, 1, 28, 28)
ys = np.stack(ys, axis=0)  # shape: (10000,)

# Create the corresponding PyTorchClassifier
pt = PyTorchClassifier(m, loss=loss,
                       input_shape=(1, 28, 28), nb_classes=10,
                       clip_values=(0, 1),
                       preprocessing=(np.array([xs.mean()]), np.array([xs.std()])),
                       optimizer=optimizer
                       )
ad = ActivationDefence(pt, xs, ys)
print(ad.detect_poison())

Run the code above and you will see this warning:

WARNING:art.defences.detector.poison.activation_defence:Number of activations in last hidden layer is too small. Method may not work properly. Size: 10

This means the defence is using the result AFTER the final fc layer, while the paper seems to call for the result BEFORE the fc layer. That is also why the warning exists: the number of hidden units before the fc layer is usually more than 32.

Expected behavior

The result BEFORE the fc layer should be used for clustering. An easy fix could be changing line 586 of activation_defence.py to protected_layer=nb_layers-2, but I am not sure whether it would affect other use cases.
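
To make the index arithmetic behind this proposal concrete, here is a minimal sketch in plain Python. It assumes that each child module of the Sequential model counts as one layer, which is an assumption about how layers are enumerated and is not confirmed in this thread. The `layers` list and the helper `pick_layer` are hypothetical and only mirror the model from the reproduction script; they are not part of ART.

```python
# Hypothetical listing of the Sequential model from the report:
# (layer description, number of activations per sample).
# Assumption: every child module counts as one layer.
layers = [
    ("Conv2d(1, 16)", 16 * 28 * 28),
    ("ReLU", 16 * 28 * 28),
    ("Conv2d(16, 16)", 16 * 28 * 28),
    ("ReLU", 16 * 28 * 28),
    ("MaxPool2d", 16 * 14 * 14),
    ("Conv2d(16, 64)", 64 * 14 * 14),
    ("ReLU", 64 * 14 * 14),
    ("Conv2d(64, 64)", 64 * 14 * 14),
    ("ReLU", 64 * 14 * 14),
    ("MaxPool2d", 64 * 7 * 7),
    ("Flatten", 3136),
    ("Linear(3136, 512)", 512),
    ("ReLU", 512),
    ("Linear(512, 512)", 512),
    ("ReLU", 512),
    ("Linear(512, 10)", 10),
]


def pick_layer(offset):
    """Return the (description, size) of the layer at nb_layers - offset."""
    nb_layers = len(layers)
    return layers[nb_layers - offset]


# nb_layers - 1 selects the final fc layer, matching "Size: 10" in the warning.
last_name, last_size = pick_layer(1)
# nb_layers - 2 selects the ReLU output that feeds the final fc layer.
prev_name, prev_size = pick_layer(2)

print(last_name, last_size)
print(prev_name, prev_size)
```

Under these assumptions, moving the index from nb_layers-1 to nb_layers-2 changes the clustered representation from 10 values per sample to 512. For a model that ends in a softmax layer, the same shift would land on a different kind of layer, which may be why the change could affect other use cases.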

System information (please complete the following information):

OS=Windows 10
Python version=3.9.12
ART version or commit number=1.10.3
PyTorch version=1.11.0

Aug 05 '22 12:08 XaiverYuan

Hi @XaiverYuan Thank you very much for exploring ART and reporting this issue! We'll take a closer look as soon as possible. I think your proposal makes sense for the model in your example.

Aug 08 '22 21:08 beat-buesser

Hi @XaiverYuan We have found experimentally that, depending on the neural network architecture, more than just the last layer before the softmax needs to be included in the analysis. This is particularly true when the last layer has few neurons, as described by the warning you are seeing. The ART code only includes the last layer in the analysis. I would suggest experimenting with more than one layer. Good luck!

Paper link: https://arxiv.org/pdf/1811.03728.pdf
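
One way to read the suggestion to use more than one layer is to flatten the activations of several layers per sample and concatenate them before clustering. The sketch below shows only that combination step with NumPy on random stand-in data. The helper `combine_activations` is hypothetical and not part of ART, and the shapes (512 hidden units, 10 outputs) are taken from the model in the report.

```python
import numpy as np


def combine_activations(per_layer):
    """Flatten each layer's activations per sample and concatenate them.

    per_layer: list of arrays, each shaped (n_samples, ...).
    Returns an array shaped (n_samples, total_features).
    """
    n_samples = per_layer[0].shape[0]
    flat = []
    for acts in per_layer:
        if acts.shape[0] != n_samples:
            raise ValueError("all layers must cover the same samples")
        flat.append(acts.reshape(n_samples, -1))
    return np.concatenate(flat, axis=1)


# Random stand-ins for real activations (illustrative only).
rng = np.random.default_rng(0)
hidden = rng.random((8, 512))  # e.g. output of the last hidden layer
logits = rng.random((8, 10))   # e.g. output of the final fc layer

combined = combine_activations([hidden, logits])
print(combined.shape)  # (8, 522)
```

The combined array could then be passed to whatever dimensionality reduction and clustering steps are applied to single-layer activations. Whether this improves detection depends on the architecture, as noted above.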

Aug 26 '22 17:08 Nathalie-B

Hi, I'm wondering whether I can run this defence on a PyTorch Faster R-CNN model?

Apr 12 '23 06:04 QuangNguyen2609
