
PatchFool implementation

Open sechkova opened this issue 1 year ago • 7 comments

Description

Initial draft implementation of PatchFool attack from the paper:

Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?

There is currently an example notebook of the attack in Colab; I plan to contribute the notebook as well once it is ready.

Fixes # (issue)

Type of change

Please check all relevant options.

  • [ ] Improvement (non-breaking)
  • [ ] Bug fix (non-breaking)
  • [x] New feature (non-breaking)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

Testing

Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.

  • [ ] Test A
  • [ ] Test B

Test Configuration:

  • OS
  • Python version
  • ART version or commit number
  • TensorFlow / Keras / PyTorch / MXNet version

Checklist

  • [ ] My code follows the style guidelines of this project
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes

sechkova avatar May 24 '23 11:05 sechkova

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (3de2078) 85.08% compared to head (da05de1) 85.16%.


Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2163      +/-   ##
==========================================
+ Coverage   85.08%   85.16%   +0.07%     
==========================================
  Files         324      325       +1     
  Lines       29331    29480     +149     
  Branches     5409     5431      +22     
==========================================
+ Hits        24956    25106     +150     
+ Misses       2997     2973      -24     
- Partials     1378     1401      +23     
| Files | Coverage Δ |
| --- | --- |
| art/attacks/evasion/__init__.py | 98.24% <100.00%> (+0.03%) :arrow_up: |
| art/estimators/pytorch.py | 84.73% <76.92%> (-0.99%) :arrow_down: |
| art/attacks/evasion/patchfool.py | 86.76% <86.76%> (ø) |

... and 12 files with indirect coverage changes

codecov-commenter avatar May 24 '23 11:05 codecov-commenter

This is only a draft implementation, but I wanted to discuss a few issues that I am facing.

The first one comes from getting the attention weights of a transformer model. I added one implementation for the ViT model that comes pre-trained in the torchvision models library (the paper's authors use DeiT, but I am more familiar with the architecture of this one). The problem I see is that it is very challenging to implement one common method that extracts the weights even across different implementations of the same model architecture. Extracting the weights in my case required tracing the graph and even changing one of the operations. One way to go would be to provide a classifier that works only for one specific model. Alternatively, this could work if the ART user who provides the model also provides the method to extract the weights, and ART could provide an abstraction class and an example. But there could be a better option that I cannot see right now.
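For illustration, here is a minimal sketch of one possible way to expose the attention weights of torchvision's `vit_b_16` without tracing the graph, by overriding the `need_weights=False` keyword through forward hooks. It assumes PyTorch >= 2.0 (hooks with `with_kwargs=True`) and torchvision's use of `nn.MultiheadAttention` internally; it is not the approach used in this draft:

```python
# Sketch only: capture per-head attention maps from torchvision's vit_b_16 by
# overriding the need_weights=False keyword its encoder blocks pass to
# nn.MultiheadAttention. Assumes PyTorch >= 2.0 (hooks with with_kwargs=True).
import torch
import torch.nn as nn
from torchvision.models import vit_b_16

# weights=None keeps the sketch download-free; pass ViT_B_16_Weights.DEFAULT
# to use the pre-trained model instead.
model = vit_b_16(weights=None).eval()
attention_maps = {}


def force_weights(module, args, kwargs):
    # Ask the attention module to compute and return per-head weights; this
    # also forces the slower, non-fused attention path.
    kwargs["need_weights"] = True
    kwargs["average_attn_weights"] = False
    return args, kwargs


def make_store_hook(name):
    def store(module, args, kwargs, output):
        # nn.MultiheadAttention returns (attn_output, attn_weights).
        attention_maps[name] = output[1].detach()

    return store


for name, module in model.named_modules():
    if isinstance(module, nn.MultiheadAttention):
        module.register_forward_pre_hook(force_weights, with_kwargs=True)
        module.register_forward_hook(make_store_hook(name), with_kwargs=True)

with torch.no_grad():
    model(torch.rand(1, 3, 224, 224))

# One (batch, num_heads, tokens, tokens) tensor per encoder block.
print({k: tuple(v.shape) for k, v in attention_maps.items()})
```

A DeiT or timm implementation would need the same idea but with its own module names, which is exactly the per-architecture maintenance burden described above.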

The second issue is that the PyTorch model I used behaves incorrectly if the benign input is simply cast to float, which makes it hard to test the attack (there is an example in the attack's notebook). Is this a problem coming from the mixture of frameworks? Have you seen such behaviour before?

sechkova avatar May 24 '23 11:05 sechkova

Hi @sechkova Thank you very much for your pull request!

I agree on your first question that general support for all possible architectures is challenging or not reasonably possible. ART does have multiple model-specific estimators, for example art.estimators.object_detection.PyTorchYolo, which are easier to implement and maintain. I think this approach would be the best for your PR too.

About your second question, does the model you are working with expect integer arrays as input? If yes, you could accept float arrays as input to your new ART tools to follow the ART APIs and, inside the tools, convert them to integer arrays before providing the input data to the model. We would have to investigate how this conversion affects the adversarial attacks.
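A tiny sketch of this pattern, with a hypothetical helper name (not ART code), assuming the ART-facing inputs are floats in [0, 1] and the wrapped model expects uint8:

```python
# Hypothetical helper, not part of ART: accept float inputs in [0, 1] per the
# ART convention and convert to the uint8 range the wrapped model expects,
# just before the forward pass.
import numpy as np


def to_model_dtype(x: np.ndarray) -> np.ndarray:
    return np.clip(np.rint(x * 255.0), 0, 255).astype(np.uint8)
```

The rounding step is not differentiable, which is one way such a conversion could affect gradient-based attacks.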

beat-buesser avatar May 26 '23 13:05 beat-buesser

About your second question, does the model you are working with expect integer arrays as input? If yes, you could accept float arrays as input to your new ART tools to follow the ART APIs and, inside the tools, convert them to integer arrays before providing the input data to the model. We would have to investigate how this conversion affects the adversarial attacks.

In the end I used convert_image_dtype from torchvision, which both converts and scales the values, and now the model works properly. I couldn't figure out how the other attacks' implementations manage to handle this.
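For illustration (not code from the PR), the difference between a plain cast and convert_image_dtype:

```python
# A plain cast keeps the 0..255 range, while torchvision's convert_image_dtype
# casts and rescales to [0, 1], which is what pretrained models typically expect.
import torch
from torchvision.transforms.functional import convert_image_dtype

img_uint8 = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
plain_cast = img_uint8.float()                             # values in [0, 255]
converted = convert_image_dtype(img_uint8, torch.float32)  # values in [0, 1]
print(plain_cast.max().item(), converted.max().item())
```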

sechkova avatar Jul 17 '23 08:07 sechkova

I agree on your first question that general support for all possible architectures is challenging or not reasonably possible. ART does have multiple model-specific estimators, for example art.estimators.object_detection.PyTorchYolo, which are easier to implement and maintain. I think this approach would be the best for your PR too.

For now I have added art.estimators.classification.PyTorchDeiT, but the way I have hardcoded the attention layers works, I think, only with either PyTorch < 2.0 or with 'TIMM_FUSED_ATTN' set to '0'.

sechkova avatar Jul 17 '23 08:07 sechkova

@beat-buesser the PR is updated and the attack algorithm now shows good results. Can you do an initial review?

What I think still needs to be resolved is the custom PyTorch DeiT classifier. For now I have implemented just the very basics needed for the attack to work with a pre-trained model from timm. It involves hardcoding the layer names, and therefore there is a difference between PyTorch versions, which I have circumvented by setting 'TIMM_FUSED_ATTN' = '0' (you can see the example notebook below). It is certainly not a very elegant approach.
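For context, a minimal sketch of that workaround, under the assumption that timm's non-fused attention path passes the softmaxed attention matrix through each block's attn_drop module (timm internals, so this may differ between versions); the model name is only an example:

```python
# Sketch of the TIMM_FUSED_ATTN workaround (assumptions about timm internals):
# the variable must be set before timm is imported so the non-fused attention
# path is used, which makes the attention matrix observable via a hook.
import os
os.environ["TIMM_FUSED_ATTN"] = "0"

import timm
import torch

model = timm.create_model("deit_tiny_patch16_224", pretrained=False).eval()
attention_maps = []


def save_attention(module, inputs, output):
    # In the non-fused path the softmaxed (B, heads, N, N) attention matrix is
    # the input to attn_drop, so capturing it here limits the hardcoding to the
    # blocks[i].attn.attn_drop path itself.
    attention_maps.append(inputs[0].detach())


for block in model.blocks:
    block.attn.attn_drop.register_forward_hook(save_attention)

with torch.no_grad():
    model(torch.rand(1, 3, 224, 224))

print(len(attention_maps), attention_maps[0].shape)  # one map per block
```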

Here is an example notebook that I wish to contribute once the implementation is finalised: https://colab.research.google.com/drive/1QfdZEUI0hhO-AYFL12RZvB0dA95l2NAS?usp=sharing

sechkova avatar Aug 25 '23 14:08 sechkova

Hi @sechkova Thank you very much for implementing the PatchFool attack in ART! I have added a few comments in my review; please take a look and let me know what you think. In addition to that, could you please add a unit test in pytest format for the new attack class and a notebook showing how the implementation reproduces the original paper?

@beat-buesser Can you advise how the tests should be defined? The PatchFool attack works on transformer models, using information from the attention layers to compute the attack. I can use a downloaded pre-trained model for the tests, but those are usually trained on ImageNet, while the tests in ART use other, smaller test datasets. This causes issues with the number of classes, etc.

I added one initial draft test with the last commit (https://github.com/Trusted-AI/adversarial-robustness-toolbox/pull/2163/commits/da05de1adfea1f2d3fd23650536c0faf7c540468).
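One possible direction, sketched under assumptions (this is not the PR's test): shrink the model instead of the dataset by building a small, randomly initialised DeiT from timm with the number of classes of the test data, and smoke-test on tiny random inputs. Wrapping the model in the new PyTorchDeiT estimator and instantiating the attack are omitted, since their exact signatures are defined by this PR:

```python
# Hypothetical test direction, not the PR's test: a tiny, randomly initialised
# DeiT avoids ImageNet downloads and class-count mismatches; random weights are
# fine for a smoke test that only checks shapes and that the attack loop runs.
import os
os.environ["TIMM_FUSED_ATTN"] = "0"  # keep attention weights accessible

import numpy as np
import timm
import torch


def build_tiny_deit(nb_classes: int = 10) -> torch.nn.Module:
    # pretrained=False avoids any download; num_classes matches the test data.
    return timm.create_model(
        "deit_tiny_patch16_224", pretrained=False, num_classes=nb_classes
    ).eval()


def test_patchfool_smoke():
    model = build_tiny_deit(nb_classes=10)
    x = np.random.rand(2, 3, 224, 224).astype(np.float32)
    with torch.no_grad():
        logits = model(torch.from_numpy(x))
    assert logits.shape == (2, 10)
    # Next steps (omitted): wrap `model` in the PyTorchDeiT estimator from this
    # PR, run the PatchFool attack on `x`, and assert the output shape and clip
    # values of the adversarial examples.
```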

sechkova avatar Dec 22 '23 17:12 sechkova