adversarial-robustness-toolbox
PatchFool implementation
Description
Initial draft implementation of the PatchFool attack from the paper:
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
Currently there is an example notebook of the attack in Colab. I plan to contribute the notebook as well once it is ready.
Fixes # (issue)
Type of change
Please check all relevant options.
- [ ] Improvement (non-breaking)
- [ ] Bug fix (non-breaking)
- [x] New feature (non-breaking)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
Testing
Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.
- [ ] Test A
- [ ] Test B
Test Configuration:
- OS
- Python version
- ART version or commit number
- TensorFlow / Keras / PyTorch / MXNet version
Checklist
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
Codecov Report
Attention: 21 lines in your changes are missing coverage. Please review.
Comparison is base (3de2078) 85.08% compared to head (da05de1) 85.16%.
Additional details and impacted files
@@ Coverage Diff @@
## main #2163 +/- ##
==========================================
+ Coverage 85.08% 85.16% +0.07%
==========================================
Files 324 325 +1
Lines 29331 29480 +149
Branches 5409 5431 +22
==========================================
+ Hits 24956 25106 +150
+ Misses 2997 2973 -24
- Partials 1378 1401 +23
Files | Coverage Δ
---|---
art/attacks/evasion/__init__.py | 98.24% <100.00%> (+0.03%) :arrow_up:
art/estimators/pytorch.py | 84.73% <76.92%> (-0.99%) :arrow_down:
art/attacks/evasion/patchfool.py | 86.76% <86.76%> (ø)
This is only a draft implementation, but I wanted to discuss a few issues that I am facing.
The first one comes from getting the attention weights of a transformer model. I added an implementation for the pre-trained ViT model from the torchvision models library (the paper's authors use DeiT, but I am more familiar with this architecture). The problem I see is that it is very challenging to implement one common method that extracts the weights, even across different implementations of the same model architecture. In my case, extracting the weights required tracing the graph and even changing one of the operations.
One way to go would be to provide a classifier that works only for one specific model. Alternatively, the ART user who provides the model could also provide the method to extract the weights, with ART providing an abstract class and an example. But there may be a better option that I cannot see right now.
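For context, here is a minimal sketch (not the code from this PR) of one way to recover per-layer attention maps from torchvision's `vit_b_16` with forward hooks. It assumes torchvision's `EncoderBlock` internals (`model.encoder.layers`, `ln_1`, `self_attention`) and re-runs each attention module with `need_weights=True`, which also illustrates why a single generic extraction method is hard to provide:

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights


def forward_with_attention(model, x):
    """Return (logits, per-layer attention maps) for a torchvision ViT.

    Relies on torchvision's EncoderBlock exposing `ln_1` and `self_attention`;
    other ViT/DeiT implementations name these modules differently.
    """
    captured = []

    def save_block_input(block, inputs, output):
        # The block receives the token sequence as a single positional tensor.
        captured.append((block, inputs[0]))

    handles = [block.register_forward_hook(save_block_input)
               for block in model.encoder.layers]
    try:
        logits = model(x)
    finally:
        for handle in handles:
            handle.remove()

    attention_maps = []
    for block, tokens in captured:
        # torchvision calls self_attention(..., need_weights=False), so the maps
        # are discarded in the normal forward pass; re-run the attention module
        # on the captured block input to recover them (extra compute, sketch only).
        normed = block.ln_1(tokens)
        _, weights = block.self_attention(
            normed, normed, normed, need_weights=True, average_attn_weights=False
        )
        attention_maps.append(weights)  # shape: (batch, heads, tokens, tokens)

    return logits, attention_maps


model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1).eval()
logits, attention_maps = forward_with_attention(model, torch.rand(1, 3, 224, 224))
```

Anything tied to these module names breaks for other ViT or DeiT implementations, which is exactly the portability problem described above.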
The second issue is that the PyTorch model I used behaves incorrectly if the benign input is simply cast to float, which makes it hard to test the attack (there is an example in the attack's notebook). Is this a problem coming from the mixture of frameworks? Have you seen such behaviour before?
Hi @sechkova Thank you very much for your pull request!
I agree with your first point that general support for all possible architectures is challenging, or not reasonably possible. ART does have multiple model-specific estimators, for example art.estimators.object_detection.PyTorchYolo, which are easier to implement and maintain. I think this approach would be the best for your PR too.
About your second question: does the model you are working with expect integer arrays as input? If yes, you could accept float arrays as input to your new ART tools to follow the ART APIs and, inside the tools, convert them to integer arrays before providing the input data to the model. We would have to investigate how this conversion affects the adversarial attacks.
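For illustration only (not ART's API), such a conversion could look like the sketch below; the [0, 255] value range is an assumption about the model's expected input:

```python
import numpy as np


def to_model_input(x: np.ndarray) -> np.ndarray:
    """Convert ART-style float inputs to the integer arrays the model expects.

    Assumes the float values already lie in the model's integer range
    (here [0, 255]); the rounding is lossy, which is why its effect on the
    attack's optimization would need to be checked.
    """
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)
```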
> About your second question: does the model you are working with expect integer arrays as input? If yes, you could accept float arrays as input to your new ART tools to follow the ART APIs and, inside the tools, convert them to integer arrays before providing the input data to the model. We would have to investigate how this conversion affects the adversarial attacks.
In the end I used convert_image_dtype from torchvision, which both converts and scales the values, and now the model works properly. I couldn't figure out how the other attacks' implementations handle this.
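For reference, convert_image_dtype lives in torchvision.transforms.functional; a minimal example of the difference between a plain float cast and the cast-plus-rescaling it performs:

```python
import torch
from torchvision.transforms.functional import convert_image_dtype

# A uint8 image batch with values in [0, 255].
x_uint8 = torch.randint(0, 256, (1, 3, 224, 224), dtype=torch.uint8)

# A plain cast keeps the [0, 255] range, which the pre-trained model does not expect.
x_cast = x_uint8.float()                                 # values in [0.0, 255.0]

# convert_image_dtype casts and rescales to the target dtype's expected range.
x_scaled = convert_image_dtype(x_uint8, torch.float32)   # values in [0.0, 1.0]
```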
> I agree with your first point that general support for all possible architectures is challenging, or not reasonably possible. ART does have multiple model-specific estimators, for example art.estimators.object_detection.PyTorchYolo, which are easier to implement and maintain. I think this approach would be the best for your PR too.
For now I added art.estimators.classification.PyTorchDeiT, but the way I've hardcoded the attention layers works, I think, only with PyTorch < 2.0 or with 'TIMM_FUSED_ATTN' set to '0'.
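As a note on that workaround: timm appears to read TIMM_FUSED_ATTN at import time, so the variable should be set before importing timm. A minimal sketch (the deit_tiny_patch16_224 checkpoint name is only an illustrative choice):

```python
import os

# Disable timm's fused scaled_dot_product_attention path so the attention
# blocks materialize an explicit softmax(QK^T / sqrt(d)) matrix that the
# estimator can read out by module name.
os.environ["TIMM_FUSED_ATTN"] = "0"

import timm  # noqa: E402  (imported after setting the variable on purpose)

# Illustrative checkpoint; any timm DeiT variant would do for this sketch.
model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()
```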
@beat-buesser the PR is updated and the attack algorithm now shows good results. Can you do an initial review?
What I think still needs to be resolved is the custom PyTorch DeiT classifier. For now I have implemented just the bare minimum for the attack to work with a pre-trained model from timm. It involves hardcoding the layer names, so behaviour differs between PyTorch versions, which I've worked around by setting 'TIMM_FUSED_ATTN' = '0' (see the example notebook below). It is certainly not a very elegant approach.
Here is an example notebook that I wish to contribute once the implementation is finalised: https://colab.research.google.com/drive/1QfdZEUI0hhO-AYFL12RZvB0dA95l2NAS?usp=sharing
Hi @sechkova Thank you very much for implementing the PatchFool attack in ART! I have added a few comments in my review; please take a look and let me know what you think. In addition, could you please add a unit test in pytest format for the new attack class and a notebook showing how the implementation reproduces the original paper?
@beat-buesser Can you advise on how the tests should be defined? The PatchFool attack works on transformer models, using information from the attention layers to compute the attack. I can use a downloaded pre-trained model for the tests, but such models are usually trained on ImageNet, while the tests in ART use other, smaller datasets. This causes issues with the number of classes, etc.
I added an initial draft test with the last commit (https://github.com/Trusted-AI/adversarial-robustness-toolbox/pull/2163/commits/da05de1adfea1f2d3fd23650536c0faf7c540468).
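One possible shape for such a test, phrased against the model's own clean predictions so that ImageNet ground-truth labels are never needed. The PatchFool and PyTorchDeiT names, their import paths, and the constructor arguments below are placeholders taken from this PR's discussion and would need to be aligned with the final code:

```python
import numpy as np
import pytest

# Placeholder imports: class names and paths follow this PR's discussion
# and may not match the merged code.
from art.attacks.evasion import PatchFool
from art.estimators.classification import PyTorchDeiT


@pytest.fixture(scope="module")
def deit_classifier():
    # Wrap an ImageNet-pretrained DeiT; the keyword arguments are assumptions
    # mirroring ART's PyTorchClassifier and must be adapted to PyTorchDeiT.
    import timm
    import torch

    model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()
    return PyTorchDeiT(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(3, 224, 224),
        nb_classes=1000,
        clip_values=(0.0, 1.0),
    )


def test_patchfool_changes_model_predictions(deit_classifier):
    # Random inputs are enough here: the assertion compares adversarial
    # predictions against the model's own clean predictions, so no ImageNet
    # labels or dataset-specific number of classes is required.
    rng = np.random.default_rng(1234)
    x = rng.random((2, 3, 224, 224)).astype(np.float32)

    y_clean = np.argmax(deit_classifier.predict(x), axis=1)

    attack = PatchFool(deit_classifier)  # placeholder arguments
    x_adv = attack.generate(x=x)

    y_adv = np.argmax(deit_classifier.predict(x_adv), axis=1)

    assert x_adv.shape == x.shape
    assert np.any(y_adv != y_clean)
```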