
Double Softmax in PyTorch image estimator for test cases.

GiulioZizzo opened this issue • 2 comments

Many tests use the PyTorch image estimator defined in the test utils.

By default this estimator does not output logits, i.e. the function signature is:

get_image_classifier_pt(from_logits=False, load_init=True, use_maxpool=True)

However, the loss function is loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")

torch.nn.CrossEntropyLoss expects logits and internally applies a (log-)softmax, so a model that already outputs probabilities effectively has a softmax applied twice: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
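A small sketch of why this matters (the tensor values here are made up for illustration): applying softmax to an already-softmaxed output compresses the differences between classes, so losses and gradients are distorted even though the argmax is unchanged.

```python
import torch

# Hypothetical raw model outputs (logits) for a single sample
logits = torch.tensor([[2.0, 0.5, -1.0]])

probs = torch.softmax(logits, dim=1)    # what a from_logits=False model returns
double = torch.softmax(probs, dim=1)    # what CrossEntropyLoss effectively sees

print(probs)   # sharply peaked distribution
print(double)  # much flatter: class differences are compressed

# The argmax is unchanged, which is why accuracy-only tests still pass
assert probs.argmax() == double.argmax()
assert double.max() < probs.max()
```

This is why forward-pass tests that only check accuracy do not surface the bug, while tests that train a model or compare exact loss values do.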

Hence, we should aim to make the default configuration mathematically correct. We could:

  1. Make the default from_logits=True
  2. Additionally, make the loss function depend on from_logits by using either CrossEntropyLoss (for logits) or NLLLoss (for probabilities)

This may require updating certain ART tests.
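The second option could be sketched roughly as follows (make_loss is a hypothetical helper, not part of ART): CrossEntropyLoss applies log_softmax internally, so it is only correct on raw logits; for a model that outputs softmax probabilities, take the log and use NLLLoss instead. Both paths then compute the same quantity.

```python
import torch

def make_loss(from_logits: bool):
    """Return a loss suitable for the model's output type (illustrative sketch)."""
    if from_logits:
        # Model outputs raw logits: CrossEntropyLoss applies log_softmax itself
        return torch.nn.CrossEntropyLoss(reduction="sum")
    # Model outputs probabilities: log them first, then apply NLLLoss
    nll = torch.nn.NLLLoss(reduction="sum")
    return lambda probs, target: nll(torch.log(probs), target)

# Sanity check: both combinations agree on the same underlying model
logits = torch.randn(4, 3)
target = torch.tensor([0, 1, 2, 1])
loss_from_logits = make_loss(True)(logits, target)
loss_from_probs = make_loss(False)(torch.softmax(logits, dim=1), target)
assert torch.allclose(loss_from_logits, loss_from_probs, atol=1e-5)
```

With a helper like this, the estimator's loss would stay consistent with from_logits regardless of which default is chosen.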

System information (please complete the following information):

  • OS: MacOS
  • Python version: 3.9
  • ART version or commit number: 1.15
  • TensorFlow / Keras / PyTorch / MXNet version: Torch 1.13

GiulioZizzo avatar Jul 26 '23 16:07 GiulioZizzo

Hi @GiulioZizzo Thank you very much for raising this issue! Have you found any tests where the wrong value for from_logits has been used?

beat-buesser avatar Aug 07 '23 11:08 beat-buesser

Hi @beat-buesser ! This issue is present in virtually all of the ART test cases that use get_image_classifier_pt, as they all use the default parameters as far as I can see. In many cases this doesn't cause a huge problem, since the tests only do forward passes and then compute things like accuracy (the argmax is unaffected).

However, it does start to cause problems when the neural network is trained and an exact result is expected. I came across the problem when refactoring test_adversarial_trainer for issue #2225 (including Hugging Face support in ART). The TensorFlow and PyTorch models would train in a completely different manner even though they ought to converge to almost identical results (allowing for framework-specific numerical deltas). When you change the PyTorch classifier to use the correct logits/loss function combination, the model trains as it should and the framework results then match.

There could well be other tests affected by this, so it would be worth investigating and correcting them for current and future tests.

GiulioZizzo avatar Aug 14 '23 08:08 GiulioZizzo