Problem with the visual adversarial example bounded by 16/255 on MiniGPT-4
Hi,
Thank you for sharing the code! It is great that we can reproduce the results on RealToxicityPrompts using the images provided in adversarial_images.
However, when we generate the adversarial images ourselves, they work well on RealToxicityPrompts except for the image bounded by 16/255, which gives the following scores:
| Detoxify metric | Score |
|---|---|
| toxicity | 0.2891 |
| severe_toxicity | 0.0058 |
| obscene | 0.2105 |
| threat | 0.0084 |
| insult | 0.1337 |
| identity_attack | 0.0635 |

| PerspectiveAPI metric | Score |
|---|---|
| toxicity | 0.2874 |
| severe_toxicity | 0.0117 |
| sexually_explicit | 0.0944 |
| threat | 0.0226 |
| profanity | 0.2373 |
| identity_attack | 0.0769 |
Could you please share the hyperparameter settings you used to produce the adversarial image bounded by 16/255? Thank you!
For reference, here is the command we used to produce the adversarial image bounded by 16/255:
```bash
python minigpt_visual_attack.py --cfg_path eval_configs/minigpt4_eval.yaml --gpu_id 0 \
    --n_iters 5000 --constrained --eps 16 --alpha 1 \
    --batch_size 8 \
    --save_dir eps16
```
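For context, our understanding is that the constrained attack runs a standard L_inf-bounded PGD loop with eps = 16/255 and a step size of 1/255 (assuming both flags are interpreted as x/255), roughly like the sketch below; the names here are placeholders rather than the repo's actual functions:

```python
# Rough sketch of one L_inf-constrained PGD step, as we understand the --eps/--alpha flags
# (assuming both are divided by 255). All names are placeholders, not the repo's API.
import torch

def pgd_step(adv_image, clean_image, grad, eps=16 / 255, alpha=1 / 255):
    # Signed-gradient step on the attack objective
    # (sign depends on whether the objective is minimized or maximized)
    adv_image = adv_image - alpha * grad.sign()
    # Project back into the eps-ball around the clean image
    adv_image = torch.clamp(adv_image, clean_image - eps, clean_image + eps)
    # Keep pixels in the valid [0, 1] range
    return torch.clamp(adv_image, 0.0, 1.0)
```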
Hi, the hyperparameters look good to me.
Could you try the following:
- Increasing the batch size to 16? This may make the optimization more stable.
- Running the attack multiple times, since the optimization may fail to reach the best result in any single run; you could keep the run with the lowest final loss (see the sketch below).
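For the second point, here is a minimal sketch of what I mean by keeping the best of several runs; run_attack is only a placeholder for one complete attack run, not a function from this repo:

```python
# Hedged sketch: run the attack several times with different seeds and keep the
# run whose final loss is lowest. run_attack() is a placeholder for one full run.
import torch

def best_of_n_runs(run_attack, n_runs=3):
    best_img, best_loss = None, float("inf")
    for seed in range(n_runs):
        torch.manual_seed(seed)                 # vary initialization / batch sampling
        adv_img, final_loss = run_attack(seed)  # placeholder: one complete attack run
        if final_loss < best_loss:
            best_img, best_loss = adv_img, final_loss
    return best_img, best_loss
```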
Hi,
Thanks a lot for your reply. It works!
We also ran the code for the textual attack, but we could not find the code for evaluating the textual attack on the RealToxicityPrompts benchmark.
If possible, could you please consider releasing the evaluation code for the textual adversarial attack? Thank you!
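In case it helps clarify what we are looking for, here is a minimal sketch of the kind of evaluation we have in mind, scoring generated continuations with the detoxify package; the generations list is a placeholder, not output from your scripts:

```python
# Minimal sketch of scoring model outputs with Detoxify (same metrics as the table above).
# `generations` is a placeholder list of model continuations.
from detoxify import Detoxify

generations = ["example continuation 1", "example continuation 2"]  # placeholder texts
scores = Detoxify("original").predict(generations)  # maps each metric to a list of scores
for metric, values in scores.items():
    print(f"{metric}: {sum(values) / len(values):.4f}")  # mean score per metric
```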