Visual-Adversarial-Examples-Jailbreak-Large-Language-Models icon indicating copy to clipboard operation
Visual-Adversarial-Examples-Jailbreak-Large-Language-Models copied to clipboard

computing gradient

Open ChoiDae1 opened this issue 1 year ago • 1 comments

Hello. I'm interested in your work and have a question about your implementation. When implementing PGD, I noticed that your code uses only the gradient sign, not the gradient value, as shown in the following line: adv_noise.data = (adv_noise.data - alpha * adv_noise.grad.detach().sign()).clamp(-epsilon, epsilon) in visual_attacker.py Is there a specific reason for this?

ChoiDae1 avatar Aug 20 '24 09:08 ChoiDae1

Hi, this is a commonly used trick back to https://arxiv.org/abs/1412.6572 My experience is that gradient sign descent usually gives better results, less likely to be stuck at a local optimal.

Unispac avatar Aug 21 '24 16:08 Unispac