Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

About the format of images

Open payphone131 opened this issue 1 year ago • 3 comments

Hello, I noticed that your code saves images as '.bmp'. I changed the code to save them as '.jpg', and MiniGPT-4 described the saved adversarial images as blurred and pixelated, which suggests the '.jpg' versions were treated as pure noise with no semantics. But when I used your original code, which saves images as '.bmp', MiniGPT-4 was jailbroken and responded to the adversarial images as intended. I am wondering why MiniGPT-4 behaves differently depending on the image format.

payphone131 avatar Aug 20 '24 03:08 payphone131

Hi, if you use .bmp, the file stores exactly the pixel values the image originally had. If you use .jpg, the image goes through lossy compression, so the saved pixel values won't be exactly the same as the ones you obtained via the adversarial optimization.
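
For concreteness, here is a minimal sketch (not from this repo; it uses a random uint8 array as a stand-in for an optimized adversarial image) showing that a BMP round trip preserves pixel values exactly, while a JPEG round trip changes them:

```python
import numpy as np
from PIL import Image

# Hypothetical adversarial image: random uint8 pixels stand in for
# the result of the adversarial optimization.
adv = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

Image.fromarray(adv).save("adv.bmp")              # lossless container
Image.fromarray(adv).save("adv.jpg", quality=95)  # lossy JPEG, even at high quality

bmp_back = np.array(Image.open("adv.bmp"))
jpg_back = np.array(Image.open("adv.jpg"))

# BMP reproduces the pixels exactly; JPEG introduces per-pixel errors
# that can wipe out a carefully optimized perturbation.
print("BMP  max abs diff:", np.abs(bmp_back.astype(int) - adv.astype(int)).max())  # 0
print("JPEG max abs diff:", np.abs(jpg_back.astype(int) - adv.astype(int)).max())  # > 0
```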

Unispac avatar Aug 20 '24 07:08 Unispac

Thanks, your reply really helps.

payphone131 avatar Aug 20 '24 07:08 payphone131

By the way, have you tested the generated adversarial examples on any commercial models? I tested the generated (unconstrained) adversarial examples on GPT-4V, and it treated them as random noisy images with no semantics. Do you think this indicates that GPT-4V has some defense against adversarial examples?

payphone131 avatar Aug 22 '24 03:08 payphone131