Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

Issues while trying to reproduce the results on LLaVA-v1.5

Open simplelifetime opened this issue 2 years ago • 13 comments

Thanks for your excellent work! I'm trying to reproduce this method on the LLaVA-v1.5 model, but I've encountered one problem:

File ~/anaconda3/envs/llava/lib/python3.10/site-packages/torch/autograd/__init__.py:200, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat the same comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What is the most probable reason for such an error? I'm a little unfamiliar with adversarial training, so I hope you can provide some help. Thanks!

simplelifetime · Nov 14 '23

Hey,

Based on the log, it seems that you are computing gradients for tensors that do not have the requires_grad flag set. Could you provide more information about how you run the code and how you hit this error? Otherwise, it is difficult to pin down the cause.
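
For reference, here is a minimal sketch of how this error typically arises in PyTorch, and the usual fix (illustrative only, not the attack code itself):

```python
import torch

# Minimal sketch of the failure mode: calling backward() on a graph whose
# leaf tensor never had requires_grad set raises exactly this RuntimeError.
x = torch.randn(3)              # requires_grad defaults to False
loss = (x ** 2).sum()
try:
    loss.backward()
except RuntimeError as e:
    print(e)                    # "element 0 of tensors does not require grad ..."

# The fix: mark the tensor as a differentiable leaf before building the loss.
x.requires_grad_(True)
loss = (x ** 2).sum()
loss.backward()                 # succeeds
print(x.grad)                   # gradient of sum(x^2) is 2 * x
```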

Thanks!

Unispac · Nov 21 '23

I also tried LLaVA-1.5 and got the same error. Following suggestions online, I added

model.enable_input_require_grads()

after loading the model, which resolved this issue.

However, the visual attack code still fails: the adv_noise.grad field is still not populated after the call to target_loss.backward(), which seems to indicate that the gradients are not propagating back to the image inputs.
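
For reference, a sketch of where that call goes, assuming the upstream LLaVA repo's builder API (adjust the checkpoint path and name to your setup):

```python
from llava.model.builder import load_pretrained_model

# Load LLaVA (arguments per the upstream LLaVA repo's builder helper).
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="liuhaotian/llava-v1.5-7b",
    model_base=None,
    model_name="llava-v1.5-7b",
)

# Workaround from this thread: hooks the input embeddings so their outputs
# get requires_grad=True even while the model weights stay frozen.
model.enable_input_require_grads()
```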

dribnet · Nov 28 '23

Hi all,

Could you try this checkpoint? https://huggingface.co/liuhaotian/llava-llama-2-13b-chat-lightning-preview

I checked the LLaVA repository: llava-1.5 was released on Oct 5, after the publication of our paper. So it is likely that this checkpoint is not compatible with the older version of the LLaVA code that we curate in this repository.

Sorry for the confusion.

Unispac · Nov 29 '23

Thanks Xiangyu - yes I can confirm that the liuhaotian/llava-llama-2-13b-chat-lightning-preview checkpoint you suggest works well with the codebase as-is.

I have also made progress in adapting the code to work with the latest v1.5 models, which have a number of improvements, such as handling larger 336x336 px inputs. For example, here's a (harmless) output generated with the liuhaotian/llava-v1.5-7b model.

[image: bad_prompt example output]

But there are some remaining issues to resolve in loading these newer models correctly - happy to share notes if anyone else is working on this.
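
For context on the 336x336 point: LLaVA-1.5 swaps in the openai/clip-vit-large-patch14-336 vision tower, so the adversarial image has to be optimized at that resolution. A quick check of the expected input size, assuming the standard transformers processor:

```python
from transformers import CLIPImageProcessor

# LLaVA-1.5's vision tower; the processor reports the expected input size.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
print(processor.crop_size)  # typically {'height': 336, 'width': 336}
```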

dribnet · Dec 02 '23


@dribnet How do you address the problem that adv_noise.grad is None with the liuhaotian/llava-v1.5-7b model? Thanks a lot!

rain305f · Jan 19 '24

The reason adv_noise.grad is None is that LLaVA-1.5 by default applies @torch.no_grad() to the CLIP vision encoder's forward pass. Commenting out that decorator (llava/model/multimodal_encoder/clip_encoder.py, around line 39) should work.
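
For illustration, a minimal sketch (with a toy stand-in encoder, not LLaVA's actual class) of why that decorator blocks the attack:

```python
import torch

encoder = torch.nn.Linear(4, 4)                    # stand-in for the CLIP vision tower
adv_noise = torch.zeros(1, 4, requires_grad=True)  # the adversarial image leaf

# Under no_grad, the encoder output is detached from adv_noise:
with torch.no_grad():
    feats = encoder(adv_noise)
print(feats.requires_grad)   # False -> any loss built on feats cannot backprop

# Without no_grad, gradients reach the image as intended:
feats = encoder(adv_noise)
feats.sum().backward()
print(adv_noise.grad is not None)  # True
```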

YitingQu · Feb 24 '24

@YitingQu thank you!

RylanSchaeffer · Feb 28 '24

@YitingQu Does one need to reinstall LLaVA with pip after commenting out that line?

Edit: Answer: no.

RylanSchaeffer · Feb 28 '24

Has this issue been fixed? After commenting out @torch.no_grad(), I still get an error like this:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

when using the liuhaotian/llava-llama-2-13b-chat-lightning-preview model. It seems that the gradient cannot backpropagate to adv_noise. How can I deal with this?
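
For anyone debugging the same symptom, a minimal, self-contained check of whether gradients reach the image leaf at all (toy encoder standing in for the vision tower; the useful part is the check pattern, not the model):

```python
import torch

# Toy stand-in for the vision encoder.
vision_tower = torch.nn.Linear(8, 8)

adv_noise = torch.rand(1, 8).detach().requires_grad_(True)  # differentiable leaf

target_loss = vision_tower(adv_noise).sum()  # stand-in for the attack's target loss
target_loss.backward()

# If grad is still None in the real attack, something on the path from
# adv_noise to the loss (e.g., @torch.no_grad() or a .detach()) cuts the graph.
assert adv_noise.grad is not None
print(adv_noise.grad.shape)
```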

rookiehb · Sep 03 '24

I switched to a different code base: https://github.com/RylanSchaeffer/AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer

This allowed me to easily optimize the adversarial noise.

RylanSchaeffer · Sep 03 '24

Thank you, Rylan, for helping answer questions and for the continued development of the codebase throughout this thread!

Unispac · Sep 03 '24


@RylanSchaeffer Can you elaborate on how you switched to a different codebase? I did look at https://github.com/RylanSchaeffer/AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer, but I suppose it refers to some other jailbreak method altogether.

netgvarun2012 · Mar 11 '25

Maybe what I did isn't applicable to what you're doing. Feel free to ignore my comment!

RylanSchaeffer · Mar 11 '25