[Bug Report] LLaVA does not work
**Describe the bug**
Using LLaVA with TransformerLens, at least in the manner shown in the official demo, leads to errors and mismatched outputs.
**Code example**
Run the official demo notebook.
**System Info**
- transformer_lens: 2.15.4
- transformers: 4.52.4
**Additional context**
I found at least two problems that cause the demo to fail.
First, the function `convert_llama_weights` throws an error because the LLaVA model has no `model` attribute; on inspection, it only exposes `base_model`. I worked around this by using `base_model` in place of `model` in that function.
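For illustration, a minimal sketch of the workaround (the helper name is mine; the actual fix just swaps the attribute access inside `convert_llama_weights`):

```python
# Hypothetical helper illustrating the workaround: fall back to
# `base_model` when the checkpoint does not expose `model` directly.
def get_llama_decoder(hf_model):
    # Plain Llama checkpoints expose the decoder as `hf_model.model`;
    # the LLaVA wrapper here only exposes it as `hf_model.base_model`.
    return hf_model.model if hasattr(hf_model, "model") else hf_model.base_model
```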
The second problem is much more difficult to find out. The notebook was running but I got gibberish output, while the official transformers model works fine. By comparing the internal states, I traced the issue to the layer norm layer. found that Llava uses a different eps value (1e-5, defined here) compared to Llama (1e-6, defined here and here). The official notebook's approach only reads the configuration from llama-7b-hf, which leads to the above error.
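A minimal sketch of the fix I applied, assuming `hooked_cfg` is the `HookedTransformerConfig` the demo builds (the checkpoint name below is illustrative; use whichever LLaVA checkpoint the demo loads):

```python
from transformers import AutoConfig

# Illustrative checkpoint name; substitute the one the demo actually uses.
llava_config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")

# LLaVA stores its language-model settings under `text_config`;
# `rms_norm_eps` is 1e-5 here, not the 1e-6 read from llama-7b-hf.
hooked_cfg.eps = llava_config.text_config.rms_norm_eps
```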
However, even after fixing both errors I still get gibberish output, so there are likely other problems.
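To help locate the remaining mismatch, a rough sketch of the kind of layer-by-layer comparison I used (variable names are illustrative; note that TransformerLens weight processing such as `fold_ln` introduces benign differences unless you load with `from_pretrained_no_processing`):

```python
import torch

# Run the same tokens through both models and diff the TransformerLens
# residual stream against the HF hidden states, layer by layer.
with torch.no_grad():
    hf_out = hf_model(tokens, output_hidden_states=True)
    _, cache = hooked_model.run_with_cache(tokens)

# hidden_states[i] is the input to block i, matching resid_pre at layer i
# (the final entry is post-final-norm, so it is skipped here).
for layer, hf_hidden in enumerate(hf_out.hidden_states[:-1]):
    diff = (hf_hidden - cache["resid_pre", layer]).abs().max()
    print(f"layer {layer}: max abs diff {diff.item():.3e}")
```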
**Checklist**
- [x] I have checked that there is no similar issue in the repo (required)