[Bug Report] LLaVA does not work
**Describe the bug**
Using LLaVA with TransformerLens, at least in the manner shown in the official demo, leads to errors and mismatched outputs.
**Code example**
Run the official demo notebook.
**System Info**
- transformer_lens: 2.15.4
- transformers: 4.52.4
**Additional context**
I found at least two problems that cause the demo to fail.
First, the function `convert_llama_weights` throws an error because the LLaVA model has no `model` attribute; on inspection, it only exposes `base_model`. I worked around this by using `base_model` in place of `model` in that function.
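For illustration, a minimal sketch of the workaround (the helper name is mine; the actual fix just swaps the attribute access inside `convert_llama_weights`):

```python
# Hypothetical helper illustrating the workaround: fall back to
# `base_model` when the checkpoint does not expose `model` directly.
def get_llama_decoder(hf_model):
    # Plain Llama checkpoints expose the decoder as `hf_model.model`;
    # the LLaVA wrapper here only exposes it as `hf_model.base_model`.
    return hf_model.model if hasattr(hf_model, "model") else hf_model.base_model
```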
The second problem is much more difficult to find out. The notebook was running but I got gibberish output, while the official transformers model works fine. By comparing the internal states, I traced the issue to the layer norm layer. found that Llava uses a different eps value (1e-5, defined here) compared to Llama (1e-6, defined here and here). The official notebook's approach only reads the configuration from llama-7b-hf, which leads to the above error.
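A minimal sketch of the fix I applied, assuming `hooked_cfg` is the `HookedTransformerConfig` the demo builds (the checkpoint name below is illustrative; use whichever LLaVA checkpoint the demo loads):

```python
from transformers import AutoConfig

# Illustrative checkpoint name; substitute the one the demo actually uses.
llava_config = AutoConfig.from_pretrained("llava-hf/llava-1.5-7b-hf")

# LLaVA stores its language-model settings under `text_config`;
# `rms_norm_eps` is 1e-5 here, not the 1e-6 read from llama-7b-hf.
hooked_cfg.eps = llava_config.text_config.rms_norm_eps
```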
However, even after fixing both errors I still get gibberish output, so there are likely other problems.
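To help locate the remaining mismatch, a rough sketch of the kind of layer-by-layer comparison I used (variable names are illustrative; note that TransformerLens weight processing such as `fold_ln` introduces benign differences unless you load with `from_pretrained_no_processing`):

```python
import torch

# Run the same tokens through both models and diff the TransformerLens
# residual stream against the HF hidden states, layer by layer.
with torch.no_grad():
    hf_out = hf_model(tokens, output_hidden_states=True)
    _, cache = hooked_model.run_with_cache(tokens)

# hidden_states[i] is the input to block i, matching resid_pre at layer i
# (the final entry is post-final-norm, so it is skipped here).
for layer, hf_hidden in enumerate(hf_out.hidden_states[:-1]):
    diff = (hf_hidden - cache["resid_pre", layer]).abs().max()
    print(f"layer {layer}: max abs diff {diff.item():.3e}")
```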
**Checklist**
- [x] I have checked that there is no similar issue in the repo (required)