[Proposal] Expand quantization model support
Why does Transformer Lens only support quantized LLaMA models?
Hi everyone,
I'm trying to use the transformer_lens library to study the activations of a quantized Mistral 7B model (unsloth/mistral-7b-instruct-v0.2-bnb-4bit). However, when I try to load it, I encounter a problem.
This is the code I'm using:
import transformer_lens

# `model` is the PEFT-wrapped 4-bit Mistral and `tokenizer` its Hugging Face
# tokenizer, both loaded in an earlier cell (rough setup sketch below).
model_merged = model.merge_and_unload()  # merge the LoRA adapters into the base weights

model_hooked = transformer_lens.HookedTransformer.from_pretrained(
    "unsloth/mistral-7b-instruct-v0.2-bnb-4bit",
    hf_model=model_merged,
    hf_model_4bit=True,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
    tokenizer=tokenizer,
)
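For context, model and tokenizer come from an earlier cell, roughly like the following (a minimal sketch; the adapter path is just a placeholder, and the exact loading arguments may differ from what I actually used):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit"

# Load the 4-bit base model (the bitsandbytes config ships with the checkpoint)
# and its tokenizer from the Hub.
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the fine-tuned LoRA adapters; "path/to/lora-adapters" is a placeholder.
model = PeftModel.from_pretrained(base_model, "path/to/lora-adapters")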
The problem is that I get an assertion error stating that only LLaMA models can be used in quantized format with this library. This is the error message I receive:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
AssertionError: Quantization is only supported for Llama models
I find it illogical and frustrating that only LLaMA models can be used with transformer_lens in quantized form. Can anyone explain why this decision was made? Is there a technical reason behind it, or is there a way to work around the restriction so that I can use my Mistral 7B model?
I appreciate any guidance or solutions you can provide.
Thanks!
This wasn't a deliberate design decision: quantization support was contributed by a volunteer, who only implemented it for LLaMA models. We can definitely put expanding it to other architectures on the list of todos.
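In the meantime, a possible workaround (an untested sketch, and it gives up the memory savings of 4-bit): load the Mistral weights unquantized in half precision and pass that model to HookedTransformer.from_pretrained without the 4-bit flag, since the assertion should only be hit on the quantized code path. If you have LoRA adapters, you would merge them into this full-precision base instead of the bnb-4bit one. This also assumes your TransformerLens version lists the v0.2 Mistral checkpoint among its supported models.

import torch
import transformer_lens
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-Instruct-v0.2"  # unquantized upstream checkpoint

# Load the full-precision weights in fp16 (needs roughly 14 GB of memory).
hf_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# No 4-bit flag here, so the Llama-only quantization assertion should not trigger.
model_hooked = transformer_lens.HookedTransformer.from_pretrained(
    base_id,
    hf_model=hf_model,
    tokenizer=tokenizer,
    fold_ln=False,
    fold_value_biases=False,
    center_writing_weights=False,
    center_unembed=False,
)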