Downloading and saving unsloth/gemma-7b-bnb-4bit to a local folder loses parameters
First load the model with the internet connection ON:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Then I saved the model using:

local_path = "***"
model.save_pretrained(local_path)
tokenizer.save_pretrained(local_path)
Then I make a dataset and try to use the model locally with the internet connection OFF:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "****",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
And got the following error:
ValueError: Supplied state dict for model.layers.23.mlp.down_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components.
Is this the correct model after downloading from Huggingface?
model

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaFixedRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear4bit(in_features=24576, out_features=3072, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=256000, bias=False)
)
down_proj = model.model.layers[0].mlp.down_proj
print(down_proj)

Linear4bit(in_features=24576, out_features=3072, bias=False)
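For what it's worth, a quick check (assuming a recent bitsandbytes where Params4bit carries its QuantState) that the in-memory model is genuinely 4-bit quantized, independent of what got serialized to disk:

import bitsandbytes as bnb

down_proj = model.model.layers[0].mlp.down_proj
print(isinstance(down_proj.weight, bnb.nn.Params4bit))  # expected: True
print(down_proj.weight.quant_state)  # absmax / blocksize / quant_type ("nf4") metadata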
@patrickjchen So you're using the Kaggle notebook here https://www.kaggle.com/code/danielhanchen/kaggle-gemma-7b-unsloth-notebook/ right?
I'm uncertain on internet connections and stuff sadly - not a Kaggle expert :(
Dan, it seems the implementation of save_pretrained/from_pretrained has some issues for the Gemma 7b model. My code works for Mistral. However, I feel the Gemma version is a lot better.
And from the code in python3.11/site-packages/transformers/quantizers/quantizer_bnb_4bit.py, line 193:
if (param_name + ".quant_state.bitsandbytes__fp4" not in state_dict) and (
    param_name + ".quant_state.bitsandbytes__nf4" not in state_dict
):
    raise ValueError(
        f"Supplied state dict for {param_name} does not contain bitsandbytes__* and possibly other quantized_stats components."
    )
It seems to imply that the state dict should contain keys like bitsandbytes__nf4/bitsandbytes__fp4, but the names in the unsloth code are cdequantize_blockwise_fp16_nf4/cdequantize_blockwise_bf16_nf4.
It also seems some keys are lost after reading the model back with from_pretrained(). The top-level state_dict had 1234 keys, but after saving and reloading there were only 1050.
@danielhanchen Hi Dan, after I did

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "****",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

how do I save the model to a local folder? I tried model.save_pretrained(), model.save_pretrained_merged() and unsloth_save_model(), and none of them work. Also, the model above is just GemmaForCausalLM, not PeftModelForCausalLM. My observation is that saving the model is losing parameters (1234 ---> 1050).
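In the meantime, one workaround I may try (an assumption on my side, not something confirmed in this thread) is to skip re-saving the already-quantized model and instead mirror the original repo to a local folder while online with huggingface_hub's snapshot_download, then point from_pretrained at that folder when offline:

from huggingface_hub import snapshot_download
from unsloth import FastLanguageModel

# Run while online: mirrors the repo files (config, tokenizer, 4-bit weights)
# into the hypothetical folder "local_gemma" without re-serializing anything.
local_path = snapshot_download(
    repo_id = "unsloth/gemma-7b-bnb-4bit",
    local_dir = "local_gemma",
)

# Later, with the internet connection OFF:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = local_path,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = True,
)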
@patrickjchen Ok, I'll take a look
Once I get the model from the internet and save it locally, I change my config.json to point to the local model path. Then, when I try to reload it locally, I get the following error even though I have bitsandbytes (version 0.43.1) installed. Is there a workaround for this?
"You have a version of bitsandbytes
that is not compatible with 4bit inference and training"
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
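Would something like the following work, assuming FastLanguageModel.from_pretrained forwards device_map to the underlying transformers loader and the GPU actually has room for the 4-bit model? The idea (just my guess) is to pin every module to GPU 0 so nothing gets offloaded to CPU or disk:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "path/to/local/folder",  # hypothetical local folder
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = True,
    device_map = {"": 0},  # force the whole model onto cuda:0
)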
@alarecha24 That's a weird error msg - is this for Gemma? What's your GPU?
I'm attempting to fine-tune a locally saved model and running into the same issue. My GPU info is below:

NVIDIA-SMI 535.161.08   Driver Version: 535.161.08   CUDA Version: 12.2   Tesla T4
@nick-gt Are you using load_in_4bit = True?
@danielhanchen yes, I've tried using both True and False.
Hello everyone, I am running into the same issue:
"ValueError: Supplied state dict for model.layers.28.mlp.gate_proj.weight does not contain bitsandbytes__*
and possibly other quantized_stats
components."
I am trying to finetune the model "unsloth/codellama-13b-bnb-4bit" using the FastLanguageModel.from_pretrained() method, but it does not even seem to pull the model, since the error happens right after the shards start downloading.
GPU: NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0
Torch: 2.3.0
Here is the code snippet :
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
Thanks in advance for the help!
@jgarcia2809 I just reuploaded Codellama-13b - hopefully it works now
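If the old shards are still cached, a small sketch (force_download is a standard huggingface_hub option; just a guess at a fix, not something confirmed here) to make sure the retry actually pulls the re-uploaded files:

from huggingface_hub import snapshot_download

# Re-fetch the repo, ignoring any previously cached (possibly stale) shards,
# before retrying FastLanguageModel.from_pretrained.
snapshot_download(
    repo_id = "unsloth/codellama-13b-bnb-4bit",
    force_download = True,
)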
Hi @danielhanchen thank you for the quick response. Unfortunately it still gives me the same error.
I tried with "unsloth/llama-3-8b-Instruct-bnb-4bit" and could finetune it using the same code. It did not produce any errors.
Ok that's very weird - I'll see what I can do. Temporarily it's best to use unsloth/llama-3.1-8b - another approach is to uninstall unsloth and then reinstall it.