
Downloading and saving unsloth/gemma-7b-bnb-4bit to a local folder loses parameters

Open patrickjchen opened this issue 10 months ago • 16 comments

First I load the model with the internet connection ON:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Then I save the model:

local_path = "***"
model.save_pretrained(local_path)
tokenizer.save_pretrained(local_path)
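A quick way to check what actually landed in the folder (a sketch, reusing the local_path placeholder above):

import os

# List the files written by save_pretrained(); expect at least config.json,
# the model weights (model.safetensors, possibly sharded) and the tokenizer files.
print(sorted(os.listdir(local_path)))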

Then I make a dataset and try to use the model locally with the internet connection OFF:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "****",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

and I get the following error:

ValueError: Supplied state dict for model.layers.23.mlp.down_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components.

patrickjchen avatar Apr 12 '24 21:04 patrickjchen

Is this the correct model after downloading from Huggingface?

model

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 3072)
    (layers): ModuleList(
      (0-27): 28 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=3072, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=3072, bias=False)
          (rotary_emb): GemmaFixedRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (up_proj): Linear4bit(in_features=3072, out_features=24576, bias=False)
          (down_proj): Linear4bit(in_features=24576, out_features=3072, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
      )
    )
    (norm): GemmaRMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=256000, bias=False)
)

down_proj = model.model.layers[0].mlp.down_proj
print(down_proj)

Linear4bit(in_features=24576, out_features=3072, bias=False)
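A quick sanity check that the in-memory weight still carries its quantization state before saving (a sketch; quant_state and quant_type are bitsandbytes internals and may differ between versions):

w = down_proj.weight                                # bitsandbytes Params4bit
print(type(w))                                      # expect Params4bit
print(w.quant_state)                                # should not be None for a 4-bit weight
print(getattr(w.quant_state, "quant_type", None))   # expect 'nf4' for these uploads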

patrickjchen avatar Apr 13 '24 13:04 patrickjchen

@patrickjchen So you're using the Kaggle notebook here https://www.kaggle.com/code/danielhanchen/kaggle-gemma-7b-unsloth-notebook/, right?

I'm uncertain on internet connections and stuff sadly - not a Kaggle expert :(

danielhanchen avatar Apr 13 '24 14:04 danielhanchen

Dan, it seems the implementation of save_pretrained/from_pretrained has some issues for the Gemma 7b model. My code works for Mistral. However, I feel the Gemma version is a lot better.

patrickjchen avatar Apr 13 '24 17:04 patrickjchen

and from the code at python3.11/site-packages/transformers/quantizers/quantizer_bnb_4bit.py, line 193:

if (param_name + ".quant_state.bitsandbytes__fp4" not in state_dict) and (
    param_name + ".quant_state.bitsandbytes__nf4" not in state_dict
):
    raise ValueError(
        f"Supplied state dict for {param_name} does not contain bitsandbytes__* and possibly other quantized_stats components."
    )

This seems to imply that the state dict should contain names like bitsandbytes__nf4 / bitsandbytes__fp4, but the names in the unsloth code are cdequantize_blockwise_fp16_nf4 / cdequantize_blockwise_bf16_nf4.
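One way to see whether those quant_state entries ever made it to disk is to list the tensor names in the saved checkpoint (a sketch, assuming save_pretrained() wrote a single model.safetensors into the local folder; a sharded save would need a loop over the *.safetensors files, and the path is a placeholder):

from safetensors import safe_open

local_path = "path/to/local/folder"   # placeholder for the folder used above
with safe_open(f"{local_path}/model.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(len(keys))
# Check whether the quantization-statistics entries the quantizer expects are present.
print([k for k in keys if "quant_state.bitsandbytes__nf4" in k][:5])
print([k for k in keys if "quant_state.bitsandbytes__fp4" in k][:5])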

patrickjchen avatar Apr 13 '24 18:04 patrickjchen

It seems some keys are lost when reading the model back with from_pretrained(). The top-level state_dict had 1234 keys, but after the save/reload round trip there were only 1050.
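A sketch of that comparison (reloaded_model is a placeholder name for a second FastLanguageModel.from_pretrained call pointed at the local folder):

fresh_keys = set(model.state_dict().keys())               # model loaded with internet ON
print(len(fresh_keys))                                     # reported as 1234 above

reloaded_keys = set(reloaded_model.state_dict().keys())    # model reloaded from the local folder
print(len(reloaded_keys))                                  # reported as 1050 above

# The difference should contain the quantization-statistics entries
# (the bitsandbytes__nf4 style keys the quantizer check looks for).
print(sorted(fresh_keys - reloaded_keys)[:10])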

patrickjchen avatar Apr 13 '24 19:04 patrickjchen

@danielhanchen Hi Dan, after I do

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "****",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

how do I save the model to a local folder? I tried model.save_pretrained(), model.save_pretrained_merged() and unsloth_save_model(), and none of them work. Also, the model above is just a GemmaForCausalLM, not a PeftModelForCausalLM. My observation is that saving the model loses parameters (1234 → 1050 keys).
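For reference, the loaded object only becomes a PeftModelForCausalLM after LoRA adapters are attached with FastLanguageModel.get_peft_model; a minimal sketch of that step, with illustrative hyperparameters rather than recommended ones:

# Wrap the 4-bit base model with LoRA adapters; after this call the object is
# PEFT-wrapped, and save_pretrained() on it saves the adapters rather than the base weights.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)
print(type(model))   # expect a PEFT-wrapped model after this call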

patrickjchen avatar Apr 14 '24 01:04 patrickjchen

@patrickjchen Ok Ill take a look

danielhanchen avatar Apr 14 '24 06:04 danielhanchen

Once I download the model from the internet and save it locally, I change my config.json to point to the local model path. When I then try to reload it from local storage, I get the message "You have a version of bitsandbytes that is not compatible with 4bit inference and training" even though I have bitsandbytes 0.43.1 installed, followed by the error below. Is there a workaround for this?

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
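This error comes from transformers' bitsandbytes quantizer and means accelerate dispatched part of the quantized model onto the CPU or disk. A hedged sketch of the two workarounds the message points at, written against plain transformers (the offload flag is called llm_int8_enable_fp32_cpu_offload in recent releases, and it is not certain these options are wired through the unsloth loader):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

local_path = "path/to/local/folder"   # placeholder

# Option 1: force the whole model onto GPU 0 (only works if it actually fits).
model = AutoModelForCausalLM.from_pretrained(
    local_path,
    quantization_config = BitsAndBytesConfig(load_in_4bit = True),
    device_map = {"": 0},
)

# Option 2: explicitly allow the overflow modules to sit on the CPU in fp32.
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    llm_int8_enable_fp32_cpu_offload = True,
)
model = AutoModelForCausalLM.from_pretrained(
    local_path,
    quantization_config = bnb_config,
    device_map = "auto",
)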

alarecha24 avatar Jul 03 '24 16:07 alarecha24

@alarecha24 That's a weird error msg - is this for Gemma? What's your GPU?

danielhanchen avatar Jul 04 '24 05:07 danielhanchen

I'm attempting to fine-tune a locally saved model and running into the same issue. My GPU info is below:

NVIDIA-SMI 535.161.08   Driver Version: 535.161.08   CUDA Version: 12.2   Tesla T4

nick-gt avatar Jul 11 '24 20:07 nick-gt

@nick-gt Are you using load_in_4bit = True?

danielhanchen avatar Jul 12 '24 06:07 danielhanchen

@danielhanchen yes, I've tried both True and False.

nick-gt avatar Jul 12 '24 14:07 nick-gt

Hello everyone, I am running into the same issue:

"ValueError: Supplied state dict for model.layers.28.mlp.gate_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components."

I am trying to finetune the model "unsloth/codellama-13b-bnb-4bit" using the FastLanguageModel.from_pretrained() method, but it does not even seem to pull the model, since the error happens right after the shards start downloading.

GPU : NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 Torch : 2.3.0

Here is the code snippet:

from unsloth import FastLanguageModel

model_name = "unsloth/codellama-13b-bnb-4bit"
max_seq_length = 2048
dtype = None          # auto-detect
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

Thanks in advance for the help!

jgarcia2809 avatar Jul 29 '24 07:07 jgarcia2809

@jgarcia2809 I just reuploaded Codellama-13b - hopefully it works now

danielhanchen avatar Jul 31 '24 03:07 danielhanchen

Hi @danielhanchen thank you for the quick response. Unfortunately it still gives me the same error.

I tried with "unsloth/llama-3-8b-Instruct-bnb-4bit" and could finetune it using the same code. It did not produce any errors.

jgarcia2809 avatar Jul 31 '24 07:07 jgarcia2809

Ok, that's very weird - I'll see what I can do. Temporarily it's best to use unsloth/llama-3.1-8b; another approach is to uninstall unsloth and then reinstall it.

danielhanchen avatar Aug 02 '24 06:08 danielhanchen