LLaMA-Factory
fsdp+qlora mixtral 8x22B: RuntimeError: Only Tensors of floating point and complex dtype can require gradients
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
I'm using the latest llmtuner 0.7.0 with the following library versions: transformers>=4.39.1, accelerate>=0.28.0, bitsandbytes>=0.43.0, and I get this error:
File "/opt/conda/envs/ptca/lib/python3.10/site-packages/llmtuner/model/loader.py", line 128, in load_model model = AutoModelForCausalLM.from_pretrained(**init_kwargs) File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained return model_class.from_pretrained( File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained ) = cls._load_pretrained_model( File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model( File "/opt/conda/envs/ptca/lib/python3.10/site-packages/transformers/modeling_utils.py", line 895, in _load_state_dict_into_meta_model value = type(value)(value.data.to("cpu"), **value.dict) File "/opt/conda/envs/ptca/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 568, in new obj = torch.Tensor._make_subclass(cls, data, requires_grad) RuntimeError: Only Tensors of floating point and complex dtype can require gradients
I have checked the issue https://github.com/hiyouga/LLaMA-Factory/issues/3206, but it doesn't help.
Expected behavior
No response
System Info
No response
Others
No response
provide your training scripts
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file ../../accelerate/fsdp_config.yaml \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --do_eval \
    --model_name_or_path model_path \
    --dataset sample_dataset \
    --dataset_dir data \
    --template mistral \
    --finetuning_type lora \
    --lora_target all \
    --output_dir output_dir \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 16000 \
    --preprocessing_num_workers 2 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --warmup_steps 1 \
    --save_steps 100 \
    --eval_steps 1 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 2 \
    --max_samples 1000 \
    --val_size 0.2 \
    --quantization_bit 8 \
    --fp16 \
    --upcast_layernorm
```
I tried Mixtral 8x7B too and got the same error.
FSDP+QLoRA only accepts 4-bit quantization.
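For reference, a minimal sketch of the 4-bit variant of the launch above (untested; it reuses the same placeholder paths and FSDP accelerate config from the original command, changing only `--quantization_bit` to 4 and dropping flags not relevant to the point):

```bash
# Same FSDP launch as above, but with 4-bit quantization as required for FSDP+QLoRA.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file ../../accelerate/fsdp_config.yaml \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path model_path \
    --dataset sample_dataset \
    --dataset_dir data \
    --template mistral \
    --finetuning_type lora \
    --lora_target all \
    --output_dir output_dir \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --learning_rate 5e-5 \
    --num_train_epochs 2 \
    --quantization_bit 4 \
    --fp16
```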
@hiyouga Got it. Is there any way to use multiple GPUs on a single node to do 8-bit QLoRA?
8-bit QLoRA only supports DDP.
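A rough sketch of what a single-node DDP launch for 8-bit QLoRA might look like (an assumption on my part: torchrun is used in place of the FSDP accelerate launch so the Trainer falls back to DDP; flags are copied from the command above and untested):

```bash
# One process per GPU via torchrun; no FSDP config is passed, so the Trainer uses DDP.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 \
    ../../../src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path model_path \
    --dataset sample_dataset \
    --dataset_dir data \
    --template mistral \
    --finetuning_type lora \
    --lora_target all \
    --output_dir output_dir \
    --per_device_train_batch_size 1 \
    --learning_rate 5e-5 \
    --num_train_epochs 2 \
    --quantization_bit 8 \
    --fp16
```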
I am using QLoRA with 4-bit quantization, but somehow I get the same error.
For more detail, this is the config I used:
```
BitsAndBytesConfig {
  "_load_in_4bit": true,
  "_load_in_8bit": false,
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_storage": "bfloat16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
}
```
And here is a snippet of the code responsible for the error:

```python
model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    attn_implementation="flash_attention_2" if args.use_flash_attn else "eager",
    torch_dtype=torch_dtype,
)
```
I am 100% sure it is this quantization config that produces the same error as the original poster. Switching to 4-bit didn't change anything in my case.