Error when loading the model: Only Tensors of floating point and complex dtype can require gradients
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Fine-tuning the int4 ChatGLM model on a single machine fails while the model is loading, with the error message: Only Tensors of floating point and complex dtype can require gradients
Expected Behavior
No response
Steps To Reproduce
The error messages and related parameters are as follows:
model_to_load_type: <class 'str'> /work/models/chatglm-6b-int4
ChatGLMConfig {
  "_name_or_path": "/work/models/chatglm-6b-int4",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 4,
  "quantization_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 130528
}
config_kwargs: {'trust_remote_code': True, 'cache_dir': None, 'revision': 'main', 'use_auth_token': None, 'load_in_4bit': True, 'quantization_config': BitsAndBytesConfig {
"bnb_4bit_compute_dtype": "float32",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
}
, 'device_map': {'': 0}}
[INFO|modeling_utils.py:2502] 2023-09-19 01:16:09,589 >> Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning.
[INFO|modeling_utils.py:2866] 2023-09-19 01:16:09,589 >> loading weights file /work/models/chatglm-6b-int4/pytorch_model.bin
[INFO|modeling_utils.py:1200] 2023-09-19 01:16:10,752 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.float16.
[INFO|configuration_utils.py:768] 2023-09-19 01:16:10,875 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.33.2"
}
Error messages:
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 8
Using quantization cache
Applying quantization to glm layers
Traceback (most recent call last):
File "/work/src/train_bash.py", line 21, in
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
I'm running into the same problem. Is there a solution available yet?
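For anyone else debugging this: the message itself is raised by PyTorch whenever requires_grad_(True) is called on a tensor whose dtype is not floating point or complex, which is presumably what happens here to the integer weight buffers of the already-quantized chatglm-6b-int4 checkpoint when it is loaded again under bitsandbytes 4-bit. A tiny standalone snippet (unrelated to ChatGLM, shown only to illustrate where the message comes from):

```python
import torch

# Integer tensors cannot carry gradients; asking for them raises the
# RuntimeError quoted in this issue's title.
w = torch.zeros(4, dtype=torch.int8)
try:
    w.requires_grad_(True)
except RuntimeError as e:
    print(e)
```

One thing worth checking (an assumption, not a confirmed fix): whether bitsandbytes 4-bit loading is being stacked on top of a checkpoint that already has quantization_bit: 4 in its config. Loading the full-precision chatglm-6b weights with the BitsAndBytesConfig, or dropping the bitsandbytes options when using the pre-quantized int4 checkpoint, may avoid the requires_grad call that trips this check.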
Same question.
Same question.