Error when loading the model: Only Tensors of floating point and complex dtype can require gradients
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Fine-tuning the int4 ChatGLM model on a single machine fails while the model is loading, with the error message: Only Tensors of floating point and complex dtype can require gradients
Expected Behavior
No response
Steps To Reproduce
The error messages and related parameters are as follows:
model_to_load_type: <class 'str'> /work/models/chatglm-6b-int4
ChatGLMConfig {
  "_name_or_path": "/work/models/chatglm-6b-int4",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 4,
  "quantization_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.33.2",
  "use_cache": true,
  "vocab_size": 130528
}
config_kwargs: {'trust_remote_code': True, 'cache_dir': None, 'revision': 'main', 'use_auth_token': None, 'load_in_4bit': True, 'quantization_config': BitsAndBytesConfig {
"bnb_4bit_compute_dtype": "float32",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
}
, 'device_map': {'': 0}}
[INFO|modeling_utils.py:2502] 2023-09-19 01:16:09,589 >> Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning.
[INFO|modeling_utils.py:2866] 2023-09-19 01:16:09,589 >> loading weights file /work/models/chatglm-6b-int4/pytorch_model.bin
[INFO|modeling_utils.py:1200] 2023-09-19 01:16:10,752 >> Instantiating ChatGLMForConditionalGeneration model under default dtype torch.float16.
[INFO|configuration_utils.py:768] 2023-09-19 01:16:10,875 >> Generate config GenerationConfig {
"_from_model_config": true,
"bos_token_id": 130004,
"eos_token_id": 130005,
"pad_token_id": 3,
"transformers_version": "4.33.2"
}
Error messages:
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 8
Using quantization cache
Applying quantization to glm layers
Traceback (most recent call last):
File "/work/src/train_bash.py", line 21, in
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
I'm running into the same problem. Is there a solution available yet?
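For anyone else debugging this: the message itself is raised by PyTorch whenever requires_grad_(True) is called on a tensor whose dtype is not floating point or complex, which is presumably what happens here to the integer weight buffers of the already-quantized chatglm-6b-int4 checkpoint when it is loaded again under bitsandbytes 4-bit. A tiny standalone snippet (unrelated to ChatGLM, shown only to illustrate where the message comes from):

```python
import torch

# Integer tensors cannot carry gradients; asking for them raises the
# RuntimeError quoted in this issue's title.
w = torch.zeros(4, dtype=torch.int8)
try:
    w.requires_grad_(True)
except RuntimeError as e:
    print(e)
```

One thing worth checking (an assumption, not a confirmed fix): whether bitsandbytes 4-bit loading is being stacked on top of a checkpoint that already has quantization_bit: 4 in its config. Loading the full-precision chatglm-6b weights with the BitsAndBytesConfig, or dropping the bitsandbytes options when using the pre-quantized int4 checkpoint, may avoid the requires_grad call that trips this check.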
Same question.
Same question.