ChatGLM3
Model fails to load
System Info / 系統信息
CUDA: 11.7, transformers==4.41.2, Python: 3.9
GPU: RTX 3090
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
- [ ] The official example scripts / 官方的示例脚本
- [X] My own modified scripts / 我自己修改的脚本和任务
Reproduction / 复现过程
Model setup:
import torch
from transformers import AutoModel, BitsAndBytesConfig

q_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float32,
)
base_model = AutoModel.from_pretrained(
    MODEL_PATH,
    quantization_config=q_config,
    trust_remote_code=True,
    device_map='auto',
)
Loading log:
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.83s/it]
You are calling `save_pretrained` to a 4-bit converted model, but your `bitsandbytes` version doesn't support it. If you want to save 4-bit models, make sure to have `bitsandbytes>=0.41.3` installed.
/opt/conda/lib/python3.10/site-packages/peft/peft_model.py:556: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  adapters_weights = torch.load(
After this it hangs and makes no further progress.
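As an aside, the FutureWarning above comes from inside peft (peft_model.py calls torch.load), so it cannot be silenced from the caller's side, and it does not itself cause a hang. For reference, the safer loading path the warning recommends looks roughly like this (a minimal sketch with a throwaway state dict, not the real adapter file):

```python
import os
import tempfile

import torch

# Sketch: torch.load with weights_only=True restricts unpickling to tensors
# and other allowlisted types, which is what the FutureWarning recommends.
state = {"w": torch.zeros(2, 3)}  # stand-in for real adapter weights
path = os.path.join(tempfile.mkdtemp(), "demo_adapter.pt")
torch.save(state, path)
loaded = torch.load(path, weights_only=True)
assert loaded["w"].shape == (2, 3)
```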
With bitsandbytes==0.41.3:
Loading checkpoint shards: 100%|██████████| 7/7 [00:16<00:00, 2.34s/it]
/opt/conda/envs/mee/lib/python3.9/site-packages/peft/peft_model.py:556: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  adapters_weights = torch.load(
/opt/conda/envs/mee/lib/python3.9/site-packages/bitsandbytes/nn/modules.py:228: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.
  warnings.warn(f'Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.')
It hangs at the same point. In both cases the GPU load is the same and the process is still running.
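Since the process stays alive with steady GPU load, dumping the Python stack traces can show where it is actually blocked. A minimal sketch using the standard library's faulthandler (running `py-spy dump --pid <PID>` from another shell is an alternative that needs no code change):

```python
import faulthandler
import sys

# Sketch: print a stack trace of every thread to stderr every 60 seconds,
# so a silent hang reveals the exact frame it is stuck in.
faulthandler.dump_traceback_later(timeout=60, repeat=True, file=sys.stderr)

# ... then run the code that hangs, e.g. AutoModel.from_pretrained(...) ...
```

The repeated tracebacks should point at the blocking call (e.g. a device transfer or a lock), which would narrow down whether the hang is in transformers, peft, or bitsandbytes.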
Expected behavior / 期待表现
The model loads normally.
#1263 looks like a similar issue, but I get stuck while loading the base_model, with no peft_model involved; moreover, the same code still loaded and ran fine a month ago.