ChatGLM3

Model fails to load / 模型无法加载

Open · daihuaiii opened this issue 1 year ago · 1 comment

System Info / 系統信息

CUDA: 11.7, transformers==4.41.2, Python: 3.9

GPU: RTX 3090

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • [ ] The official example scripts / 官方的示例脚本
  • [X] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Model setup:

import torch
from transformers import AutoModel, BitsAndBytesConfig

q_config = BitsAndBytesConfig(load_in_4bit=True,
                              bnb_4bit_quant_type='nf4',
                              bnb_4bit_use_double_quant=True,
                              bnb_4bit_compute_dtype=torch.float32)
base_model = AutoModel.from_pretrained(MODEL_PATH,
                                       quantization_config=q_config,
                                       trust_remote_code=True,
                                       device_map='auto')

Loading output:

bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|██████████| 7/7 [00:12<00:00, 1.83s/it]
You are calling save_pretrained to a 4-bit converted model, but your bitsandbytes version doesn't support it. If you want to save 4-bit models, make sure to have bitsandbytes>=0.41.3 installed.
/opt/conda/lib/python3.10/site-packages/peft/peft_model.py:556: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  adapters_weights = torch.load(

After this, it hangs here and makes no further progress.

With bitsandbytes==0.41.3:

Loading checkpoint shards: 100%|██████████| 7/7 [00:16<00:00, 2.34s/it]
/opt/conda/envs/mee/lib/python3.9/site-packages/peft/peft_model.py:556: FutureWarning: (same torch.load weights_only=False warning as in the first log)
  adapters_weights = torch.load(
/opt/conda/envs/mee/lib/python3.9/site-packages/bitsandbytes/nn/modules.py:228: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). This will lead to slow inference or training speed.
  warnings.warn(f'Input type into Linear4bit is torch.float16, but bnb_4bit_compute_type=torch.float32 (default). ...')

It hangs at the same point. In both cases the GPU load is identical and the process is still running.
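Not necessarily the cause of the hang, but the Linear4bit UserWarning above indicates a dtype mismatch: the inputs are float16 while bnb_4bit_compute_dtype is float32. A sketch of a config that aligns them (assuming the same MODEL_PATH as in the original snippet; whether this helps the hang is untested):

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Match the compute dtype to the fp16 inputs flagged by the Linear4bit warning.
q_config = BitsAndBytesConfig(load_in_4bit=True,
                              bnb_4bit_quant_type='nf4',
                              bnb_4bit_use_double_quant=True,
                              bnb_4bit_compute_dtype=torch.float16)  # was float32
base_model = AutoModel.from_pretrained(MODEL_PATH,
                                       quantization_config=q_config,
                                       trust_remote_code=True,
                                       device_map='auto')
```

This at least silences the warning and avoids the float16→float32 conversion on every Linear4bit call.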

Expected behavior / 期待表现

The model loads normally.

daihuaiii — Jul 29 '24 11:07

#1263 looks like a similar issue, but I am stuck loading the base_model, with no peft_model involved; and the same code still loaded and ran fine a month ago.

daihuaiii — Jul 29 '24 11:07

bitsandbytes>=0.41.3
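The maintainer's reply restates the version floor from the save_pretrained warning in the first log. A minimal stdlib sketch for checking the installed bitsandbytes version at startup (the helper names `version_tuple` and `check_bitsandbytes` are illustrative, not from any library):

```python
from importlib import metadata

def version_tuple(v: str) -> tuple:
    """Parse a dotted version string like '0.41.3' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check_bitsandbytes(minimum: str = "0.41.3") -> bool:
    """Return True if an installed bitsandbytes meets the minimum version."""
    try:
        installed = metadata.version("bitsandbytes")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

Tuple comparison handles multi-digit components correctly ("0.41.2" < "0.41.3", and "0.9.0" < "0.41.0"), which naive string comparison does not.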

zRzRzRzRzRzRzR — Sep 04 '24 15:09