
RuntimeError: CUDA out of memory

Open ImGoodBai opened this issue 1 year ago • 5 comments

I got this error on the first run after installation, on Ubuntu 22.04 with two NVIDIA T4 cards. Do I need to switch the quantization level?

$ python moss_cli_demo.py
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 212211.81it/s]
Waiting for all devices to be ready, it may take a few minutes...
Traceback (most recent call last):
  File "moss_cli_demo.py", line 31, in <module>
    model = load_checkpoint_and_dispatch(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/good/anaconda3/envs/moss/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 149, in set_module_tensor_to_device
    new_value = value.to(device)
RuntimeError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 1; 14.62 GiB total capacity; 13.28 GiB already allocated; 243.38 MiB free; 13.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
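The error message above suggests max_split_size_mb only when reserved memory is much larger than allocated memory. A minimal sketch for checking that, assuming the message keeps the format shown in this traceback: it parses the sizes and compares reserved against allocated. The function names and the 1 GiB threshold are hypothetical choices for illustration, not part of PyTorch.

```python
import re

# Message text copied from the traceback in this thread.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 288.00 MiB "
    "(GPU 1; 14.62 GiB total capacity; 13.28 GiB already allocated; "
    "243.38 MiB free; 13.30 GiB reserved in total by PyTorch)"
)

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return the sizes found in the message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]
    return sizes


def looks_fragmented(sizes, threshold_mib=1024.0):
    """Hypothetical heuristic: reserved exceeds allocated by more than the threshold."""
    return sizes["reserved"] - sizes["allocated"] > threshold_mib


sizes = parse_oom(OOM_MESSAGE)
print(sizes)
print("fragmentation likely:", looks_fragmented(sizes))
```

For this traceback, reserved and allocated differ by only about 20 MiB, so under this heuristic fragmentation is unlikely to be the cause. GPU 1 is simply full, which points toward a smaller (more heavily quantized) model rather than allocator tuning.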

ImGoodBai avatar Apr 25 '23 03:04 ImGoodBai

hello

ImGoodBai avatar Apr 25 '23 05:04 ImGoodBai

The official guidance is below. In practice, my 32 GB of GPU memory cannot run FP16, so the model file has to be changed in the launch script.

Quantization level   Loading the model   One round of dialogue (estimated)   Reaching the maximum dialogue length of 2048
FP16                 31 GB               42 GB                               81 GB
Int8                 16 GB               24 GB                               46 GB
Int4                 7.8 GB              12 GB                               26 GB

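For details, see:

moss-moon-003-sft-int4: the 4-bit quantized version of the moss-moon-003-sft model; inference needs about 12 GB of GPU memory.
moss-moon-003-sft-int8: the 8-bit quantized version of the moss-moon-003-sft model; inference needs about 24 GB of GPU memory.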
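The table above can be turned into a quick lookup. This is a rough sketch only: the numbers are copied from the table, while the function name and stage keys are hypothetical. It treats GB and GiB as interchangeable and assumes memory on multiple cards can be pooled perfectly, which real multi-GPU dispatch does not guarantee.

```python
# Figures copied from the table in this thread (GB of GPU memory).
REQUIREMENTS_GB = {
    "FP16": {"load": 31.0, "one_round": 42.0, "max_length": 81.0},
    "Int8": {"load": 16.0, "one_round": 24.0, "max_length": 46.0},
    "Int4": {"load": 7.8, "one_round": 12.0, "max_length": 26.0},
}


def highest_precision_that_fits(total_gb, stage="one_round"):
    """Return the highest-precision level whose estimate fits, or None.

    Hypothetical helper: compares the table's estimate for the given
    stage against the total memory, from FP16 down to Int4.
    """
    for level in ("FP16", "Int8", "Int4"):
        if REQUIREMENTS_GB[level][stage] <= total_gb:
            return level
    return None


# Two T4 cards at 14.62 GiB each, as reported in the traceback.
two_t4_gb = 2 * 14.62
print(highest_precision_that_fits(two_t4_gb, "load"))
print(highest_precision_that_fits(two_t4_gb, "one_round"))
print(highest_precision_that_fits(32.0, "one_round"))
```

By these estimates, FP16 does not even load on two T4 cards, and a 32 GB setup can load FP16 but not finish a round of dialogue with it. Treat the result as an upper bound on what might work: users in this thread report out-of-memory errors with the quantized models as well.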

ImGoodBai avatar Apr 25 '23 07:04 ImGoodBai

Nobody answered, so I'll answer it myself.

ImGoodBai avatar Apr 25 '23 07:04 ImGoodBai

I'm using moss-moon-003-sft-int4 with a 32 GB GPU and also get the GPU out-of-memory error, same as you.

HDRBgg avatar Apr 25 '23 08:04 HDRBgg

+1

linbojin avatar Apr 25 '23 13:04 linbojin

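> I'm using moss-moon-003-sft-int4 with a 32 GB GPU and also get the GPU out-of-memory error, same as you.

I switched to int4 and it works now, although a different error, a syntax error, appeared. The out-of-memory error is gone. int8 still does not work, though: the out-of-memory error persists.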

ImGoodBai avatar Apr 26 '23 03:04 ImGoodBai

How much GPU memory does training require?

starplatinum3 avatar Apr 26 '23 08:04 starplatinum3

Which CUDA version are you using?

Cocoalate avatar Aug 14 '23 07:08 Cocoalate