
OOM when finetuning GLM

[Open] nuoma opened this issue 2 years ago · 0 comments

Using train_multi_gpu with two 3090s, I hit a CUDA OOM. At first it OOMed right at model loading; after removing FP16 from the command line it was able to train, but it OOMed again shortly afterwards, with GPU memory usage nearly maxed out at 23.4G/24G. I then removed the .half() call when loading the model and added load_in_8bit=True instead, which raised: ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices. From what I can tell, this is a limitation of accelerate.
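For reference, a minimal sketch of the two loading variants described above; the checkpoint name ("THUDM/chatglm-6b"), trust_remote_code, and device_map usage are assumptions, not taken from the original script:

```python
# Sketch only -- checkpoint name and kwargs are assumptions, not the actual
# train_multi_gpu code.
from transformers import AutoModel

# Variant 1: fp16 via .half(); starts training on 2x3090 but OOMs shortly
# after, with memory usage near 23.4G/24G per card.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True
).half()

# Variant 2: 8-bit quantization instead of .half(); when launched across
# multiple GPUs this raises:
#   ValueError: You can't train a model that has been loaded in 8-bit
#   precision on multiple devices.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    load_in_8bit=True,
    device_map="auto",
)
```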

nuoma · Apr 24 '23 08:04