ChatGLM-6B [BUG/Help] GPU memory usage seems higher than with the 10B model

Open piekey1994 opened this issue 1 year ago • 1 comments

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

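I am training on a single card with 80 GB of GPU memory. When I previously trained the 10B GLM model I could use a batch size of 4, but with this model a batch size of 2 easily runs out of memory. I am using AMP and don't know the cause. The Hugging Face code used directly cannot do model parallelism, and I don't know of a good way around that.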

Expected Behavior

No response

Steps To Reproduce

none

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

Mar 20 '23 10:03 piekey1994
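To put the comparison above in numbers, here is a rough back-of-envelope sketch of the "static" part of training memory (weights, gradients, optimizer state). It is not a measurement of either model. The function name, the nominal parameter counts (6e9 and 10e9, taken only from the model names), and the 2 + 2 + 8 bytes per parameter are all assumptions for illustration, and activations are ignored entirely.

```python
GIB = 1024 ** 3


def static_training_memory_gib(num_params, bytes_weights, bytes_grads, bytes_optimizer):
    """Rough memory for weights + gradients + optimizer state, in GiB.

    Activations, temporary buffers, and allocator fragmentation are NOT
    included, so real usage will be higher than this figure.
    """
    bytes_per_param = bytes_weights + bytes_grads + bytes_optimizer
    return num_params * bytes_per_param / GIB


# Assumed per-parameter costs (hypothetical setup): 2-byte weights,
# 2-byte gradients, 8 bytes of Adam-style optimizer state.
est_6b = static_training_memory_gib(6e9, 2, 2, 8)
est_10b = static_training_memory_gib(10e9, 2, 2, 8)

print(f"nominal 6B model:  {est_6b:.1f} GiB")
print(f"nominal 10B model: {est_10b:.1f} GiB")
```

Under identical settings this static part scales with the parameter count, so it would be smaller for the 6B model. If the 6B model still runs out of memory sooner, it may be worth comparing the per-parameter byte costs and the activation memory (batch size, sequence length) of the two setups.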

Same here. I want to fine-tune with the Hugging Face repository code, but the model is too large to fine-tune.

Mar 22 '23 07:03 taofennanhai

I have the same impression. Has anyone found the cause?

Mar 27 '23 13:03 wangdh1027

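One possible cause is that the officially provided model file is not compatible with the Trainer's model parallelism. For how to change it, see https://github.com/yuanzhoulvpi2017/zero_nlp/blob/main/simple_thu_chatglm6b/thuglm/modeling_chatglm.py. It works for me now.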

Mar 28 '23 04:03 wangdh1027
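As a rough illustration of what model parallelism means here, the sketch below splits a list of layer names into contiguous blocks, one block per GPU. It is not the change made in the linked `modeling_chatglm.py`. The helper name, the layer count of 28, and the `transformer.layers.N` module names are assumptions for illustration only.

```python
from typing import Dict, List


def split_layers_across_devices(layer_names: List[str], num_devices: int) -> Dict[str, int]:
    """Assign consecutive layers to devices in roughly equal contiguous blocks."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    total = len(layer_names)
    device_map = {}
    for index, name in enumerate(layer_names):
        # Integer arithmetic keeps the blocks contiguous and balanced.
        device_map[name] = min(index * num_devices // total, num_devices - 1)
    return device_map


# Hypothetical module names and layer count, for illustration only.
layers = [f"transformer.layers.{i}" for i in range(28)]
device_map = split_layers_across_devices(layers, 2)

for name in ("transformer.layers.0", "transformer.layers.13",
             "transformer.layers.14", "transformer.layers.27"):
    print(name, "-> GPU", device_map[name])
```

A dictionary of this shape (module name to device index) is the kind of mapping that the `device_map` argument of `from_pretrained` in Transformers accepts. A real map would also need entries for every other module, such as the embeddings and the output head. The model code must also move tensors between devices correctly, which is the sort of thing a patched modeling file would have to handle.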
