
After updating the model, I get the following error.

Open cywjava opened this issue 2 years ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/train/.cache/huggingface/modules/transformers_modules/modeling_chatglm.py", line 169, in gelu_impl
    def gelu_impl(x):
        """OpenAI's gelu implementation."""
        return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * x *
                                           (1.0 + 0.044715 * x * x)))
                                           ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: CUDA out of memory. Tried to allocate 34.00 MiB (GPU 0; 23.70 GiB total capacity; 21.79 GiB already allocated; 12.56 MiB free; 22.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Expected Behavior

Is this a bug in the new model?

Steps To Reproduce

Is this a bug in the new model? How can it be resolved?
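A rough back-of-envelope check (a sketch only; the ~6.2B parameter count is taken from the ChatGLM-6B model card) suggests this is an out-of-memory condition rather than a model bug: at fp32 the weights alone need roughly 23 GiB, which already fills a 23.70 GiB card before activations are counted. Loading with `.half()` or the `quantize()` method shown in the ChatGLM-6B README cuts this substantially.

```python
# Rough GPU-memory estimate for the ~6.2B-parameter ChatGLM-6B weights
# at different precisions (weights only; activations and the KV cache
# add more on top of this).

PARAMS = 6.2e9  # approximate parameter count from the ChatGLM-6B model card

def weight_gib(bytes_per_param: float) -> float:
    """Return the weight footprint in GiB for a given precision."""
    return PARAMS * bytes_per_param / 2**30

for name, bpp in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: {weight_gib(bpp):.1f} GiB")
# fp32 comes out around 23.1 GiB -- consistent with the 23.70 GiB card
# being exhausted; fp16 is around 11.5 GiB, int4 around 2.9 GiB.
```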

Environment

- OS: CentOS 7.9
- Python: 3.7.16
- Transformers: 4.27.1
- PyTorch: 1.13
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

cywjava avatar Apr 23 '23 06:04 cywjava

This is the GPU running out of memory; it has nothing to do with the model.

duzx16 avatar Apr 23 '23 07:04 duzx16

I ran into a similar problem: chatglm-6b never manages to start. With 24 GB of VRAM, less than 10 GB is in use before it reports OOM. I set the environment variable PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:24", but that didn't help either...

  • OS: Debian11(WSL2)
  • Python: 3.10.10
  • Transformers: 4.27.1
  • PyTorch:
  • CUDA Support: True
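On the `max_split_size_mb` point, a minimal sketch of how the variable is usually applied (the helper name here is hypothetical): the setting must be in the environment before `torch` first initializes CUDA in the process, and a value as low as 24 forces the allocator to split nearly every cached block, which can worsen fragmentation rather than relieve it. It also only mitigates fragmentation; it cannot create free memory when the card is genuinely full.

```python
import os

def set_alloc_conf(max_split_size_mb: int) -> str:
    """Build and export PYTORCH_CUDA_ALLOC_CONF (hypothetical helper).

    This must run before `import torch` first initializes CUDA in the
    process, otherwise the setting has no effect. A value as small as 24
    makes the allocator split nearly every cached block; a few hundred
    MB is a more common starting point.
    """
    conf = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = conf
    return conf

# Example: set a moderate split size, then import torch afterwards.
set_alloc_conf(512)
```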

lcjqyml avatar Apr 25 '23 07:04 lcjqyml