[BUG] Multi-GPU inference demo reports an error
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
```
/home/shy/.cache/huggingface/modules/transformers_modules/ChatGLM-6B/modeling_chatglm.py:202 in forward

   199 │           seq_len = x.shape[seq_dim]
   200 │       if self.max_seq_len_cached is None or (seq_len > self.max_seq_len_cached):
   201 │           self.max_seq_len_cached = None if self.learnable else seq_len
 ❱ 202 │           t = torch.arange(seq_len, device=x.device, dtype=self.inv_freq.dtype)
   203 │           freqs = torch.einsum('i,j->ij', t, self.inv_freq)
   204 │           # Different from paper, but it uses a different permutation in order to obta
   205 │           emb = torch.cat((freqs, freqs), dim=-1).to(x.device)

OutOfMemoryError: CUDA out of memory. Tried to allocate more than 1EB memory.
```
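For scale, a quick back-of-the-envelope check on that request (illustration only; nothing in this thread establishes the actual root cause): `torch.arange(seq_len, dtype=torch.half)` needs roughly `seq_len * 2` bytes, so asking for more than 1 EB implies `seq_len` ended up as an absurdly large value rather than any real sequence length.

```python
# Illustration only: how large seq_len would have to be to explain the
# reported 1 EB allocation at fp16 (2 bytes per element). Not a root-cause claim.
one_exabyte = 10 ** 18                 # bytes
implied_seq_len = one_exabyte // 2     # elements at 2 bytes each
print(f"implied seq_len ≈ {implied_seq_len:.3e}")   # ~5e17, far beyond any real prompt
```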
Expected Behavior
No response
Steps To Reproduce
- In cli_demo.py, change the model loading code as follows (a fuller loading sketch follows below):

```python
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=4)
```

- Everything else is left unchanged.
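For completeness, the surrounding loading section of cli_demo.py then looks roughly like this. This is a sketch based on the stock demo: the tokenizer line and the `model.eval()` call are assumptions about the unmodified parts, not text from this report.

```python
# Sketch of the modified loading section in cli_demo.py. Only the model line
# is the change described above; the rest is assumed from the stock demo.
from transformers import AutoTokenizer, AutoModel
from utils import load_model_on_gpus  # helper shipped with the ChatGLM-6B repo

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Original single-GPU path, kept commented out for reference:
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

# Multi-GPU path that triggers the error on 4x Tesla K80:
model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=4)
model = model.eval()
```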
Environment
- OS: Ubuntu 18.04
- Python: 3.10.9
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support: True
- CUDA Runtime: 11.7.1
- Device: Tesla K80 × 4
Anything else?
No response
Same question here — how did the original poster solve this?

I ran into the same problem: with 4× RTX 4090, multi-GPU inference raises the same error, but it goes away after setting num_gpus=1. The message claims it tried to allocate 1 EB of GPU memory, which is absurd. Is there a fix?

Same problem here — how can it be resolved?

The issue appears randomly; I haven't found a way around it so far.
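Since the failing line mixes `x.device` with `self.inv_freq`, one way to narrow this down is to dump where every parameter and buffer actually landed after `load_model_on_gpus` splits the model across GPUs. This is purely a diagnostic sketch under the setup described above, not a confirmed fix.

```python
# Diagnostic sketch only (not a confirmed fix): print which GPU each
# parameter/buffer ended up on after the multi-GPU split, so cross-device
# mismatches around the rotary embedding (whose inv_freq appears in the
# failing line) become visible.
from collections import defaultdict

from utils import load_model_on_gpus  # helper from the ChatGLM-6B repo

model = load_model_on_gpus("THUDM/chatglm-6b", num_gpus=4)

placement = defaultdict(list)
for name, tensor in list(model.named_parameters()) + list(model.named_buffers()):
    placement[str(tensor.device)].append(name)

for device, names in sorted(placement.items()):
    print(f"{device}: {len(names)} tensors, e.g. {names[:3]}")
```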