[BUG/Help] After fine-tuning, deployment is configured for multiple GPUs, but the model still loads only on GPU 0 and runs out of memory
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
My `web_demo.sh`:

```sh
PRE_SEQ_LEN=128

CUDA_VISIBLE_DEVICES=5,6,7 python3 web_demo.py \
    --model_name_or_path /root/ChatGLM2-6B/chatglm2-6b \
    --ptuning_checkpoint output/adgen-chatglm2-6b-pt-128-2e-2/checkpoint-3000 \
    --pre_seq_len $PRE_SEQ_LEN
```

This is a multi-GPU machine, yet only GPU 0 (the first visible device) gets used. `CUDA_VISIBLE_DEVICES` only controls which physical cards the process can see; the stock `ptuning/web_demo.py` still loads the entire model onto the first visible device, so the other cards stay idle.
Expected Behavior
No response
Steps To Reproduce
Error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 10.92 GiB total capacity; 10.44 GiB already allocated; 19.25 MiB free; 10.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

On a single 10.92 GiB card this is expected: the fp16 weights of ChatGLM2-6B alone take roughly 6.2B params × 2 bytes ≈ 12.5 GB, which cannot fit on one GPU of this size.
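For what it's worth, the `max_split_size_mb` hint in the traceback only mitigates allocator fragmentation; it cannot make a ~12.5 GB model fit an 11 GB card. If you still want to try it, a minimal sketch (the value 128 is an arbitrary example, not a recommendation):

```python
import os

# Must be set before PyTorch initializes the CUDA caching allocator,
# i.e. before the first CUDA tensor is allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so it takes effect
```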
Environment
- OS:
- Python:3.8
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
How can I get web_demo.sh to use multiple GPUs when deploying the model after fine-tuning?
Have you managed to solve this problem?
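One workaround is to shard the base model across the visible GPUs and then load the p-tuning prefix encoder on top of it. Below is a minimal sketch, assuming the paths from the command above and that `accelerate` is installed (`pip install accelerate`); `device_map="auto"` is standard `transformers` behavior, and the `utils.load_model_on_gpus` helper in the repo root does the same job with an explicit device map. Untested on your exact setup:

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/root/ChatGLM2-6B/chatglm2-6b"
CHECKPOINT = "output/adgen-chatglm2-6b-pt-128-2e-2/checkpoint-3000"
PRE_SEQ_LEN = 128

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Shard the fp16 base model across every GPU listed in CUDA_VISIBLE_DEVICES
# instead of putting all of it on the first one.
config = AutoConfig.from_pretrained(
    MODEL_PATH, trust_remote_code=True, pre_seq_len=PRE_SEQ_LEN
)
model = AutoModel.from_pretrained(
    MODEL_PATH,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; spreads layers over all visible GPUs
)

# Load the fine-tuned prefix-encoder weights on top of the sharded base model.
prefix_state_dict = torch.load(
    os.path.join(CHECKPOINT, "pytorch_model.bin"), map_location="cpu"
)
prefix_weights = {
    k[len("transformer.prefix_encoder."):]: v
    for k, v in prefix_state_dict.items()
    if k.startswith("transformer.prefix_encoder.")
}
model.transformer.prefix_encoder.load_state_dict(prefix_weights)
model.transformer.prefix_encoder.float()  # keep the prefix encoder in fp32
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

The key change relative to the stock `ptuning/web_demo.py` is replacing its single-device `model = model.cuda()` step with the `device_map="auto"` load; alternatively, quantized loading (`model.quantize(4)`, as described in the repo README) can fit the model on a single ~11 GB card.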