[BUG/Help] After fine-tuning, deployment is configured for multiple GPUs, but the model still loads only on GPU 0 and runs out of memory
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
My `web_demo.sh`:

```sh
PRE_SEQ_LEN=128

CUDA_VISIBLE_DEVICES=5,6,7 python3 web_demo.py \
    --model_name_or_path /root/ChatGLM2-6B/chatglm2-6b \
    --ptuning_checkpoint output/adgen-chatglm2-6b-pt-128-2e-2/checkpoint-3000 \
    --pre_seq_len $PRE_SEQ_LEN
```

This is a multi-GPU machine, yet only GPU 0 (the first visible device) gets used. `CUDA_VISIBLE_DEVICES` only controls which physical cards the process can see; the stock `ptuning/web_demo.py` still loads the entire model onto the first visible device, so the other cards stay idle.
Expected Behavior
No response
Steps To Reproduce
Error:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 10.92 GiB total capacity; 10.44 GiB already allocated; 19.25 MiB free; 10.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

On a single 10.92 GiB card this is expected: the fp16 weights of ChatGLM2-6B alone take roughly 6.2B params × 2 bytes ≈ 12.5 GB, which cannot fit on one GPU of this size.
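For what it's worth, the `max_split_size_mb` hint in the traceback only mitigates allocator fragmentation; it cannot make a ~12.5 GB model fit an 11 GB card. If you still want to try it, a minimal sketch (the value 128 is an arbitrary example, not a recommendation):

```python
import os

# Must be set before PyTorch initializes the CUDA caching allocator,
# i.e. before the first CUDA tensor is allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var so it takes effect
```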
Environment
- OS:
- Python:3.8
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
How can I get web_demo.sh to use multiple GPUs when deploying the model after fine-tuning?
Have you managed to solve this problem?
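One workaround is to shard the base model across the visible GPUs and then load the p-tuning prefix encoder on top of it. Below is a minimal sketch, assuming the paths from the command above and that `accelerate` is installed (`pip install accelerate`); `device_map="auto"` is standard `transformers` behavior, and the `utils.load_model_on_gpus` helper in the repo root does the same job with an explicit device map. Untested on your exact setup:

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

MODEL_PATH = "/root/ChatGLM2-6B/chatglm2-6b"
CHECKPOINT = "output/adgen-chatglm2-6b-pt-128-2e-2/checkpoint-3000"
PRE_SEQ_LEN = 128

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Shard the fp16 base model across every GPU listed in CUDA_VISIBLE_DEVICES
# instead of putting all of it on the first one.
config = AutoConfig.from_pretrained(
    MODEL_PATH, trust_remote_code=True, pre_seq_len=PRE_SEQ_LEN
)
model = AutoModel.from_pretrained(
    MODEL_PATH,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; spreads layers over all visible GPUs
)

# Load the fine-tuned prefix-encoder weights on top of the sharded base model.
prefix_state_dict = torch.load(
    os.path.join(CHECKPOINT, "pytorch_model.bin"), map_location="cpu"
)
prefix_weights = {
    k[len("transformer.prefix_encoder."):]: v
    for k, v in prefix_state_dict.items()
    if k.startswith("transformer.prefix_encoder.")
}
model.transformer.prefix_encoder.load_state_dict(prefix_weights)
model.transformer.prefix_encoder.float()  # keep the prefix encoder in fp32
model = model.eval()

response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

The key change relative to the stock `ptuning/web_demo.py` is replacing its single-device `model = model.cuda()` step with the `device_map="auto"` load; alternatively, quantized loading (`model.quantize(4)`, as described in the repo README) can fit the model on a single ~11 GB card.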