[BUG] Strange GPU memory issue
Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- [X] I have searched FAQ
Current Behavior
Whether I fine-tune the 1.8B or the 7B model with LoRA, GPU memory always fills up. I am using a V100 with 32 GB of memory.

1.8B:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 7; 31.74 GiB total capacity; 3.22 GiB already allocated; 15.38 MiB free; 3.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

7B:
CUDA out of memory. Tried to allocate 96.00 MiB (GPU 3; 31.74 GiB total capacity; 9.44 GiB already allocated; 11.38 MiB free; 9.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
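Both tracebacks show PyTorch itself reserving only a few GiB on a 31.74 GiB card while almost nothing is free, which suggests the rest of the memory is held outside the training process. A minimal diagnostic sketch, assuming only that torch is importable on the same node, to compare the driver's view with this process's allocations (nvidia-smi gives the per-process breakdown):

```python
# Diagnostic sketch: compare what PyTorch has allocated in this process
# with what the driver reports as free on each GPU.
import torch

for i in range(torch.cuda.device_count()):
    free_b, total_b = torch.cuda.mem_get_info(i)   # driver-level view of the card
    allocated_b = torch.cuda.memory_allocated(i)   # memory held by this process only
    print(
        f"GPU {i}: free {free_b / 2**30:.2f} GiB / total {total_b / 2**30:.2f} GiB, "
        f"allocated by this process {allocated_b / 2**30:.2f} GiB"
    )
```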
Expected Behavior
No response
Steps To Reproduce
No response
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
Anything else?
No response
I ran into the same problem when fine-tuning the 7B model with QLoRA.
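For reference, a minimal QLoRA loading sketch, not the repo's own finetune.py: it assumes transformers, bitsandbytes, and peft are installed, and the target_modules names are only illustrative for Qwen's attention/MLP projections.

```python
# Minimal QLoRA loading sketch (assumptions: transformers + bitsandbytes + peft
# installed; target_modules names are Qwen-specific and illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # V100 has no bf16 support
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # Qwen projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```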
Could you both share your transformers and accelerate versions? Also, is GPU memory usage already higher than expected during model loading, or does the problem only appear during fine-tuning?
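A quick way to collect the requested versions, assuming the packages are importable in the training environment:

```python
# Sketch for collecting the versions asked about above.
import torch
import transformers
import accelerate

print("PyTorch     :", torch.__version__)
print("CUDA (built):", torch.version.cuda)
print("Transformers:", transformers.__version__)
print("Accelerate  :", accelerate.__version__)
```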
I am running into the same issue as well.
Due to the lack of information, the following are my best guesses:
- The original poster's issue appears to be unrelated to the fine-tuning process; the GPU memory seems to be occupied by other processes (see the launcher sketch below).
- The second poster's issue may differ from the original one, since QLoRA is used. The screenshot is no longer available, so we cannot tell what happened.
- The last poster's issue is also different from the previous ones, since a 16 GB V100 is used; 16 GB is not enough to run Qwen-7B with LoRA.
For GPU memory usage, please take a look at the test here: https://github.com/QwenLM/Qwen/tree/main/recipes/finetune/deepspeed
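If some cards on the node are already occupied by other jobs, a small launcher sketch like the following (assuming nvidia-smi is on PATH; the selection logic is only illustrative) pins training to the GPU with the most free memory before CUDA is initialized:

```python
# Launcher sketch: pick the GPU with the most free memory and pin the job to it,
# so a card already occupied by another process is not selected by accident.
# CUDA_VISIBLE_DEVICES must be set before any CUDA initialization.
import os
import subprocess

out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
    text=True,
)
free_mib = [int(line) for line in out.strip().splitlines()]
best = max(range(len(free_mib)), key=lambda i: free_mib[i])
os.environ["CUDA_VISIBLE_DEVICES"] = str(best)
print(f"Using GPU {best} with {free_mib[best]} MiB free")
```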