Baichuan-7B
[BUG] CUDA Out of Memory when evaluating the model.
Required prerequisites
- [X] I have read the documentation https://github.com/baichuan-inc/baichuan-7B/blob/HEAD/README.md.
- [X] I have searched the Issue Tracker and Discussions to confirm this hasn't already been reported. (+1 or comment there if it has.)
- [X] Consider asking first in a Discussion.
System information
Conda environment: torch==2.0.1, transformers==4.29.2, ...
Problem description
I used an A100 (80 GB) to run the evaluate_zh.py script to evaluate the Baichuan model, but GPU memory usage kept growing until it overflowed. I then found that the model is loaded without being put in eval mode, and inference runs without no_grad, so autograd keeps activations alive across evaluation batches.
Reproducible example code
The Python snippets:
At https://github.com/baichuan-inc/Baichuan-7B/blob/6f3ef4633a90c2d8a3e0763d0dec1b8dc11588f5/evaluation/evaluate_zh.py#L97C13-L97C13, switch the model to eval mode when it is assigned:
self.model = model.eval()
At https://github.com/baichuan-inc/Baichuan-7B/blob/6f3ef4633a90c2d8a3e0763d0dec1b8dc11588f5/evaluation/evaluate_zh.py#L103, add this decorator on the method defined there:
@torch.inference_mode()
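For reference, a minimal sketch of how the two fixes fit together; the class and method names below are illustrative, not the script's exact API:

import torch

class Evaluator:
    def __init__(self, model, tokenizer):
        self.tokenizer = tokenizer
        # Fix 1: put the model in eval mode so training-only behavior
        # (e.g. dropout) is disabled during evaluation.
        self.model = model.eval()

    # Fix 2: run inference without autograd, so activations are not
    # retained for backward and memory no longer grows across batches.
    @torch.inference_mode()
    def eval_subject(self, prompt):
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        logits = self.model(**inputs).logits
        return logits[0, -1]  # next-token logits used to score answer choices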
Command lines:
No response
Extra dependencies:
No response
Steps to reproduce:
No response
Traceback
No response
Expected behavior
No response
Additional context
No response
Checklist
- [X] I have provided all relevant and necessary information above.
- [X] I have chosen a suitable title for this issue.
Thank you. It works!!!
Thanks!
While training the model, the script uses GPU 0 by default. How can I switch it to GPU 1?
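Not a maintainer answer, but a common approach is to restrict the visible devices before CUDA initializes, for example:

import os
# Expose only physical GPU 1 to this process; must be set before torch
# initializes CUDA. Inside the script, that GPU then appears as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.get_device_name(0))  # reports the physical GPU 1

Equivalently, set it on the command line when launching the training script: CUDA_VISIBLE_DEVICES=1 python <your_script>.py.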