ChatGLM-Tuning


After LoRA fine-tuning ChatGLM, quantize=8 increases both GPU memory usage and inference time instead of reducing them

Open moon4869 opened this issue 1 year ago • 1 comment

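Without quantization, inference uses 14 GB of GPU memory. With 8-bit quantization it uses 20 GB, and with 4-bit quantization it uses 17 GB. What causes this? The GPU is an A100 80G.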

Jul 03 '23 11:07 moon4869

Try calling torch.cuda.empty_cache().

Jul 06 '23 06:07 horizon86
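
The suggestion above can be checked with a short measurement. PyTorch's caching allocator keeps freed blocks reserved, so tools such as nvidia-smi can report more memory than the live tensors occupy, and torch.cuda.empty_cache() returns the unused cached blocks to the driver. The following is a minimal sketch, not something confirmed in this thread: the helper names (to_gib, cuda_memory_snapshot, compare_before_after_empty_cache) are hypothetical, and it assumes the script runs in the same process right after the quantized model has been loaded. Whether cached memory actually accounts for the 20 GB figure is not established here.

```python
try:
    import torch
except ImportError:  # lets the helpers be imported on a machine without PyTorch
    torch = None

GIB = 1024 ** 3


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def cuda_memory_snapshot():
    """Return (allocated_gib, reserved_gib) for the current device, or None without CUDA.

    allocated = memory held by live tensors
    reserved  = memory held by PyTorch's caching allocator (allocated + cached free blocks)
    """
    if torch is None or not torch.cuda.is_available():
        return None
    torch.cuda.synchronize()  # wait for pending kernels before reading the counters
    return (
        to_gib(torch.cuda.memory_allocated()),
        to_gib(torch.cuda.memory_reserved()),
    )


def compare_before_after_empty_cache():
    """Take a snapshot, release cached blocks, take another snapshot."""
    before = cuda_memory_snapshot()
    if before is None:
        return None
    torch.cuda.empty_cache()  # frees cached, unused blocks; live tensors are untouched
    after = cuda_memory_snapshot()
    return before, after


if __name__ == "__main__":
    # Hypothetical usage: run this after loading the model (with or without quantization).
    result = compare_before_after_empty_cache()
    if result is None:
        print("CUDA is not available; nothing to measure.")
    else:
        (alloc_before, reserved_before), (alloc_after, reserved_after) = result
        print(f"before: allocated={alloc_before:.2f} GiB, reserved={reserved_before:.2f} GiB")
        print(f"after:  allocated={alloc_after:.2f} GiB, reserved={reserved_after:.2f} GiB")
```

If the reserved figure drops sharply after empty_cache() while the allocated figure stays the same, the extra usage was cached memory left over from temporary buffers rather than the quantized weights themselves. If both stay high, the cause lies elsewhere.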