
[Help] Can the 4-bit quantized model be saved so it can be loaded directly?

Open · digits122 opened this issue 1 year ago · 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Loading the model currently consumes a lot of memory, as much as 14 GB, yet after 4-bit quantization the model occupies very little GPU memory at runtime. So, can the quantized model be saved directly to disk and then loaded from there, so that a machine with little RAM (8 GB) can load the model directly without resorting to virtual memory? If this is possible, how should the quantized model be saved, and how should it be loaded?
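A minimal sketch of one way this could work, assuming the `quantize()` method exposed by ChatGLM-6B's custom modeling code together with the standard Hugging Face `save_pretrained`/`from_pretrained` API; the output directory name is a placeholder, and whether the quantized checkpoint round-trips cleanly should be verified against the repo's own docs:

```python
# Sketch: persist a 4-bit quantized ChatGLM-6B checkpoint, then reload it
# later without first materializing the full fp16 weights in RAM.
# Assumptions: ChatGLM's custom quantize() method (shipped via
# trust_remote_code) and the standard save_pretrained API; "./chatglm-6b-int4"
# is a placeholder path.

def save_quantized(out_dir: str = "./chatglm-6b-int4") -> None:
    """Quantize once on a machine with enough memory, then save to disk."""
    from transformers import AutoModel, AutoTokenizer  # lazy import: heavy deps
    tokenizer = AutoTokenizer.from_pretrained(
        "THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained(
        "THUDM/chatglm-6b", trust_remote_code=True).half()
    model = model.quantize(4)       # 4-bit quantization via ChatGLM's method
    model.save_pretrained(out_dir)  # writes the already-quantized weights
    tokenizer.save_pretrained(out_dir)


def load_quantized(out_dir: str = "./chatglm-6b-int4"):
    """Reload directly from the quantized checkpoint: no fp16 load step."""
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(out_dir, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        out_dir, trust_remote_code=True).half().cuda()
    return tokenizer, model
```

The point of the split is that `save_quantized` only needs to run once (on a machine that can afford the initial 14 GB load), after which low-memory machines call only `load_quantized`.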

Expected Behavior

No response

Steps To Reproduce

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

digits122 · Mar 16 '23 21:03