ChatGLM-6B [BUG/Help] As the number of dialogue turns increases, is there any way to prevent or reduce the growth in ChatGLM's GPU memory usage?

Open jasonliu119 opened this issue 1 year ago • 6 comments

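"After 2 to 3 dialogue turns, GPU memory usage is about 10GB with 8-bit quantization and only 6GB with 4-bit quantization. As the number of dialogue turns grows, the memory consumed grows with it."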

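I am using ChatGLM to build a role-playing application and have run into a similar problem. After each dialogue turn, ChatGLM's GPU memory usage grows by anywhere from a few hundred MB to 1 or 2 GB.

Thanks!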

To reproduce: just chat with ChatGLM normally.

- OS: Ubuntu 20.04
- Python: 3.10
- Transformers: 4.26.1
- PyTorch: 1.12
- CUDA Support: True

Jun 02 '23 10:06 jasonliu119

Hi, I have the same problem. Has it been solved?

Jun 06 '23 08:06 superyoman

If 10 people ask questions concurrently, wouldn't ChatGLM's GPU memory usage grow by 10 to 20 GB each time?

Jun 07 '23 03:06 qq516249940

Just limit how much dialogue history you pass in as input.
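A minimal sketch of one way to do this, assuming the `model.chat(tokenizer, query, history=...)` interface from the ChatGLM-6B README, which returns a `(response, history)` pair where `history` is a list of `(query, response)` tuples. The helper name `trim_history` and the `max_turns` parameter are hypothetical, introduced here only for illustration.

```python
from typing import List, Tuple

History = List[Tuple[str, str]]


def trim_history(history: History, max_turns: int) -> History:
    """Return only the most recent `max_turns` (query, response) pairs."""
    if max_turns <= 0:
        return []
    # Copy the slice so the caller's list is never modified in place.
    return list(history[-max_turns:])


# Intended use (requires a loaded ChatGLM model and tokenizer):
#   response, history = model.chat(
#       tokenizer, query, history=trim_history(history, 3)
#   )
```

Because the prompt is rebuilt from whatever history is passed in, capping the number of retained turns also caps the prompt length, and with it the memory needed for each generation.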

Jun 07 '23 06:06 finlay-liu

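> Just limit how much dialogue history you pass in as input.

The thing is, I never set history. Does GLM add the dialogue history by default? If so, how do I restrict it to single-turn dialogue and not store any history?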

Jun 07 '23 15:06 superyoman

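> > Just limit how much dialogue history you pass in as input.
>
> The thing is, I never set history. Does GLM add the dialogue history by default? If so, how do I restrict it to single-turn dialogue and not store any history?

Take a look at the underlying API; that is where history is used.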

Jun 08 '23 09:06 finlay-liu

In chatglm_llm.py:

    def generatorAnswer(self, prompt: str, history: List[List[str]] = [], streaming: bool = False):
        history = []  # force an empty list
        if streaming:
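The same idea can be sketched directly against the model, assuming the `model.chat(tokenizer, query, history=...)` interface from the ChatGLM-6B README that returns `(response, history)`. The wrapper name `single_turn_chat` is hypothetical. It passes a fresh empty history on every call and discards the history that comes back, so no earlier turn ever reaches the prompt.

```python
def single_turn_chat(model, tokenizer, query: str) -> str:
    """Answer `query` in isolation, without any earlier dialogue turns."""
    # A new empty list is passed each time, so nothing accumulates between calls.
    response, _unused_history = model.chat(tokenizer, query, history=[])
    return response
```

If history still grows with a wrapper like this, the accumulation is happening in the calling code (for example, a demo script that feeds the returned history back in), not inside the model call itself.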

Jun 24 '23 14:06 shleo

mark

Jul 18 '23 09:07 aleimu

api.py has a torch_gc function.
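A minimal sketch of a cleanup helper in that spirit, not a verbatim copy of the one in api.py. It assumes PyTorch with CUDA and uses only `torch.cuda.empty_cache()` and `torch.cuda.ipc_collect()`. The boolean return value and the guarded import are additions made here so the sketch also runs on machines without a GPU.

```python
try:
    import torch
except ImportError:  # allows the sketch to load where PyTorch is not installed
    torch = None


def torch_gc() -> bool:
    """Release cached, unused CUDA memory. Returns True if a cleanup ran."""
    if torch is None or not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    torch.cuda.ipc_collect()  # clean up CUDA IPC memory from finished processes
    return True
```

Calling something like this after each request can lower the memory figure reported by `nvidia-smi`, but it only frees blocks that no live tensor references. It does not reduce the memory a long history needs during generation, so it complements limiting the history rather than replacing it.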

Aug 30 '23 09:08 umbraclet16