Langchain-Chatchat [BUG] Out of memory / torch.cuda.OutOfMemoryError:

Problem Description

Environment: the AutoDL image on a 3090 GPU runs out of GPU memory. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 23.70 GiB total capacity; 20.99 GiB already allocated; 602.56 MiB free; 22.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Steps to Reproduce

1. Clone the latest code into the original image.
2. Run python webui.py.
3. Open the page and load an 11 KB knowledge base.
4. Ask questions; after about 3 questions the GPU runs out of memory.

Expected Result

Questions should be answered normally.

Actual Result

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 23.70 GiB total capacity; 20.99 GiB already allocated; 602.56 MiB free; 22.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
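The numbers in the error message can be checked with a little arithmetic. The sketch below parses the message and compares reserved with allocated memory. The helper name parse_oom_message is made up for illustration, and the regular expressions assume the message layout quoted above, which differs between PyTorch versions.

```python
import re

# The message reported in this issue (class-name prefix omitted).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.12 GiB (GPU 0; 23.70 GiB total capacity; "
    "20.99 GiB already allocated; 602.56 MiB free; 22.12 GiB reserved in total by PyTorch)"
)

_UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}


def parse_oom_message(message: str) -> dict:
    """Extract the sizes (converted to GiB) from an OOM message laid out as above."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (GiB|MiB)",
        "total": r"([\d.]+) (GiB|MiB) total capacity",
        "allocated": r"([\d.]+) (GiB|MiB) already allocated",
        "free": r"([\d.]+) (GiB|MiB) free",
        "reserved": r"([\d.]+) (GiB|MiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in the message")
        sizes[name] = float(match.group(1)) * _UNIT_TO_GIB[match.group(2)]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)
# Memory PyTorch has cached from the driver but is not using for live tensors.
cached_but_unused = sizes["reserved"] - sizes["allocated"]
print(f"allocated share of the GPU: {sizes['allocated'] / sizes['total']:.1%}")
print(f"reserved but unallocated:   {cached_but_unused:.2f} GiB")
```

For this message the gap between reserved and allocated memory is roughly 1.13 GiB, while 20.99 GiB is held by live tensors. That suggests the card is genuinely full rather than badly fragmented, so setting max_split_size_mb alone may not be enough here.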

Environment Information

langchain-ChatGLM version/commit: 0adcc64 https://github.com/imClumsyPanda/langchain-ChatGLM/commit/0adcc64dae60e798e8b895522ea03969698889dd
Deployed with Docker (yes/no): no
Model used (ChatGLM-6B / ClueAI/ChatYuan-large-v2, etc.): ChatGLM-6B
Embedding model used (GanymedeNil/text2vec-large-chinese, etc.): GanymedeNil
Operating system and version: Linux, Ubuntu 22.04
Python version: 3.8
Other relevant environment information: using the provided AutoDL image

May 04 '23 08:05 ruolunhui

I'll check tonight. It seems many people are running out of GPU memory with this version.

May 04 '23 09:05 imClumsyPanda

OK, thanks.

May 04 '23 10:05 ruolunhui

The code was updated last night. I suggest lowering both chunk_size and top-k moderately.

May 04 '23 23:05 imClumsyPanda

@imClumsyPanda Even after lowering them, the GPU still runs out of memory easily when the submitted text is long.

May 05 '23 02:05 riskivy

I'm hitting a similar problem here. It looks like max_length is not taking effect: after I uploaded an English PDF, every related question causes an OOM. Input length of input_ids is 49548, but max_length is set to 10000. This can lead to unexpected behavior. You should consider increasing max_new_tokens.

May 05 '23 06:05 aindy-niu

max_length should be taking effect. This is probably a sentence-splitting problem: a single sentence is too long.
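One way to guard against a single overlong sentence is to hard-split anything above a length limit before it is indexed. This is only a minimal sketch of that idea, not the project's splitter: the function name split_long_sentences and the character-based limit are assumptions made for illustration.

```python
from typing import List


def split_long_sentences(sentences: List[str], max_len: int) -> List[str]:
    """Return the sentences with any sentence longer than max_len cut into pieces.

    Pieces are cut at fixed character offsets, so a cut may fall mid-word;
    a real splitter would prefer punctuation or whitespace boundaries.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    pieces = []
    for sentence in sentences:
        if len(sentence) <= max_len:
            pieces.append(sentence)
            continue
        # Cut the overlong sentence into consecutive windows of max_len characters.
        for start in range(0, len(sentence), max_len):
            pieces.append(sentence[start:start + max_len])
    return pieces


chunks = split_long_sentences(["short", "x" * 25], max_len=10)
print([len(chunk) for chunk in chunks])
```

With a cap like this, an English PDF whose text contains few of the expected sentence delimiters can no longer produce one enormous "sentence" that ends up in the prompt in full.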

May 05 '23 06:05 imClumsyPanda

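After reading the code, I wonder whether this problem has no solution. Whatever the question, if your documents are large and many passages match, the prompt becomes huge.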

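from typing import List

from langchain.docstore.document import Document
from langchain.vectorstores import FAISS

# Prompt sent to the model. In English: "Known information: {context}. Based on
# the known information above, answer the user's question concisely and
# professionally. If the answer cannot be derived from it, say 'the question
# cannot be answered from the known information' or 'not enough relevant
# information was provided'. Do not add fabricated content to the answer, and
# answer in Chinese. The question is: {question}"
PROMPT_TEMPLATE = """已知信息：
{context}

根据上述已知信息，简洁和专业的来回答用户的问题。如果无法从中得到答案，请说 “根据已知信息无法回答该问题” 或 “没有提供足够的相关信息”，不允许在答案中添加编造成分，答案请使用中文。 问题是：{question}"""

def generate_prompt(related_docs: List[Document],
                    query: str,
                    prompt_template=PROMPT_TEMPLATE) -> str:
    # Every retrieved chunk is concatenated with no length limit, so the prompt
    # grows with the number and size of the matched documents.
    context = "\n".join([doc.page_content for doc in related_docs])
    prompt = prompt_template.replace("{question}", query).replace("{context}", context)
    return prompt

# Excerpt from the method that answers a query (hence the use of self).
# similarity_search_with_score_by_vector, get_docs_with_score and torch_gc are
# defined elsewhere in the project.
vector_store = FAISS.load_local(vs_path, self.embeddings)
FAISS.similarity_search_with_score_by_vector = similarity_search_with_score_by_vector
vector_store.chunk_size = self.chunk_size
related_docs_with_score = vector_store.similarity_search_with_score(query,
                                                                    k=self.top_k)
related_docs = get_docs_with_score(related_docs_with_score)
torch_gc()
prompt = generate_prompt(related_docs, query)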
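Since the context is joined without any limit, one possible mitigation is to cap it at a fixed character budget when the prompt is built. The sketch below illustrates that idea under stated assumptions: the Document class is a minimal stand-in for the retrieved objects, the template is a shortened English placeholder, build_context and generate_capped_prompt are hypothetical names, and the documents are assumed to arrive ordered from most to least relevant.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Minimal stand-in for a retrieved document; only page_content is used."""
    page_content: str


# Shortened placeholder template, not the project's real prompt.
PROMPT_TEMPLATE = "Known information:\n{context}\n\nQuestion: {question}"


def build_context(related_docs: List[Document], max_context_chars: int) -> str:
    """Join documents with newlines, never exceeding max_context_chars in total."""
    pieces = []
    used = 0  # characters consumed so far, including newline separators
    for doc in related_docs:
        remaining = max_context_chars - used
        if remaining <= 0:
            break
        text = doc.page_content[:remaining]  # truncate the last document that fits partially
        pieces.append(text)
        used += len(text) + 1  # +1 accounts for the separator before the next piece
    return "\n".join(pieces)


def generate_capped_prompt(related_docs: List[Document],
                           query: str,
                           max_context_chars: int,
                           prompt_template: str = PROMPT_TEMPLATE) -> str:
    context = build_context(related_docs, max_context_chars)
    return prompt_template.replace("{question}", query).replace("{context}", context)


docs = [Document("a" * 50), Document("b" * 50), Document("c" * 50)]
prompt = generate_capped_prompt(docs, "What is this about?", max_context_chars=80)
print(len(build_context(docs, 80)))
```

A character budget is only a proxy for the token count the model actually sees; counting tokens with the model's own tokenizer would be more precise, at the cost of an extra dependency in this step.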

May 06 '23 03:05 shubihu

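@shubihu From my testing so far, the only options are to limit the number of tokens, add GPU memory, or run on the CPU. The ChatGLM model simply needs this many resources.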

May 06 '23 05:05 riskivy

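Same problem here: LLM ChatGLM, embedding model GanymedeNil. I watched GPU usage and found that with every question the GPU memory in use only grows and is never released. I have 40 GB of GPU memory, and the GPU usually runs out after 5-6 questions. Something is wrong with how memory is released.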

May 10 '23 12:05 wtxidian

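I also think memory release is broken. A GPU with 16 GB of memory hits OOM after 3 questions. No matter how long I wait between questions, memory usage only ever goes up.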

May 11 '23 01:05 Donnydong

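@wtxidian @Donnydong Then something is wrong with how it is being used. GPU memory consumption ultimately depends on token length, and the memory can definitely be released; if it is not, there is a problem in the code or in the usage. In my production workload, with token length kept under control, 16 GB of GPU memory runs the ChatGLM-6B FP16 model stably. It has been running continuously for more than a week with no OOM. The framework releases GPU memory through the torch_gc() function. My final advice: do not reuse the code as-is for your scenario. Understand its mechanism first, then adapt the code to your scenario to get the result you need.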
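To check whether memory is really being released between questions, allocated and reserved memory can be logged around a cache release. The sketch below is not the project's torch_gc(); the function names are made up for illustration, and only torch.cuda.memory_allocated, torch.cuda.memory_reserved, torch.cuda.empty_cache and torch.cuda.ipc_collect are real PyTorch calls. It falls back gracefully when PyTorch or CUDA is unavailable.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def cuda_memory_gib():
    """Return (allocated, reserved) in GiB for the current device, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return torch.cuda.memory_allocated() / gib, torch.cuda.memory_reserved() / gib


def release_cached_memory():
    """Drop unreachable Python objects, then return cached CUDA blocks to the driver."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()


def format_usage(label: str) -> str:
    usage = cuda_memory_gib()
    if usage is None:
        return f"{label}: CUDA not available"
    allocated, reserved = usage
    return f"{label}: allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB"


print(format_usage("before release"))
release_cached_memory()
print(format_usage("after release"))
```

empty_cache() can only hand back blocks that no live tensor is using. If allocated memory keeps climbing from one question to the next, something is still holding references to tensors, and clearing the cache will not bring the number down.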

May 11 '23 07:05 riskivy

Have you managed to solve this? I'm running into the same problem now.

May 31 '23 04:05 popfat

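Same problem here. With 20-odd GB of GPU memory, I asked one question, and on the second I got an out-of-memory message followed by an error. Is langchain-chatglm actually usable?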

Jul 05 '23 10:07 finyone

Please update to the latest version. This issue has been resolved.

Sep 27 '23 13:09 zRzRzRzRzRzRzR
