
[BUG] CUDA out of memory with container deployment

Open BillShiyaoZhang opened this issue 2 years ago • 3 comments

Problem Description: After pulling and running the image as described in the README and completing the dependency installation required at startup, startup fails with a CUDA out of memory error.

Steps to Reproduce

  1. 执行 'docker run --gpus all -d --name chatglm -p 7860:7860 chatglm-cuda:latest'
  2. Problem occurs

Expected Result: Startup completes and the web UI opens.

Actual Result

  1. ERROR 2023-05-10 18:15:04,235-1d: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 8.00 GiB total capacity; 7.25 GiB already allocated; 0 bytes free; 7.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  2. INFO 2023-05-10 18:15:04,236-1d: The model failed to load; please reselect it under the "Model Configuration" tab at the top left of the page and click the "Load Model" button.
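A back-of-the-envelope estimate shows why the default FP16 model overflows this 8 GiB card. The sketch below uses the commonly cited ~6.2B parameter count for ChatGLM-6B and counts weights only; activations and the KV cache add further overhead on top of these figures.

```python
# Rough VRAM needed just for the weights of a ~6.2B-parameter model
# (ChatGLM-6B) at different precisions. Activations and the KV cache
# are extra, which is why FP16 cannot fit an 8 GiB GPU.

PARAMS = 6.2e9  # approximate parameter count of ChatGLM-6B

def weight_vram_gib(bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a given precision."""
    return PARAMS * bytes_per_param / 2**30

for name, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{weight_vram_gib(bpp):.1f} GiB")
```

FP16 weights alone come to roughly 11.5 GiB, so the 7.25 GiB already allocated before the failing 128 MiB request is expected; the INT4 variant (~2.9 GiB of weights) leaves headroom on an 8 GiB card.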

Environment Information

  • langchain-ChatGLM version/commit: v0.1.11
  • Deployed with Docker (yes/no): yes
  • Model used (ChatGLM-6B / ClueAI/ChatYuan-large-v2, etc.): default model
  • Embedding model used (GanymedeNil/text2vec-large-chinese, etc.): default model
  • Operating system and version: Ubuntu 22.04 with WSL in Windows 11

Additional Information: none provided.

BillShiyaoZhang avatar May 10 '23 18:05 BillShiyaoZhang

Odd that it throws a CUDA out-of-memory error even though the model reportedly failed to load.

KyonCN avatar May 11 '23 01:05 KyonCN

The GPU memory is probably insufficient for the default model's requirements; try switching to a quantized model.
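In langchain-ChatGLM v0.1.x, the model is selected in `configs/model_config.py`. The fragment below is a hypothetical excerpt showing the idea (the exact dictionary keys and fields vary between versions, so check your local copy); `THUDM/chatglm-6b-int4` is the official pre-quantized 4-bit checkpoint on Hugging Face.

```python
# Hypothetical excerpt from configs/model_config.py (field names are
# illustrative -- verify against your checked-out version).
llm_model_dict = {
    "chatglm-6b": {"pretrained_model_name": "THUDM/chatglm-6b"},
    "chatglm-6b-int8": {"pretrained_model_name": "THUDM/chatglm-6b-int8"},
    "chatglm-6b-int4": {"pretrained_model_name": "THUDM/chatglm-6b-int4"},
}

# Switch from the full-precision default to the 4-bit quantized model,
# whose weight footprint fits within an 8 GiB GPU.
LLM_MODEL = "chatglm-6b-int4"
```

After editing the config, rebuild or restart the container so the new setting takes effect.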

imClumsyPanda avatar May 11 '23 03:05 imClumsyPanda

> The GPU memory is probably insufficient for the default model's requirements; try switching to a quantized model.

Is 8 GB of VRAM not enough even for 4-bit? And where can I set the quantized model?

0xYx avatar May 22 '23 12:05 0xYx