ChatGLM-6B: How do I release GPU memory? torch.cuda.empty_cache() has no effect

Is your feature request related to a problem? Please describe.

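As more questions are asked, GPU memory usage keeps growing until it runs out. Is there a way to release memory after each inference? I tried torch.cuda.empty_cache(), but it has no effect.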

Solutions

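How can the code be changed so that, once GPU memory fills up during inference, the memory is released and inference can continue?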

Additional context

No response

May 29 '23 08:05 guiniao

How could it have no effect? You probably put the call in the wrong place. Add it on the line just before the answer is returned.

May 30 '23 01:05 dizhenx

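You probably need to clear history at the same time. Otherwise the next inference carries the previous conversation along and still hits OOM.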

Jun 01 '23 01:06 ixjx

I ran into the out-of-memory problem too:

OutOfMemoryError: CUDA out of memory. Tried to allocate 646.00 MiB (GPU 0; 14.76 GiB total capacity; 12.35 GiB already allocated; 529.75 MiB free; 13.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
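To read the numbers in a message like this, it can help to print the same two counters yourself. Below is a minimal sketch; the helper name cuda_memory_report is hypothetical. torch.cuda.empty_cache() can only hand back the gap between "reserved" and "allocated"; memory still referenced by live tensors (the "allocated" part) is not released by it.

```python
try:
    import torch
except ImportError:  # guard only so the sketch loads without PyTorch installed
    torch = None


def cuda_memory_report(device=0):
    """Return allocated and reserved CUDA memory in MiB, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        # memory currently occupied by live tensors
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # memory held by PyTorch's caching allocator (live tensors plus cache)
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
    }


print(cuda_memory_report())
```

Calling it before and after each inference shows whether "allocated" itself keeps growing, which would point at lingering references (for example a growing history) rather than at the cache.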

Jun 01 '23 08:06 sujunze

@sujunze @ixjx @dizhenx Yes. I cleared history and called torch.cuda.empty_cache() together, and it worked. torch.cuda.empty_cache() had no effect earlier because it was called in the wrong place.
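For readers who want to see the shape of that fix, here is a minimal sketch, not the actual code from this thread. It assumes a callable chat_fn(query, history) that returns (answer, history); chat_fn and MAX_TURNS are hypothetical names, and capping history at a few turns is a swapped-in variant of clearing it completely.

```python
import gc

try:
    import torch
except ImportError:  # guard only so the sketch loads without PyTorch installed
    torch = None

MAX_TURNS = 3  # hypothetical cap on retained (question, answer) pairs; 0 clears history


def predict(chat_fn, query, history):
    """Run one turn, then trim history and release cached GPU memory."""
    answer, history = chat_fn(query, history)
    # Keep only the newest turns so the prompt stops growing with every question.
    history = history[-MAX_TURNS:] if MAX_TURNS > 0 else []
    gc.collect()  # drop unreachable Python objects that may still hold tensors
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the driver
    return answer, history  # cleanup runs right before this return
```

The cleanup sits directly before the return, which is the placement suggested earlier in the thread.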

Jun 06 '23 03:06 guiniao

Even calling torch.cuda.empty_cache() and clearing history does not work for me.

Jun 07 '23 08:06 controZheng

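@controZheng, are you perhaps clearing the wrong card, or doing it in the wrong place? Check which GPU you are using and clear the cache on that one, after the predict call and before the return.

    import torch

    def torch_gc():
        if torch.cuda.is_available():
            # 'cuda:1' must match the card the model actually runs on
            with torch.cuda.device('cuda:1'):
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()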
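The torch_gc above hard-codes 'cuda:1'. One way to avoid clearing the wrong card is to ask the model where its weights live. This is a minimal sketch under the assumption that the whole model sits on a single device; torch_gc_for is a hypothetical name, and a model sharded across several GPUs would need each device handled in turn.

```python
try:
    import torch
except ImportError:  # guard only so the sketch loads without PyTorch installed
    torch = None


def torch_gc_for(model):
    """Empty the CUDA cache on the device that holds the model's first parameter."""
    device = next(model.parameters()).device
    if device.type == "cuda":
        with torch.cuda.device(device):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
    return device  # returned so the caller can log which card was cleared
```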

Jun 07 '23 08:06 guiniao

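It really doesn't work. I have checked the placement and history many times. Once the GPU has blown up (torch.cuda.OutOfMemoryError: CUDA out of memory.), clearing no longer has any effect.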

Jun 07 '23 09:06 controZheng

Is a single inference perhaps too long in tokens, so that one inference alone fills GPU memory and there is never a chance to clear it?
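If that is the cause, one option is to bound the input before it reaches the model. Below is a minimal sketch; trim_history, max_tokens and count_tokens are hypothetical names, and the default count_tokens=len counts characters as a crude stand-in for real tokenizer counts.

```python
def trim_history(history, query, max_tokens, count_tokens=len):
    """Keep the most recent turns whose total size, plus the query, fits max_tokens."""
    budget = max_tokens - count_tokens(query)
    kept = []
    # Walk from the newest turn backwards and stop at the first one that no longer fits.
    for question, answer in reversed(history):
        cost = count_tokens(question) + count_tokens(answer)
        if cost > budget:
            break
        kept.append((question, answer))
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

For example, trim_history(history, query, max_tokens=2048) would be called right before inference. A query that is itself longer than the budget yields an empty history and still has to be rejected or shortened separately.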

Jun 07 '23 09:06 guiniao

Yes, exactly. I am looking for a way to clear memory after the GPU has run out. Otherwise I have to restart the service every time.

Jun 07 '23 09:06 controZheng

I ran into the same problem and solved it by setting "max_split_size_mb=". For details, see the issue at https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/913
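In PyTorch this option is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable named in the error message above. A minimal sketch follows; the value 128 is only an illustrative assumption, not a recommendation from this thread, and the right value depends on the workload.

```python
import os

# Set this before the first CUDA allocation; doing it before "import torch"
# is the simplest way to be sure. setdefault keeps a value already exported
# in the shell. 128 is an illustrative value only.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

The same setting can be exported in the shell before launching the service. It targets fragmentation (reserved memory much larger than allocated memory) and does not help when live tensors alone exceed the card's capacity.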

Jun 08 '23 08:06 XiongfeiQin

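Did you find a solution? I don't know how to handle this situation either. I am building an API server that needs to run stably, and when it occasionally runs out of GPU memory I want the program to recover on its own.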
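One pattern that may help here, offered as a sketch rather than a confirmed fix for this issue: while an exception is being handled, it keeps the failed call's stack frames (and the tensors in them) referenced, so cleanup attempted inside the except block may free very little. The sketch below therefore leaves the handler first, then frees memory and retries once without history. It assumes PyTorch 1.13 or later for torch.cuda.OutOfMemoryError and a callable chat_fn(query, history); safe_predict and release_gpu_memory are hypothetical names.

```python
import gc

try:
    import torch
except ImportError:  # guard only so the sketch loads without PyTorch installed
    torch = None

# MemoryError is just a stand-in so the sketch still runs without PyTorch.
OOM_ERRORS = (torch.cuda.OutOfMemoryError,) if torch is not None else (MemoryError,)


def release_gpu_memory():
    gc.collect()  # drop unreachable objects that may still hold tensors
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()


def safe_predict(chat_fn, query, history):
    """Run one turn; after an OOM, free memory and retry once with empty history."""
    try:
        return chat_fn(query, history)
    except OOM_ERRORS:
        pass  # do nothing here: the live exception still references the failed frames
    # Outside the handler the exception is gone, so its tensors can be collected.
    release_gpu_memory()
    return chat_fn(query, [])  # a second OOM propagates to the caller
```

If the retry also fails, the memory is most likely held by something this cannot reach (the model weights, another process, or a single over-long input), and a restart or an input-length limit is still needed.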

Jun 27 '23 03:06 yuri2peter
