FastChat
Memory leak, Windows
When using CUDA, there appears to be a memory leak on Windows systems with either the CLI or UI. Any messages sent to the model will cause the GPU memory usage to increase steadily until a memory exception is hit.
Is this solved?
Related to #211. Is this issue solved?
I am not sure this is a bug. The conversation keeps growing until it fills up memory. On the web UI or other frontends (like continue.dev), you see this happening and you clear the conversation.
I commented on the out-of-memory issue here and showed a history of GPU memory rising and falling, but steadily increasing: https://github.com/lm-sys/FastChat/issues/2701. These other issues (#211, #301) are exactly what I experienced, just on Ubuntu with the CLI.

Perhaps we could have a command-line parameter to set a limit on conversation length (anything past that point is dropped), or a command we can run in the CLI to reset it. Without this, you have to Ctrl-C, re-run the command, and wait for the model to load every time you get close to the limit. If you have limited resources, you can't finish a conversation before that point. Ideally, a rolling window of conversation history would be kept and everything older dropped.

I'm using a multi-GPU configuration with very old GPUs (4 GTX 1070s with 8 GB each), so maybe the problem surfaces sooner in this setup since there is not much memory to begin with. But I imagine anyone trying to use this as a persistent service is going to hit the limit at some point, even with lots of memory. Maybe the multi-GPU scenario is what triggers it? It sounds like it's supporting chat.lmsys.org with no problem, so I'm not sure what's different in that environment. @infwinston @surak
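The rolling-window idea above could be sketched roughly like this. Note this is a minimal illustration, not FastChat's actual conversation API: the `truncate_history` helper and the `(role, text)` message representation are assumptions for the sake of the example.

```python
def truncate_history(messages, max_turns=8):
    """Keep only the most recent `max_turns` user/assistant exchanges.

    `messages` is a list of (role, text) tuples, oldest first. A leading
    system prompt, if present, is always preserved so the model keeps its
    instructions even as old turns are dropped.
    """
    if messages and messages[0][0] == "system":
        system, rest = messages[:1], messages[1:]
    else:
        system, rest = [], messages
    # One turn = one user message plus one assistant reply, so keep the
    # last 2 * max_turns individual messages.
    return system + rest[-2 * max_turns:]
```

With something like this called before each generation step, the prompt length (and therefore the KV-cache footprint on the GPU) stays bounded instead of growing until an out-of-memory error.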