FastChat
Memory leak, Windows
When using CUDA, there appears to be a memory leak on Windows systems with either the CLI or UI. Any messages sent to the model will cause the GPU memory usage to increase steadily until a memory exception is hit.
Is this solved?
Related to #211. Is this issue solved?
I am not sure this is a bug. The conversation keeps growing until it fills up memory. On the web UI or other frontends (like continue.dev), you see this happening and you clear the conversation.
I commented on the out-of-memory issue here and showed a history of GPU memory rising and falling, but steadily increasing: https://github.com/lm-sys/FastChat/issues/2701. These other issues (#211, #301) are exactly what I experienced, just on Ubuntu with the CLI.

Perhaps we could have a command-line parameter to set a limit on conversation length (anything past that point is dropped), or a command we can run in the CLI to reset it. Without this, you have to Ctrl-C, re-run the command, and wait for the model to load every time you get close to the limit. If you have limited resources, you can't finish a conversation before that point. Ideally, a rolling window of conversation history would be kept and everything older dropped.

I'm using a multi-GPU configuration with very old GPUs (4 GTX 1070s with 8 GB each), so maybe the problem surfaces sooner in this setup since there is not much memory to begin with. But I imagine anyone trying to use this as a persistent service is going to hit the limit at some point, even with lots of memory. Maybe the multi-GPU scenario is what triggers it? It sounds like it's supporting chat.lmsys.org with no problem, so I'm not sure what's different in that environment. @infwinston @surak
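The rolling-window idea above could be sketched roughly like this. Note this is a minimal illustration, not FastChat's actual conversation API: the `truncate_history` helper and the `(role, text)` message representation are assumptions for the sake of the example.

```python
def truncate_history(messages, max_turns=8):
    """Keep only the most recent `max_turns` user/assistant exchanges.

    `messages` is a list of (role, text) tuples, oldest first. A leading
    system prompt, if present, is always preserved so the model keeps its
    instructions even as old turns are dropped.
    """
    if messages and messages[0][0] == "system":
        system, rest = messages[:1], messages[1:]
    else:
        system, rest = [], messages
    # One turn = one user message plus one assistant reply, so keep the
    # last 2 * max_turns individual messages.
    return system + rest[-2 * max_turns:]
```

With something like this called before each generation step, the prompt length (and therefore the KV-cache footprint on the GPU) stays bounded instead of growing until an out-of-memory error.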