FastChat
Occupied GPU memory keeps growing with more conversation turns
Is there any way to reduce the occupied memory as I clean current history?
This is probably because the code keeps track of the outputs from the model. Try changing the code to skip this.
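If the accumulated history is the cause, one way to release memory when the user clears the conversation is to drop the stored turns and let the backend release any cached tensors. This is only a sketch: `ChatState` and its methods are illustrative names, not FastChat's actual API, and the `torch.cuda.empty_cache()` call is just the standard way to return cached CUDA blocks to the driver.

```python
# Hypothetical sketch: free memory when the chat history is cleared.
# `ChatState` is an illustrative stand-in, not FastChat's real class.

class ChatState:
    def __init__(self):
        self.history = []  # (role, text) pairs fed back into each prompt

    def append(self, role, text):
        self.history.append((role, text))

    def clear(self):
        # Drop accumulated turns so the next prompt is short again.
        self.history.clear()
        # If the backend caches tensors (e.g. a KV cache), release them too.
        try:
            import torch
            torch.cuda.empty_cache()  # safe no-op when CUDA is unavailable
        except ImportError:
            pass  # torch not installed in this environment

state = ChatState()
state.append("user", "hello")
state.clear()
print(len(state.history))  # 0
```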
I suppose you are using the web interface? This shouldn't happen when you clear the history. Could you provide more details?
Yes, I am using the web interface. Now the occupied memory is stable at 31.5 GB and no longer grows. Although the memory still does not decrease when I clear the history, I think it's not a problem now. Maybe this issue can be closed.
This is still a problem in other configurations. For example, on Windows using the CLI interface (where you cannot clear the conversation), if you have a 16 GB GPU and run the 13B model with 8-bit loading (to reduce its size), it works for a while until the conversation reaches that limit and the program breaks.
This is compounded by the fact that the web interface doesn't work on Windows, so the CLI is all we have at the moment.
I think it would be ideal to enforce a cap on memory usage so it never exceeds what's allowed (other tools do this). In this case, erasing earlier parts of the conversation to stay under the cap is a valid approach, since otherwise the program would crash.
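The cap suggested above could be approximated with a sliding window over the history: drop the oldest turns once the rendered conversation exceeds a budget. A minimal sketch, where the whitespace-split word count is just a stand-in for a real tokenizer count:

```python
# Hypothetical sketch of capping conversation size by dropping old turns.

def truncate_history(history, max_tokens):
    """Keep only the most recent turns whose total size fits the budget."""
    kept, total = [], 0
    # Walk from the newest turn backwards, keeping turns while they fit.
    for role, text in reversed(history):
        cost = len(text.split())  # stand-in for a tokenizer's token count
        if total + cost > max_tokens:
            break
        kept.append((role, text))
        total += cost
    return list(reversed(kept))

history = [
    ("user", "one two three four"),
    ("assistant", "five six"),
    ("user", "seven eight nine"),
]
# With a budget of 6 "tokens", the oldest turn is dropped.
print(truncate_history(history, 6))
# → [('assistant', 'five six'), ('user', 'seven eight nine')]
```

Truncating from the oldest side keeps the most recent context, which is usually what matters for the next reply; a fancier version could always preserve the system prompt as well.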
Is this solved yet? This is also a problem when using the OpenAI API in Linux environments. @zhisbug @merrymercy
@Abhijit-2592 this is not an issue, really. As the chat gets longer, it takes more memory, which is freed when you clean the conversation, no?
This is expected behavior, as @surak explained. We've been serving models for a long time on chat.lmsys.org and have not seen this issue. However, if you find evidence of a memory leak, let us know.