text-generation-webui
Bug: Model not Unloading from RAM or VRAM on Linux
Describe the bug
After loading a GGUF model into RAM or VRAM (it doesn't matter if it is split between the two) and then selecting Unload Model in the UI, the model does not clear from my system's VRAM and RAM. I have a dual boot and have not noticed this issue when running on Windows; it only appears on my Ubuntu/Linux system. If I load a new model, it just takes up more RAM instead of overwriting the previously stored model.
Also, when checking my system's resource manager (task manager), no labeled process shows up with the massive amount of RAM the model takes up. For example, if I load the Goliath model, which takes up 100GB of RAM, that memory does not appear under any process that I could kill to manually free up the RAM.
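For reference, this is roughly how I double-checked that no single process was holding the memory (a quick diagnostic sketch assuming the `psutil` package is installed; it is not part of text-generation-webui):

```python
# Rough diagnostic sketch: list the processes with the largest resident memory
# and compare against overall system usage. Assumes psutil is installed
# (pip install psutil); not part of text-generation-webui itself.
import psutil

def top_memory_processes(n=10):
    procs = []
    for p in psutil.process_iter(["pid", "name", "memory_info"]):
        try:
            procs.append((p.info["memory_info"].rss, p.info["pid"], p.info["name"]))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    procs.sort(reverse=True)
    return procs[:n]

if __name__ == "__main__":
    vm = psutil.virtual_memory()
    print(f"System RAM used: {vm.used / 2**30:.1f} GiB of {vm.total / 2**30:.1f} GiB")
    for rss, pid, name in top_memory_processes():
        print(f"{rss / 2**30:6.1f} GiB  pid={pid:<7} {name}")
```

Even with 100GB of RAM in use according to the system total, no process in this listing accounts for it.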
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Install Ubuntu 22.04, install text-generation-webui, download any GGUF model, load the model, then unload it.
Screenshot
No response
Logs
No error appears in the terminal or in the UI.
System Info
OS: Ubuntu 22.04 LTS, 64-bit, X11
RAM: 128GB DDR4
GPU: Nvidia RTX 3070 Ti
CPU: AMD Ryzen 5 5600X (6 cores / 12 threads)
For me, the same issue sometimes happens on Windows as well. It is hard to reproduce, but sometimes loading a new model does not unload the old one.
On Linux it happens 100% of the time.
Add-on: This is not just llama.cpp and GGUF files, but also GPTQ files with Transformers. Most of the VRAM can be freed when unloading with Transformers, but anything loaded into RAM cannot be removed from RAM without a reboot.
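For context, this is the kind of teardown I would expect an unload to perform (a minimal sketch of the usual PyTorch/Transformers pattern, not the webui's actual unload code; `model` and `tokenizer` are placeholder names):

```python
# Minimal sketch of the usual pattern for releasing a loaded model.
# Not the actual text-generation-webui unload code; `model`/`tokenizer`
# are placeholders for wherever the application keeps the loaded weights.
import gc
import torch

model = None
tokenizer = None

def unload_model():
    """Drop all references to the model so its memory can actually be freed."""
    global model, tokenizer
    model = None
    tokenizer = None
    gc.collect()                      # reclaim host (RAM) allocations
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached VRAM blocks to the driver
```

If some other reference to the model survives elsewhere (a cache, an event handler, a stored traceback), `gc.collect()` cannot reclaim the memory, which would look a lot like the symptoms described above.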
I could not find a solution to this issue on Ubuntu, but after running this on Fedora 39, the model weights do unload from RAM and VRAM, just as they do on Windows.