text-generation-webui

Bug: Model not Unloading from RAM or vRAM on Linux

Open CHesketh76 opened this issue 1 year ago • 3 comments

Describe the bug

After loading a GGUF model into RAM or vRAM (it doesn't matter whether it is split between the two) and then selecting Unload Model in the UI, the model is not cleared from my system's vRAM and RAM. I dual boot and have not noticed this issue on Windows; it only appears on my Ubuntu/Linux system. If I load a new model, it takes up additional RAM instead of replacing the previously loaded model.

Also, when checking my system's resource manager (via Task Manager), no listed process shows the large amount of RAM the model takes up. For example, if I load the Goliath model, which takes up 100GB of RAM, it does not show up in the process list, so I cannot manually free the RAM.
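One way to narrow down where the memory went is to compare the kernel's own accounting with the process list. If the loader memory-maps the model file (an assumption here, not something confirmed in this thread), the pages may be counted as page cache, which is not attributed to any process and which the kernel can reclaim on demand. The following is a minimal diagnostic sketch, assuming a Linux system that exposes `/proc/meminfo`; the helper names `parse_meminfo` and `summarize` are hypothetical and not part of text-generation-webui.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kiB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Convert a few key fields from kiB to GiB for easier reading."""
    kib_per_gib = 1024 * 1024
    total = fields.get("MemTotal", 0)
    available = fields.get("MemAvailable", 0)
    cached = fields.get("Cached", 0)
    return {
        "total_gib": total / kib_per_gib,
        "available_gib": available / kib_per_gib,
        "page_cache_gib": cached / kib_per_gib,
        "not_available_gib": (total - available) / kib_per_gib,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        for key, value in summarize(parse_meminfo(f.read())).items():
            print(f"{key}: {value:.1f}")
```

Running this before loading, after loading, and after unloading a model shows whether `available_gib` actually drops and stays low, or whether the "missing" RAM is mostly `page_cache_gib` that the kernel still reports as available.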

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Install Ubuntu 22, install text-gen-webui, download any GGUF model, load the model, unload the model.

Screenshot

No response

Logs

No error appears in the terminal or in the UI.

System Info

OS: Ubuntu 22.04LTS, 64-bit, X11
RAM: 128GB DDR4
GPU: 3070Ti Nvidia
CPU: AMD Ryzen 5 5600x 6-core processor × 12

CHesketh76 avatar Feb 09 '24 15:02 CHesketh76

The same issue sometimes happens for me on Windows as well. It is hard to reproduce, but sometimes loading a new model does not unload the old one.

Tedy50 avatar Feb 12 '24 22:02 Tedy50

On Linux it happens 100% of the time.

CHesketh76 avatar Feb 13 '24 01:02 CHesketh76

Add-on: This is not limited to llama.cpp and GGUF files; it also happens with GPTQ files loaded with Transformers. Most of the vRAM can be freed when unloading with Transformers, but anything loaded into RAM cannot be removed from RAM without a reboot.
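For anyone experimenting with this from a Python session, the sketch below shows one best-effort way to ask the runtime to hand memory back after every reference to the model has been dropped. It is not what text-generation-webui does internally, and it is not a confirmed fix for this issue. It assumes glibc for the `malloc_trim` call and treats PyTorch as optional; `release_memory` is a hypothetical helper name.

```python
import ctypes
import ctypes.util
import gc


def release_memory():
    """Best-effort memory release; returns a report of which steps ran.

    Call this only after deleting every reference to the model object,
    otherwise nothing can be freed.
    """
    report = {
        "gc_collected": gc.collect(),  # number of unreachable objects found
        "cuda_cache_cleared": False,
        "malloc_trim": None,  # None means the call was not available
    }

    # Optional step: release PyTorch's cached CUDA blocks, if PyTorch is installed.
    try:
        import torch

        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            report["cuda_cache_cleared"] = True
    except ImportError:
        pass

    # Assumption: glibc. malloc_trim(0) asks the allocator to return freed
    # heap pages to the OS. Other C libraries may not provide it.
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        report["malloc_trim"] = bool(libc.malloc_trim(0))
    except (OSError, AttributeError):
        pass

    return report


if __name__ == "__main__":
    print(release_memory())
```

If RAM usage does not change after this runs, the memory is probably not sitting in the process heap, which would point toward page cache or a lingering reference to the model rather than the allocator.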

CHesketh76 avatar Feb 13 '24 16:02 CHesketh76

I could not find a solution to this issue on Ubuntu, but on Fedora 39 the model weights do unload from RAM and vRAM, as they do on Windows.

CHesketh76 avatar Feb 22 '24 14:02 CHesketh76