The server doesn't free GPU and RAM memory when changing the AI model or its GPU layer count, causing the server to freeze
When running KoboldAI with the Adventure 6B model, I ran out of GPU VRAM, so I decided to reload the AI with fewer GPU layers in order to shift more of the load to the CPU and RAM. But while the client was reloading the model (and its layers), it apparently never freed the memory already in use: my task manager showed both RAM and VRAM still full during the loading procedure. This caused the app to freeze (much like a similar case where it ran out of VRAM and eventually raised a CUDA out-of-memory exception).
To reproduce the issue, load any AI model with settings that fill your VRAM completely. Then load the same model again through the AI button in the client, this time assigning fewer layers to the GPU, and notice how the app freezes.
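To confirm that the memory is never released during the reload, you can watch device-wide VRAM usage from a separate Python process while following the steps above. This is a hypothetical diagnostic helper, not part of KoboldAI; it assumes torch is installed with CUDA support.

```python
import time

import torch

def watch_vram(interval_s: float = 2.0) -> None:
    # mem_get_info reports device-wide free/total memory, so it also
    # reflects allocations made by other processes such as the Kobold server.
    while True:
        free, total = torch.cuda.mem_get_info()
        used_gib = (total - free) / 2**30
        print(f"GPU in use: {used_gib:.2f} / {total / 2**30:.2f} GiB")
        time.sleep(interval_s)

if __name__ == "__main__":
    watch_vram()
```

If the bug is present, the "GPU in use" figure should stay near the maximum throughout the reload instead of dropping when the old model is supposedly unloaded.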
This problem occurred on the United (developer) version of KoboldAI.
This also happens for me -- the model doesn't get unloaded when loading a new model.
Even choosing the "No AI" model doesn't make a previously-loaded model unload.
I have to kill the script and restart it to switch models, which is pretty annoying (since it clears the story, settings, etc.).
We are aware of it, but no solution has been found yet. It is related to the model splitting between multiple devices. We have routines in Kobold that are supposed to clean up any models in memory using a mixture of torch's cleanup and the garbage collector, but for an unknown reason these do not work on models loaded with this kind of splitting. For now the workaround is to restart Kobold altogether.
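For reference, the "torch cleanup plus garbage collector" pattern described above typically looks something like the following. This is a minimal sketch of the general technique, not the actual KoboldAI routine; the `unload_model` name and structure are assumptions for illustration.

```python
import gc

import torch

def unload_model(model):
    # Hypothetical sketch of the usual cleanup pattern,
    # not the actual KoboldAI code.
    model.to("cpu")           # move parameters off the GPU first
    del model                 # drop the last Python reference we hold
    gc.collect()              # collect anything kept alive by reference cycles
    torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
```

Note that `torch.cuda.empty_cache()` can only release memory whose tensors have actually been freed; if some other object still holds a reference to the model's parameters (for example, bookkeeping kept by the device-splitting code), `gc.collect()` cannot reclaim them and the VRAM stays occupied, which would be consistent with the behaviour reported here. That is speculation on the mechanism, though, not a confirmed diagnosis.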