LocalAI
Method to unload model(s) would be very useful.
Is your feature request related to a problem? Please describe.
When running multiple distributed workers, if I have to change or restart the service on one worker, I have to bring the entire cluster down. Restarting a single worker only unloads the models on that worker; the VRAM on the other workers does not change. (Separate issue, but restarting a worker also results in a new worker ID.)
Describe the solution you'd like
An API endpoint to unload individual models and all models.
Describe alternatives you've considered
I'm not sure how else to address this issue.
Additional context
To clarify here @j4ys0n - are you referring to unloading models from a group of federated workers, or to llama.cpp workers?
JFYI we have /backend/shutdown for unloading a single model, but indeed that does not propagate to all federated workers.
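For reference, here is a minimal sketch of calling that endpoint from Python. The request shape (a POST with a JSON body naming the model), the localhost:8080 address, and the model name are assumptions; adjust for your deployment:

```python
import requests

LOCALAI_URL = "http://localhost:8080"  # assumed address; adjust for your deployment

def unload_model(model_name: str) -> None:
    """Ask one LocalAI instance to shut down the backend holding model_name."""
    resp = requests.post(
        f"{LOCALAI_URL}/backend/shutdown",
        json={"model": model_name},  # assumed request shape
        timeout=30,
    )
    resp.raise_for_status()

# Unloads the model from this instance only; as noted above, the call
# does not propagate to other federated workers.
unload_model("my-model")  # hypothetical model name
```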
Both federated and llama.cpp workers. There should be a way to unload models from workers without having to restart the services. Same with removing workers from the cluster: there should be a way to remove workers without the coordinator thinking one is missing.
I made a similar issue: https://github.com/mudler/LocalAI/issues/3378
So I'm digging into this issue, any thoughts on what I should go look at @mudler?
Anywho, if I see something I'll say something :D
@mudler - any luck with this one? It becomes a bit of a problem when running async discussions with multiple models, of which only some are active, or when hooking different kinds of automation to the API that call on different models. Manual unloading via the API is an option, as is writing an agent to do the same thing if I can get heuristics on model activity, but this seems most sensible as an internal function. Any thoughts on how you want to tackle it architecturally? LRU, or something a bit more sophisticated?
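For anyone who wants the external-agent route in the meantime, here is a rough sketch of an idle-based unloader. It assumes something in front of LocalAI (e.g. a proxy) calls record_use() on every request, since LocalAI does not expose per-model last-use timestamps; the address, timeout, and activity map are all assumptions:

```python
import time
import requests

LOCALAI_URL = "http://localhost:8080"  # assumed address
IDLE_TIMEOUT = 15 * 60                 # seconds of inactivity before unloading

# Hypothetical activity map: record_use() must be called (e.g. from a
# proxy in front of LocalAI) every time a model serves a request.
last_used: dict[str, float] = {}

def record_use(model: str) -> None:
    last_used[model] = time.time()

def unload_idle_models() -> None:
    now = time.time()
    for model, ts in list(last_used.items()):
        if now - ts > IDLE_TIMEOUT:
            # /backend/shutdown unloads one model from one instance only
            requests.post(
                f"{LOCALAI_URL}/backend/shutdown",
                json={"model": model},  # assumed request shape
                timeout=30,
            )
            del last_used[model]

if __name__ == "__main__":
    while True:
        unload_idle_models()
        time.sleep(60)
```

This is essentially the idle-watchdog behaviour mentioned later in the thread, done externally; an LRU policy would instead evict the least-recently-used model only when a new load runs out of VRAM.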
Would love this feature!
It would also be nice if there were a setting to automatically unload models when they are unused for a period of time.
Yes, agreed. I like to switch between various models and would like Local AI to be able to handle the automatic unloading of previous models to free up VRAM.
LOCALAI_WATCHDOG_IDLE=true
LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m
With these set as shown, the application is unloading models properly for me now.