
Method to unload model(s) would be very useful.

Open j4ys0n opened this issue 1 year ago • 9 comments

Is your feature request related to a problem? Please describe.

When running multiple distributed workers, if I have to change or restart the service on one worker, I have to bring the entire cluster down. Restarting a single worker only unloads the models on that worker; the VRAM usage on the other workers does not change. (Separate issue, but restarting a worker also results in a new worker ID.)

Describe the solution you'd like

An API endpoint to unload individual models, and one to unload all models.

Describe alternatives you've considered

I'm not sure how else to address this issue.

Additional context

j4ys0n · Sep 23 '24 15:09

To clarify here @j4ys0n - are you referring to unloading models from a group of federated workers, or to llama.cpp workers?

JFYI we have /backend/shutdown for unloading a single model, but indeed that does not propagate to all federated workers.
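For reference, a minimal sketch of calling that endpoint from Python. The base URL/port and the JSON payload shape (`{"model": ...}`) are assumptions rather than the documented contract, so check the LocalAI API docs before relying on it:

```python
# Hedged sketch: ask LocalAI to unload a single model via /backend/shutdown.
# The address and the request body shape are assumptions, not verified docs.
import requests

resp = requests.post(
    "http://localhost:8080/backend/shutdown",  # assumed LocalAI address
    json={"model": "my-model-name"},           # assumed payload shape
    timeout=30,
)
resp.raise_for_status()
print("unload requested:", resp.status_code)
```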

mudler · Sep 23 '24 16:09

Both federated and llama.cpp workers. There should be a way to unload models from workers without having to restart the services. The same goes for removing workers from the cluster: there should be a way to remove a worker without the coordinator thinking one is missing.

j4ys0n · Sep 23 '24 16:09

I made a similar issue: https://github.com/mudler/LocalAI/issues/3378

Nyralei · Sep 23 '24 20:09

So I'm digging into this issue. Any thoughts on what I should go look at, @mudler?

Anywho, if I see something I'll say something :D

levidehaan · Nov 15 '24 10:11

@mudler - any luck with this one? It becomes a bit of a problem when having async conversations with multiple models, of which only some are active, or when hooking different kinds of automation to the API that call different models. Manual unloading via the API is an option, as is building an agent to do the same thing if I can get heuristics on model activity, but this seems most sensible as an internal function. Any thoughts on how you want to tackle it architecturally? "LRU" or something a bit more sophisticated?
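As a sketch of what an "LRU" policy could look like, purely illustrative and not LocalAI's actual internals (the `unload_fn` callback is hypothetical, e.g. something that calls the backend shutdown path):

```python
# Illustrative LRU eviction policy for loaded models; not LocalAI's code.
# `unload_fn` is a hypothetical callback that frees a model's VRAM.
import time

class LRUModelTracker:
    def __init__(self, max_loaded: int, unload_fn):
        self.max_loaded = max_loaded
        self.unload_fn = unload_fn
        self.last_used: dict[str, float] = {}

    def touch(self, model: str) -> None:
        # Record that a request just used this model.
        self.last_used[model] = time.monotonic()

    def ensure_capacity(self) -> None:
        # Before loading a new model, evict least-recently-used ones
        # until we are below the configured budget.
        while len(self.last_used) >= self.max_loaded:
            victim = min(self.last_used, key=self.last_used.get)
            self.unload_fn(victim)
            del self.last_used[victim]
```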

sempervictus · May 11 '25 12:05

Would love this feature!

Little709 · Aug 16 '25 10:08

It would also be nice if there were a setting to automatically unload models when they are unused for a period of time.
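A background watchdog could cover that. A rough sketch of the idea, illustrative only and not LocalAI's implementation (`last_used` and `unload_fn` are hypothetical hooks into the model loader):

```python
# Illustrative idle-unload watchdog, not LocalAI's implementation:
# periodically scan per-model last-use timestamps and unload anything
# idle longer than the timeout.
import threading
import time

def start_idle_watchdog(last_used, unload_fn, idle_timeout, interval=60.0):
    def loop():
        while True:
            now = time.monotonic()
            for model, ts in list(last_used.items()):
                if now - ts > idle_timeout:
                    unload_fn(model)
                    last_used.pop(model, None)
            time.sleep(interval)

    threading.Thread(target=loop, daemon=True).start()
```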

ericomeehan · Aug 28 '25 23:08

Yes, agreed. I like to switch between various models and would like LocalAI to handle the automatic unloading of previous models to free up VRAM.

MikeNatC · Sep 15 '25 17:09

LOCALAI_WATCHDOG_IDLE=true
LOCALAI_WATCHDOG_IDLE_TIMEOUT=15m

With these set as shown, the application is unloading models properly for me now.

ericomeehan · Oct 28 '25 21:10