docs: Offload/stop backend API
Is your feature request related to a problem? Please describe.
GPU resources are limited. Once a model is loaded into GPU memory, you can't use any other big model until the previous backends are stopped.
Currently, if a backend is loaded and running, it is impossible to unload it through the API. I have to docker exec into the container and kill the backend process.
Describe the solution you'd like
Create endpoints to list running backends and to unload a specific backend or all backends:
GET /v1/backends - list running backends
DELETE /v1/backends/<backend_id> - unload a specific backend
DELETE /v1/backends - unload ALL backends
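As a very rough illustration, handlers for these routes could look something like the Go sketch below; `BackendRegistry` and its `List`/`Stop`/`StopAll` methods are hypothetical placeholders for whatever component actually tracks loaded backends, not LocalAI's real internals.

```go
package backendapi

import (
	"encoding/json"
	"net/http"
	"strings"
)

// BackendRegistry is a hypothetical interface over the component that
// tracks loaded backends; the method names are illustrative only.
type BackendRegistry interface {
	List() []string       // IDs of currently running backends
	Stop(id string) error // unload one backend, freeing its GPU memory
	StopAll() error       // unload every running backend
}

// Handler serves GET /v1/backends, DELETE /v1/backends and
// DELETE /v1/backends/<backend_id> as proposed above.
func Handler(reg BackendRegistry) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Whatever follows the prefix is the backend ID ("" for the collection).
		id := strings.Trim(strings.TrimPrefix(r.URL.Path, "/v1/backends"), "/")

		switch {
		case r.Method == http.MethodGet && id == "":
			// GET /v1/backends - list running backends
			w.Header().Set("Content-Type", "application/json")
			_ = json.NewEncoder(w).Encode(reg.List())
		case r.Method == http.MethodDelete && id == "":
			// DELETE /v1/backends - unload ALL backends
			if err := reg.StopAll(); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			w.WriteHeader(http.StatusNoContent)
		case r.Method == http.MethodDelete:
			// DELETE /v1/backends/<backend_id> - unload one backend
			if err := reg.Stop(id); err != nil {
				http.Error(w, err.Error(), http.StatusNotFound)
				return
			}
			w.WriteHeader(http.StatusNoContent)
		default:
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
}
```

With something like this in place, freeing GPU memory before loading another big model would be a single request, e.g. `curl -X DELETE localhost:8080/v1/backends` (assuming LocalAI's default port).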