ollama
Provide an API to retrieve the number of requests being processed
We have integrated Ollama into our inference platform and are currently implementing a feature that waits for all active requests to complete before shutting down the pod, ensuring graceful termination.
We hope Ollama can provide an API for retrieving the number of requests being processed; this could be a Prometheus metric of gauge type, for example.
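A minimal sketch of what such a gauge could look like, using only the Go standard library (the metric name `ollama_requests_in_flight` and the route paths are illustrative assumptions, not an existing Ollama API):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// inFlight tracks the number of requests currently being processed.
var inFlight atomic.Int64

// countRequests wraps a handler, incrementing the gauge on entry
// and decrementing it when the request finishes.
func countRequests(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inFlight.Add(1)
		defer inFlight.Add(-1)
		next.ServeHTTP(w, r)
	})
}

// metricsHandler exposes the gauge in Prometheus text exposition format.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "# TYPE ollama_requests_in_flight gauge")
	fmt.Fprintf(w, "ollama_requests_in_flight %d\n", inFlight.Load())
}

func main() {
	mux := http.NewServeMux()
	// Hypothetical inference endpoint wrapped with the in-flight counter.
	mux.Handle("/api/generate", countRequests(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		})))
	mux.HandleFunc("/metrics", metricsHandler)

	srv := httptest.NewServer(mux)
	defer srv.Close()

	// Simulate one completed request, then scrape the gauge.
	resp, err := http.Get(srv.URL + "/api/generate")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	resp, err = http.Get(srv.URL + "/metrics")
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Print(string(body))
}
```

With a gauge like this, a pre-stop hook could poll the metric (or call `inFlight.Load()` internally) and delay pod shutdown until it reads zero.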
/cc #3144
Will close this in favor of tracking https://github.com/ollama/ollama/issues/10419
@ParthSareen I assume you mean #3144?