ollama
Provide an API to retrieve the number of requests being processed
We have integrated Ollama into our inference platform and are currently implementing a feature that waits for all active requests to complete before shutting down the pod, ensuring graceful termination.
We hope Ollama can provide an API for retrieving the number of requests being processed; this could be a Prometheus metric of gauge type, for example.
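A minimal sketch of what such a gauge could look like, using only the Go standard library (the metric name `ollama_requests_in_flight` and the route paths are illustrative assumptions, not an existing Ollama API):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// inFlight tracks the number of requests currently being processed.
var inFlight atomic.Int64

// countRequests wraps a handler, incrementing the gauge on entry
// and decrementing it when the request finishes.
func countRequests(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inFlight.Add(1)
		defer inFlight.Add(-1)
		next.ServeHTTP(w, r)
	})
}

// metricsHandler exposes the gauge in Prometheus text exposition format.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "# TYPE ollama_requests_in_flight gauge")
	fmt.Fprintf(w, "ollama_requests_in_flight %d\n", inFlight.Load())
}

func main() {
	mux := http.NewServeMux()
	// Hypothetical inference endpoint wrapped with the in-flight counter.
	mux.Handle("/api/generate", countRequests(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		})))
	mux.HandleFunc("/metrics", metricsHandler)

	srv := httptest.NewServer(mux)
	defer srv.Close()

	// Simulate one completed request, then scrape the gauge.
	resp, err := http.Get(srv.URL + "/api/generate")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	resp, err = http.Get(srv.URL + "/metrics")
	if err != nil {
		panic(err)
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()
	fmt.Print(string(body))
}
```

With a gauge like this, a pre-stop hook could poll the metric (or call `inFlight.Load()` internally) and delay pod shutdown until it reads zero.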
/cc #3144
Will close this in favor of tracking https://github.com/ollama/ollama/issues/10419
@ParthSareen I assume you mean #3144?