Provide an API to retrieve the number of requests being processed

Open cr7258 opened this issue 7 months ago • 2 comments

We have integrated Ollama into our inference platform, and we are currently implementing a feature that waits for all active requests to complete before shutting down the pod, ensuring a graceful termination.

We hope Ollama can provide an API for retrieving the number of requests being processed. This could be a Prometheus metric (gauge type), for example:
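As an illustration only, here is a minimal sketch of what an in-flight request gauge and a drain wait could look like. It is not Ollama's API and not necessarily what was proposed in this issue: the class `InflightGauge`, the metric name `demo_inflight_requests`, and the method `wait_until_idle` are all hypothetical names, and the sketch uses only the Python standard library rather than a Prometheus client.

```python
import threading


class InflightGauge:
    """Thread-safe count of requests currently being processed (hypothetical helper)."""

    def __init__(self, name: str = "demo_inflight_requests") -> None:
        self.name = name  # hypothetical metric name, not one Ollama exposes
        self._value = 0
        self._cond = threading.Condition()

    def __enter__(self) -> "InflightGauge":
        # A request starts: increment the gauge.
        with self._cond:
            self._value += 1
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # A request ends (successfully or not): decrement and wake any waiters.
        with self._cond:
            self._value -= 1
            self._cond.notify_all()

    @property
    def value(self) -> int:
        with self._cond:
            return self._value

    def render(self) -> str:
        """Return the gauge in Prometheus text exposition format."""
        return f"# TYPE {self.name} gauge\n{self.name} {self.value}\n"

    def wait_until_idle(self, timeout: float) -> bool:
        """Block until no requests are in flight; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == 0, timeout=timeout)
```

Under these assumptions, each request handler would wrap its work in `with gauge:`, a metrics endpoint would return `gauge.render()`, and a shutdown hook would call `gauge.wait_until_idle(timeout=...)` before letting the pod terminate.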

cr7258 avatar Apr 26 '25 03:04 cr7258

/cc

googs1025 avatar Apr 26 '25 04:04 googs1025

#3144

rick-github avatar Apr 26 '25 15:04 rick-github

Will close this in favor of tracking https://github.com/ollama/ollama/issues/10419

ParthSareen avatar Apr 28 '25 05:04 ParthSareen

@ParthSareen I assume you mean #3144 ?

hendrikebbers avatar May 19 '25 08:05 hendrikebbers