serving
Queue-proxy: scrape Prometheus metrics from other containers in the pod
Describe the feature
There is an issue with configuring multiple Prometheus scrape ports on a single pod. Prometheus does not currently support this use case, and we would like a workaround so that we can scrape metrics from both queue-proxy and another container in the same pod.
Background:
We are using Knative with KServe, so right now we have two containers in one pod: queue-proxy and the kserve-container. Queue-proxy emits Prometheus metrics, and we want kserve-container to emit its own, distinct metrics: latency histograms for each step/method called in the kserve-container.
There is a GitHub issue describing the problem with configuring Prometheus to scrape multiple ports in a single pod. The hacky workaround suggested there is to use relabel_configs settings.
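For context, the relabeling workaround usually looks something like the following. With `role: pod` service discovery, Prometheus generates one target per declared container port, so a `keep` rule on the port name can select the metrics port of each container. This is only a sketch; the `-metrics` port-naming convention and job name here are assumptions, not something queue-proxy or KServe enforce.

```yaml
scrape_configs:
  - job_name: pod-multi-port
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # One target is generated per declared container port;
      # keep only ports whose name ends in "-metrics" (assumed convention).
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        regex: ".*-metrics"
        action: keep
      # Record which container each series came from.
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
```

With this, queue-proxy and kserve-container are scraped as two separate targets of the same pod, distinguished by the `container` label.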
Another option we were considering is implementing a pattern similar to istio-proxy: have queue-proxy scrape the kserve-container and then expose the combined Prometheus metrics from queue-proxy (see Istio's Prometheus Scraping Standardization doc).
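The Istio-style approach boils down to fetching each container's text exposition and concatenating it behind a single port. A minimal sketch of that merge step, assuming metric names do not collide between the containers (function names here are hypothetical, not a real queue-proxy API):

```python
import urllib.request


def merge_expositions(texts):
    """Concatenate Prometheus text expositions into one payload.

    Assumes no metric name collides across the inputs; Prometheus
    rejects duplicate series within a single scrape.
    """
    return "\n".join(t.rstrip("\n") for t in texts) + "\n"


def scrape_and_merge(urls, timeout=2.0):
    """Fetch each /metrics endpoint and return the merged exposition."""
    texts = []
    for url in urls:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            texts.append(resp.read().decode("utf-8"))
    return merge_expositions(texts)
```

Queue-proxy would serve the merged output on its existing metrics port, so Prometheus still sees one target per pod.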
Curious if this is an issue others are experiencing and if there are any other ideas? Thanks!
What's the goal of scraping the metrics from kserve-container?
We need to get latency metrics for each method in the kserve-container. This is really important for understanding where the bottleneck is, if there is one. For example, if a request in the kserve-container hits the pre_process, predict, and then post_process methods and the latency is super high, we currently have no visibility into which step is the bottleneck. Adding histogram metrics, for example, around each method would give us that visibility. Even if we only had one method running in another container along with queue-proxy, it would still be useful to have some metrics on the performance.
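To illustrate the kind of per-step instrumentation meant above, here is a pure-stdlib sketch (deliberately not using a metrics library; the metric name, step names, and helper functions are illustrative assumptions):

```python
import time
from collections import defaultdict
from functools import wraps

# step name -> [call count, total seconds]; a stand-in for a real
# per-step Prometheus histogram
_latency = defaultdict(lambda: [0, 0.0])


def timed_step(step):
    """Decorator that records call count and total latency for a step."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                stats = _latency[step]
                stats[0] += 1
                stats[1] += time.perf_counter() - start
        return wrapper
    return decorator


def render_metrics():
    """Render recorded stats in Prometheus text exposition format."""
    lines = []
    for step, (count, total) in sorted(_latency.items()):
        lines.append(f'kserve_step_latency_seconds_count{{step="{step}"}} {count}')
        lines.append(f'kserve_step_latency_seconds_sum{{step="{step}"}} {total}')
    return "\n".join(lines) + "\n"


@timed_step("pre_process")
def pre_process(request):
    return request


@timed_step("predict")
def predict(request):
    return request
```

Wrapping pre_process, predict, and post_process this way makes it immediately visible which step dominates the end-to-end latency.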
This is an issue due to the current limitations of Prometheus, as noted in the GitHub issue linked above.
I think this is not a common use case. Maybe you can expose a service port for the user-container, and use HTTP GET requests to scrape the metrics?
Also, here is a thread in the Knative Slack channel that may be helpful for this issue: https://knative.slack.com/archives/C93E33SN8/p1662483290964459
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.