Support histogram custom metric in Python backend
Is your feature request related to a problem? Please describe. Histogram is a common metric type: https://prometheus.io/docs/concepts/metric_types/#histogram. But custom metrics in the Triton Python backend can only be gauges and counters today.
Describe the solution you'd like
Add pb_utils.MetricFamily.HISTOGRAM
Hi @ShuaiShao93, Triton core/server supports Summary metrics today, but not yet Histograms. Would your requirements be satisfied by adding support for pb_utils.MetricFamily.SUMMARY for now?
Otherwise, we may need to first add Histogram support in Triton core, and then add support for that in the python backend afterwards.
@rmccorm4 Summary is somewhat worse than histogram because it's not aggregatable across instances: https://latencytipoftheday.blogspot.com/2014/06/latencytipoftheday-you-cant-average.html. But it's still better than nothing.
If possible, I hope we can add Summary first, but later also add Histogram.
+1 Summary has serious limitations for measuring request P99 latency. One particular issue: when the gap between two requests exceeds the summary's max_age (60s by default), the metric becomes NaN (refer to issuecomment), which cannot be captured by Prometheus.
@rmccorm4 Since there is no "issue" tab on Triton core project, I am here to propose for adding histogram metric in Triton core. Do I need to open another issue to track this?
Hi @rmccorm4 May I ask if there is any plan to support histogram metrics for latencies?
I believe in production, using a histogram (with proper bucket boundaries) is far better than using a summary, particularly when you have a number of inference instances and want an overall p99 latency. With summaries, it is statistically impossible to compute a Pxx latency across a number of instances.
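To illustrate the aggregation point above: histogram bucket counts from several instances can simply be summed before estimating a quantile, whereas per-instance summary quantiles cannot be meaningfully combined. A minimal pure-Python sketch (bucket boundaries and latency samples are made up for the example):

```python
import bisect

BOUNDS = [5, 10, 25, 50, 100, 250]  # hypothetical bucket upper bounds, in ms

def to_buckets(latencies):
    """Count observations into cumulative histogram buckets (+Inf bucket last)."""
    counts = [0] * (len(BOUNDS) + 1)
    for v in latencies:
        counts[bisect.bisect_left(BOUNDS, v)] += 1
    # Make counts cumulative, as Prometheus histogram buckets are.
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]
    return counts

def quantile_from_buckets(counts, q):
    """Approximate quantile: upper bound of the first bucket covering rank q."""
    rank = q * counts[-1]
    for bound, count in zip(BOUNDS, counts):
        if count >= rank:
            return bound
    return float("inf")

inst_a = [2, 3, 4, 4, 6]        # a mostly fast instance
inst_b = [40, 60, 80, 90, 200]  # a mostly slow instance

# Fleet-wide view: just sum the bucket counts from both instances.
merged = [a + b for a, b in zip(to_buckets(inst_a), to_buckets(inst_b))]
print(quantile_from_buckets(merged, 0.9))  # -> 100 (fleet-wide p90 estimate)
```

Averaging the two instances' own p90 values would instead mix a fast-instance quantile with a slow-instance quantile and land nowhere near the true fleet-wide p90, which is the failure mode the linked post describes.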
Hi @ShuaiShao93, support for pb_utils.MetricFamily.HISTOGRAM was just added to the python backend and should tentatively be available in the 24.08 release at the end of the month: https://github.com/triton-inference-server/python_backend/pull/374. This will allow you to create and update custom histogram metrics defined in your python backend models.
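For anyone landing here later, usage should look roughly like the existing counter/gauge custom-metric API in a model's model.py, now with a histogram kind and bucket boundaries. This is a sketch, not tested against the 24.08 release; the metric name, labels, and bucket boundaries are illustrative assumptions:

```python
# Sketch only: assumes the python_backend custom-metrics API from PR #374.
# Metric-family name, labels, and bucket boundaries below are illustrative.
import time
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # Create a histogram metric family (new kind added in PR #374).
        self.latency_family = pb_utils.MetricFamily(
            name="custom_infer_latency_seconds",       # assumed name
            description="Per-request inference latency",
            kind=pb_utils.MetricFamily.HISTOGRAM,
        )
        self.latency = self.latency_family.Metric(
            labels={"model": args["model_name"]},
            buckets=[0.005, 0.01, 0.05, 0.1, 0.5, 1.0],  # assumed bounds
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            start = time.perf_counter()
            # ... run the actual inference here ...
            responses.append(pb_utils.InferenceResponse(output_tensors=[]))
            self.latency.observe(time.perf_counter() - start)
        return responses
```

The observed values then show up as cumulative `_bucket`, `_sum`, and `_count` series on Triton's metrics endpoint, which Prometheus can aggregate across instances.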
As for exposing some of the existing latency metrics in core as histograms, this is on our radar and we are interested in implementing this as well - but no ETA yet. Thank you for sharing the interests and engaging in the feature discussion @wangli1426 @lianghao208 ! Please stay tuned.
CC @harryskim @statiraju @yinggeh
@rmccorm4 Thanks! This is awesome!
Can't wait to use the histogram latency as well
Closing since this feature has been completed.
@Tabrizian is there a separate issue to add the histogram metric to Triton core that we could track?