Support histogram custom metric in Python backend
Is your feature request related to a problem? Please describe. Histogram is a common metric type: https://prometheus.io/docs/concepts/metric_types/#histogram. But custom metrics in the Triton Python backend can only be gauges and counters today.
Describe the solution you'd like
Add pb_utils.MetricFamily.HISTOGRAM
Hi @ShuaiShao93, Triton core/server supports Summary metrics today, but not yet Histograms. Would your requirements be satisfied by adding support for pb_utils.MetricFamily.SUMMARY for now?
Otherwise, we may need to first add Histogram support in Triton core, and then add support for that in the python backend afterwards.
@rmccorm4 Summary is somewhat worse than histogram because it's not aggregatable across instances: https://latencytipoftheday.blogspot.com/2014/06/latencytipoftheday-you-cant-average.html. But it's still better than nothing.
If possible, I hope we can add Summary first, but later also add Histogram.
+1 Summary has serious limitations for measuring request P99 latency. One particular issue: when the gap between two requests exceeds the summary's max_age (60s by default), the metric becomes NaN (refer to issuecomment), which cannot be captured by Prometheus.
@rmccorm4 Since there is no "issue" tab on Triton core project, I am here to propose for adding histogram metric in Triton core. Do I need to open another issue to track this?
Hi @rmccorm4 May I ask if there is any plan to support histogram metrics for latencies?
I believe in production, using a histogram (with proper bucket boundaries) is far better than using a summary, particularly when you have a number of inference instances and want an overall p99 latency. With summaries, it is statistically impossible to compute a Pxx latency across a number of instances.
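To illustrate the aggregation point above: histogram bucket counts from several instances can simply be summed before estimating a quantile, whereas per-instance summary quantiles cannot be meaningfully combined. A minimal pure-Python sketch (bucket boundaries and latency samples are made up for the example):

```python
import bisect

BOUNDS = [5, 10, 25, 50, 100, 250]  # hypothetical bucket upper bounds, in ms

def to_buckets(latencies):
    """Count observations into cumulative histogram buckets (+Inf bucket last)."""
    counts = [0] * (len(BOUNDS) + 1)
    for v in latencies:
        counts[bisect.bisect_left(BOUNDS, v)] += 1
    # Make counts cumulative, as Prometheus histogram buckets are.
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]
    return counts

def quantile_from_buckets(counts, q):
    """Approximate quantile: upper bound of the first bucket covering rank q."""
    rank = q * counts[-1]
    for bound, count in zip(BOUNDS, counts):
        if count >= rank:
            return bound
    return float("inf")

inst_a = [2, 3, 4, 4, 6]        # a mostly fast instance
inst_b = [40, 60, 80, 90, 200]  # a mostly slow instance

# Fleet-wide view: just sum the bucket counts from both instances.
merged = [a + b for a, b in zip(to_buckets(inst_a), to_buckets(inst_b))]
print(quantile_from_buckets(merged, 0.9))  # -> 100 (fleet-wide p90 estimate)
```

Averaging the two instances' own p90 values would instead mix a fast-instance quantile with a slow-instance quantile and land nowhere near the true fleet-wide p90, which is the failure mode the linked post describes.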
Hi @ShuaiShao93, support for pb_utils.MetricFamily.HISTOGRAM was just added to the python backend and should tentatively be available in the 24.08 release at the end of the month: https://github.com/triton-inference-server/python_backend/pull/374. This will allow you to create and update custom histogram metrics defined in your python backend models.
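For anyone landing here later, usage should look roughly like the existing counter/gauge custom-metric API in a model's model.py, now with a histogram kind and bucket boundaries. This is a sketch, not tested against the 24.08 release; the metric name, labels, and bucket boundaries are illustrative assumptions:

```python
# Sketch only: assumes the python_backend custom-metrics API from PR #374.
# Metric-family name, labels, and bucket boundaries below are illustrative.
import time
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # Create a histogram metric family (new kind added in PR #374).
        self.latency_family = pb_utils.MetricFamily(
            name="custom_infer_latency_seconds",       # assumed name
            description="Per-request inference latency",
            kind=pb_utils.MetricFamily.HISTOGRAM,
        )
        self.latency = self.latency_family.Metric(
            labels={"model": args["model_name"]},
            buckets=[0.005, 0.01, 0.05, 0.1, 0.5, 1.0],  # assumed bounds
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            start = time.perf_counter()
            # ... run the actual inference here ...
            responses.append(pb_utils.InferenceResponse(output_tensors=[]))
            self.latency.observe(time.perf_counter() - start)
        return responses
```

The observed values then show up as cumulative `_bucket`, `_sum`, and `_count` series on Triton's metrics endpoint, which Prometheus can aggregate across instances.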
As for exposing some of the existing latency metrics in core as histograms, this is on our radar and we are interested in implementing this as well - but no ETA yet. Thank you for sharing the interests and engaging in the feature discussion @wangli1426 @lianghao208 ! Please stay tuned.
CC @harryskim @statiraju @yinggeh
@rmccorm4 Thanks! This is awesome!
Can't wait to use the histogram latency as well
Closing since this feature has been completed.
@Tabrizian is there a separate issue to add the histogram metric to Triton core that we could track?