bazel-buildfarm
bazel-buildfarm copied to clipboard
Configurable bucket for histogram metrics
All buildfarm histogram metrics except grpclatency is not configurable, and some of them have improper default value, especially time related metrics on large repo (execution_time_ms,input_fetch_time_ms,queue_time_ms...) with too small bucket (the largest is [10ms, +inf]).
Can we make those bucket configurable or set some more proper default vaules? Thanks.
It might be sufficient to update the prometheus client dependency from using io.prometheus:simpleclient:0.10.0 to io.prometheus:prometheus-metrics-core:1.0.0 which now supports prometheus native histograms ( see http://prometheus.github.io/client_java/getting-started/metric-types/#histogram ) That does have the caveat that the prometheus server run with --enable-feature=native-histograms -- so there might be value in implementing custom buckets for classic histograms too.
Note upgrading to the 1.0.0 prometheus client would also require updating/patching me.dinowernli:java-grpc-prometheus as it uses simpleclient as well to register its metrics if grpc metrics are enabled -- it would at least need to rebuild with the new compatability layer.
Can you try this? This should resolve your issue. https://github.com/bazelbuild/bazel-buildfarm/pull/1376
#1376 seems related, but only addresses the latency histograms created by the me.dinowernli:java-grpc-prometheus package.
This report specifically mentions the other metrics such as execution_time_ms , input_fetch_time_ms, and queue_time_ms, among others.
None of these call .buckets() to override the default buckets of .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, and 10.
There is one example of a Histogram overriding the default buckets in cas_ttl_s but that one is not configurable.