lighthouse Audit prometheus histogram buckets

Open • michaelsproul opened this issue 3 years ago • 1 comment

Description

Presently our metrics always use the default buckets for histograms, which work OK for times of a few milliseconds to a second, but not so well for shorter/longer times, or large integer values.

I propose we:

- Introduce a new primitive in the lighthouse_metrics crate for defining histograms together with their buckets. The underlying machinery already exists in the prometheus crate; we just aren't using it: https://docs.rs/prometheus/0.13.1/prometheus/struct.HistogramOpts.html
- Use this new primitive to overhaul existing histograms for which the buckets are poorly sized.
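
To make the first point concrete, here is a rough sketch of what such a primitive could look like. It uses only the standard library, and `HistogramSpec` and `try_new` are hypothetical names, not existing lighthouse_metrics API. The validation rules (non-empty, finite, strictly increasing) are my assumption about what we would want to enforce at definition time.

```rust
/// Hypothetical description of a histogram together with its buckets.
/// Not existing `lighthouse_metrics` API; names are for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct HistogramSpec {
    pub name: String,
    pub help: String,
    /// Upper bounds of the buckets, strictly increasing.
    pub buckets: Vec<f64>,
}

impl HistogramSpec {
    /// Build a spec, rejecting bucket lists that are empty, contain
    /// non-finite bounds, or are not strictly increasing.
    pub fn try_new(name: &str, help: &str, buckets: Vec<f64>) -> Result<Self, String> {
        if buckets.is_empty() {
            return Err(format!("{}: no buckets given", name));
        }
        if buckets.iter().any(|b| !b.is_finite()) {
            return Err(format!("{}: bucket bounds must be finite", name));
        }
        if buckets.windows(2).any(|w| w[0] >= w[1]) {
            return Err(format!("{}: buckets must be strictly increasing", name));
        }
        Ok(Self {
            name: name.to_string(),
            help: help.to_string(),
            buckets,
        })
    }
}
```

In the real crate the validated fields would presumably be handed to the prometheus crate via `HistogramOpts::new(name, help).buckets(buckets)`, which is the machinery linked above.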

Metrics in need of change

beacon_block_total_size (thanks @dapplion for flagging this)

Version

Lighthouse v2.3.1

Jun 25 '22 12:06 michaelsproul

Buckets of whole numbers, lowest bucket should be 1

beacon_operations_per_block_attestation_total_bucket
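
For count-style metrics like this one, a possible way to generate whole-number buckets starting at 1 is sketched below. `doubling_integer_buckets` is a hypothetical helper, and the doubling scheme is just one option (linear steps would also work); the cap passed in is up to the caller.

```rust
/// Whole-number bucket bounds that start at 1 and double until `max` is
/// reached. `max` (or 1, if `max` is 0) is always the final bound.
/// Hypothetical helper, not part of any existing crate.
pub fn doubling_integer_buckets(max: u64) -> Vec<f64> {
    let mut buckets = Vec::new();
    let mut bound: u64 = 1;
    while bound < max {
        buckets.push(bound as f64);
        // Saturate instead of overflowing for very large `max`.
        bound = bound.saturating_mul(2);
    }
    buckets.push(max.max(1) as f64);
    buckets
}
```

With an illustrative cap of 128 this gives the bounds 1, 2, 4, 8, 16, 32, 64, 128.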

Misc:

beacon_block_total_size_bucket: the bucket range should cover the average block size

These buckets could be proportional to slot time, and extend beyond 1x SECONDS_PER_SLOT to capture really bad network conditions:

beacon_block_gossip_propagation_verification_delay_time
beacon_block_gossip_slot_start_delay_time
beacon_block_head_imported_delay_time
beacon_block_head_slot_start_delay_time
beacon_block_imported_observed_delay_time
beacon_block_observed_slot_start_delay_time
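
A minimal sketch of slot-proportional buckets for the delay metrics above, assuming the bounds are given as fractions of the slot time. `slot_proportional_buckets` is a hypothetical helper and the fractions are placeholders, not a tuned proposal.

```rust
/// Bucket bounds in seconds, expressed as multiples of the slot time.
/// Fractions above 1.0 extend past a full slot to catch very late blocks.
/// Hypothetical helper; `fractions` should be strictly increasing.
pub fn slot_proportional_buckets(seconds_per_slot: f64, fractions: &[f64]) -> Vec<f64> {
    fractions.iter().map(|f| f * seconds_per_slot).collect()
}
```

For example, with a 12 second slot (an assumption here, taken as the mainnet value of SECONDS_PER_SLOT) and fractions 0.25, 0.5, 1, 2, 4, the bounds come out as 3, 6, 12, 24 and 48 seconds.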

Jun 25 '22 12:06 dapplion
