boxo
gateway: separate metrics for Signed IPNS from DNSLink
Extracted from https://github.com/ipfs/kubo/issues/9927#issuecomment-1623791077
Currently, we have basic request counts and durations for the gateway=ipfs and gateway=ipns namespaces, exposed by the metrics defined in boxo/gateway/metrics.go:
# HELP ipfs_http_gw_get_duration_seconds The time to GET a successful response to a request (all content types).
# TYPE ipfs_http_gw_get_duration_seconds histogram
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="0.05"} 8
[..]
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="1920"} 11
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="+Inf"} 11
ipfs_http_gw_get_duration_seconds_sum{gateway="ipfs"} 1.185360469
Problem
- /ipns supports both DNSLink and Signed IPNS records, but we have no visibility into what % of requests each accounts for
- we measure success only, so we have no visibility into the % of IPNS record failures vs DNSLink failures
Solution
Requirements
TBD, initial requirements
- we need a dedicated metric for each type of /ipns/ request:
  - signed_ipns
  - dnslink
- we need to be able to tell:
  - how many requests were sent by clients
  - how many requests were successful vs errored
  - how long success / error responses take (could be precomputed P50/P95)
- we need to make sure this is visible in Thunderdome testing so we can catch regressions here during the release phase
Open questions
- do we have separate metrics for success/failure, or do we have a single one with a success/error attribute?
- do we do a histogram with predefined duration buckets and an implicit counter (like ipfs_http_gw_get_duration_seconds)?
  - or maybe, instead of picking arbitrary duration buckets (like we have in legacy metrics), we should have P50, P75, P95, P99 Objectives, like we do here?
I added a requirement for visibility in Thunderdome testing so we can catch regressions more easily.
This was an action from the 0.22 retro: https://www.notion.so/pl-strflt/Kubo-0-22-Retro-d9800a96661b44a3ba5fa046926323cb?pvs=4#ad81265cf9ae4082805c8f566d54e243