pyroscope
feat: Optionally include label value cardinality in `/LabelNames`
closes https://github.com/grafana/pyroscope/issues/3226
Overview
This PR is a POC change that allows requesting estimated label value cardinality alongside label names. This will allow the UI to make substantially fewer requests to populate the label view (see below).
In order for the UI to render the labels bar, it needs to:
- Make a `/LabelNames` request
- For every name, make a `/LabelValues` request
- Sort by cardinality (higher goes first)
- Render the bar
Selecting the top N labels by cardinality this way is expensive in network terms: it takes one `/LabelValues` request per label name on top of the initial `/LabelNames` request.
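To make the request count concrete, the steps above can be sketched roughly as follows. `LabelClient` and `countingClient` are hypothetical names invented for this sketch, not Pyroscope's actual client types; the fake client only counts how many calls the flow makes.

```go
package main

import (
	"fmt"
	"sort"
)

// LabelClient is a hypothetical interface for this sketch only.
// It is not Pyroscope's real client API.
type LabelClient interface {
	LabelNames() []string
	LabelValues(name string) []string
}

// countingClient is an in-memory fake that counts every request it serves.
type countingClient struct {
	data     map[string][]string
	requests int
}

func (c *countingClient) LabelNames() []string {
	c.requests++
	names := make([]string, 0, len(c.data))
	for name := range c.data {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the example
	return names
}

func (c *countingClient) LabelValues(name string) []string {
	c.requests++
	return c.data[name]
}

// LabelsByCardinality issues one LabelNames call, then one LabelValues call
// per name, and returns the names sorted by value count (highest first).
func LabelsByCardinality(c LabelClient) []string {
	names := c.LabelNames()
	counts := make(map[string]int, len(names))
	for _, name := range names {
		counts[name] = len(c.LabelValues(name))
	}
	sort.SliceStable(names, func(i, j int) bool {
		return counts[names[i]] > counts[names[j]]
	})
	return names
}

func main() {
	c := &countingClient{data: map[string][]string{
		"namespace": {"dev", "prod"},
		"pod":       {"a", "b", "c"},
		"region":    {"eu"},
	}}
	fmt.Println(LabelsByCardinality(c), c.requests)
}
```

With three label names the fake records four requests (1 + N), and the count grows linearly with the number of names.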
Approach
This PR modifies the `/LabelNames` API to allow a caller to optionally request estimated cardinality:
message LabelNamesRequest {
  repeated string matchers = 1;
  int64 start = 2;
  int64 end = 3;
  optional bool include_cardinality = 4;
}
This would provide the following response:
message LabelNamesResponse {
  repeated string names = 1;
  repeated int64 estimated_cardinality = 2;
}
where `names` and `estimated_cardinality` are in 1:1 correspondence (if cardinality is requested). If cardinality is not requested, `estimated_cardinality` will be empty.
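A caller could consume the two parallel lists along these lines. This is a minimal sketch, not code from the PR: `LabelCardinality` and `TopLabels` are hypothetical names, and the inputs stand in for the `names` and `estimated_cardinality` fields of the response.

```go
package main

import (
	"fmt"
	"sort"
)

// LabelCardinality pairs a label name with its estimated value count.
// Hypothetical type, used only for this sketch.
type LabelCardinality struct {
	Name      string
	Estimated int64
}

// TopLabels zips names with their estimates and returns the n labels with the
// highest estimated cardinality. An empty cardinality slice means cardinality
// was not requested; the first n names are then returned with a zero estimate.
func TopLabels(names []string, cardinality []int64, n int) ([]LabelCardinality, error) {
	if len(cardinality) != 0 && len(cardinality) != len(names) {
		return nil, fmt.Errorf("got %d names but %d cardinalities", len(names), len(cardinality))
	}
	labels := make([]LabelCardinality, len(names))
	for i, name := range names {
		labels[i].Name = name
		if len(cardinality) != 0 {
			labels[i].Estimated = cardinality[i]
		}
	}
	// A stable sort keeps the server's name order among equal estimates.
	sort.SliceStable(labels, func(i, j int) bool {
		return labels[i].Estimated > labels[j].Estimated
	})
	if n < 0 {
		n = 0
	}
	if n < len(labels) {
		labels = labels[:n]
	}
	return labels, nil
}

func main() {
	names := []string{"namespace", "pod", "service_name", "span_name"}
	cardinality := []int64{4, 120, 35, 900}
	top, err := TopLabels(names, cardinality, 2)
	if err != nil {
		panic(err)
	}
	for _, l := range top {
		fmt.Printf("%s ~%d\n", l.Name, l.Estimated)
	}
}
```

The length check guards the 1:1 assumption, so a mismatched response surfaces as an error instead of silently pairing the wrong values.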
Note that the cardinality returned is an estimate, because this PR does not deduplicate merged responses. That is acceptable: the UI only needs a way to select the top N labels by cardinality to render the label bar, and it can make subsequent requests to `/LabelValues` to find exact counts. This should reduce network calls from 20 or more to ~7.
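One way to see why the figure is only an estimate: the sketch below assumes that merging simply sums the per-source counts, which is an assumption made for illustration and may differ from how this PR actually merges responses. `source`, `MergeEstimated`, and `MergeExact` are hypothetical names.

```go
package main

import "fmt"

// source is a hypothetical stand-in for one backend's view of the data:
// label name -> distinct values seen by that backend.
type source map[string][]string

// MergeEstimated sums per-source value counts without deduplicating, so a
// value present in several sources is counted once per source.
func MergeEstimated(sources []source) map[string]int64 {
	out := make(map[string]int64)
	for _, s := range sources {
		for name, values := range s {
			out[name] += int64(len(values))
		}
	}
	return out
}

// MergeExact unions the values across sources to get the distinct count.
func MergeExact(sources []source) map[string]int64 {
	seen := make(map[string]map[string]struct{})
	for _, s := range sources {
		for name, values := range s {
			if seen[name] == nil {
				seen[name] = make(map[string]struct{})
			}
			for _, v := range values {
				seen[name][v] = struct{}{}
			}
		}
	}
	out := make(map[string]int64, len(seen))
	for name, set := range seen {
		out[name] = int64(len(set))
	}
	return out
}

func main() {
	sources := []source{
		{"pod": {"a", "b", "c"}, "namespace": {"dev"}},
		{"pod": {"b", "c", "d"}, "namespace": {"dev"}},
	}
	fmt.Println(MergeEstimated(sources)["pod"], MergeExact(sources)["pod"])
}
```

Under that assumption the estimate for `pod` is 6 while the exact count is 4. The estimate can only overcount, which is usually tolerable for ranking labels and picking the top N.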
Performance
I ran three 10-minute load tests, each requesting 24 hours of data for `service_name="fire-dev-001/querier"`.
- `/LabelNames` with no cardinality
- `/LabelNames` with cardinality
- `/LabelNames` with no cardinality followed by N `/LabelValues`
TODO(bryan): Attach benchmark results
| Test | # Reqs | p99 latency |
|---|---|---|
| `/LabelNames` with no cardinality | ? | ? |
| `/LabelNames` with cardinality | ? | ? |
| `/LabelNames` / `/LabelValues` | ? | ? |
Explore Profiles view
Results
As expected, this PR puts more pressure on store gateways fulfilling a `/LabelNames` request with cardinality, as they also need to look up the label values when scanning postings. However, I think this tradeoff is acceptable because it reduces the number of subsequent network calls the store gateways need to handle. So a `/LabelNames` request that is 2x more expensive removes the need for 15 cheaper requests.
Alternatives
This PR provides one possible approach, which is to reuse the `/LabelNames` endpoint. An alternative is to create a brand new endpoint, `/LabelCardinality`. This has the advantage of making performance characteristics much more predictable, e.g. a `/LabelNames` call doesn't suddenly become much more expensive when cardinality is requested. We could also tune the new endpoint to do the "top N" calculation server-side and provide the exact cardinality instead of an estimate.
A drawback, of course, is yet another API endpoint specifically to power a single component in the UI.