loki
loki copied to clipboard
feat: Introduce shardable probabilistic topk for instant queries.
What this PR does / why we need it:
This change introduces a very simplified shardable topk approximation through the new vector aggregation approx_topk.
We use a count min sketch and track all labels not just the top k. Since this list can grow quite large the feature is only supported for instant queries. Grouping is also not supported and should be handled by an inner sum by or sum without even though this might not be the same behaviour as topk by.
The sharding works by turning the approx_topk(k, inner) query into the following expression:
topk(
k,
eval_cms(
__count_min_sketch__(inner, shard=1) ++ __count_min_sketch__(inner, shard=2)...
)
)
__count_min_sketch__ is calculated for each shard and merge on the frontend. eval_cms iterates through the labels list and determines the count for each. topk selects then the top items.
The number of labels tracked on the querier side when evaluating __count_min_sketch__ is limited by the heap. It does count all values but the ke-value pairs might not be known.
Special notes for your reviewer:
Checklist
- [x] Reviewed the
CONTRIBUTING.mdguide (required) - [x] Documentation added
- [x] Tests updated
- [x] Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
featPRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such,
- [ ] Changes that require user attention or interaction to upgrade are documented in
docs/sources/setup/upgrade/_index.md - [ ] For Helm chart changes bump the Helm chart version in
production/helm/loki/Chart.yamland updateproduction/helm/loki/CHANGELOG.mdandproduction/helm/loki/README.md. Example PR - [ ] If the change is deprecating or removing a configuration option, update the
deprecated-config.yamlanddeleted-config.yamlfiles respectively in thetools/deprecated-config-checkerdirectory. Example PR
If we're doing that then we might as well just have a list of tuples, labelset + count, and remove the sketch.
@cstyan you are right. My rationale was that we could first validate if the team and our users accept instant queries with a special probabilistic key word and see what the implementation looks like. Then worry about the algorithm. As for the lables I'd keep it super simple and use a min-heap with max 10k labels. As Ed explained the most prominent use case is for finding the top outliers in a high cardinality set.
It works :tada: