Make distributed queries a reality
Distributed sybil
This is an umbrella task for working towards distributed queries with sybil as an aggregator.
Steps to get there
Deployment
- setting up docker / k8s for deploying sybil, plus a small network wrapper for ingesting and querying (a sketch of the wrapper follows)
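A rough sketch of what the network wrapper could be: a tiny HTTP shim that shells out to the sybil binary, one endpoint per direction. The endpoints, port, and exact flags here are illustrative, not a settled interface.

```go
package main

import (
	"net/http"
	"os/exec"
)

func main() {
	// POST /ingest?table=foo with newline-delimited JSON records in the body;
	// the body is piped straight into `sybil ingest`.
	http.HandleFunc("/ingest", func(w http.ResponseWriter, r *http.Request) {
		cmd := exec.Command("sybil", "ingest", "-table", r.URL.Query().Get("table"))
		cmd.Stdin = r.Body
		if out, err := cmd.CombinedOutput(); err != nil {
			http.Error(w, string(out), http.StatusInternalServerError)
		}
	})

	// GET /query?table=foo runs a (canned) query and streams the JSON back.
	http.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		cmd := exec.Command("sybil", "query", "-table", r.URL.Query().Get("table"), "-json")
		cmd.Stdout = w
		if err := cmd.Run(); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})

	http.ListenAndServe(":8000", nil)
}
```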
Ingestion
- scribe (or kafka) pipes + ingestors (see the consumer sketch below)
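For the kafka side, the ingestor could be a thin consumer loop that batches messages and pipes them into `sybil ingest`. The sketch below uses the segmentio/kafka-go client; the broker, topic, table name, and batch threshold are all placeholders.

```go
package main

import (
	"bytes"
	"context"
	"log"
	"os/exec"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "sybil-ingest",
		Topic:   "events",
	})
	defer r.Close()

	batch := &bytes.Buffer{}
	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		// Assumes each message value is one JSON record.
		batch.Write(msg.Value)
		batch.WriteByte('\n')

		// Flush to sybil once the batch is big enough. A real version needs
		// commit-after-flush to get at-least-once semantics.
		if batch.Len() > 1<<20 {
			cmd := exec.Command("sybil", "ingest", "-table", "events")
			cmd.Stdin = batch
			if out, err := cmd.CombinedOutput(); err != nil {
				log.Printf("ingest failed: %s", out)
			}
			batch.Reset()
		}
	}
}
```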
Aggregation
- aggregation of results from leaf nodes (see the fan-out sketch after this list)
- sharding + recovery of downed nodes
- client for querying from the master aggregator
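A minimal sketch of the fan-out half of this: the master queries every leaf (via the network wrapper above) in parallel and merges per-group counts. The leaf URLs and the result row shape are assumptions, and real merging also has to handle hists, averages, and the other op types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// LeafResult is a hypothetical per-group row returned by a leaf's /query endpoint.
type LeafResult struct {
	Group string `json:"group"`
	Count int64  `json:"count"`
}

func main() {
	leaves := []string{"http://leaf1:8000", "http://leaf2:8000"}
	merged := map[string]int64{}

	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, leaf := range leaves {
		wg.Add(1)
		go func(leaf string) {
			defer wg.Done()
			resp, err := http.Get(leaf + "/query?table=events")
			if err != nil {
				return // downed leaf: this is where retry / shard re-routing goes
			}
			defer resp.Body.Close()

			var rows []LeafResult
			if json.NewDecoder(resp.Body).Decode(&rows) == nil {
				mu.Lock()
				for _, row := range rows {
					merged[row.Group] += row.Count
				}
				mu.Unlock()
			}
		}(leaf)
	}
	wg.Wait()
	fmt.Println(merged)
}
```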
Pruning
- distributed pruning of data - should be a histogram query on the time + size fields, followed by a trim command ("delete all data before X time"), where X is deduced from the histogram query (sketched below)
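A per-leaf sketch of that flow, assuming a histogram result keyed by time bucket with a per-bucket size; the JSON shape and the trim flags here are assumptions, not sybil's actual interface.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// Bucket is a hypothetical histogram row: a time bucket and its byte size.
type Bucket struct {
	Time  int64 `json:"time"`
	Bytes int64 `json:"bytes"`
}

func main() {
	const budget = 10 << 30 // keep ~10GB per leaf

	out, err := exec.Command("sybil", "query", "-table", "events",
		"-int", "time", "-op", "hist", "-json").Output()
	if err != nil {
		panic(err)
	}

	var buckets []Bucket // assumed sorted newest -> oldest
	if err := json.Unmarshal(out, &buckets); err != nil {
		panic(err)
	}

	// Walk from newest to oldest until the budget is spent; everything
	// older than that bucket is the "delete all data before X" cutoff.
	var used, cutoff int64
	for _, b := range buckets {
		used += b.Bytes
		cutoff = b.Time
		if used > budget {
			break
		}
	}
	if used <= budget {
		return // everything fits, nothing to prune
	}

	fmt.Printf("trimming data before %d\n", cutoff)
	// Hypothetical trim invocation; the real trim flags may differ.
	exec.Command("sybil", "trim", "-table", "events",
		"-before", fmt.Sprint(cutoff)).Run()
}
```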
Current Problem Areas
- TBD
Fixed Areas
- assumes same min/max values for int columns, so histogram bins are evenly sized on one leaf but may differ across leaves; need to switch to more mergeable histograms (like t-digest) when sending results from leaf -> aggregator (see the centroid sketch below)
- large cardinality + count distinct: currently implemented as len(Results), which is obviously not going to work for high cardinality (see the count-distinct sketch below)
- large cardinality time series: if we make time series for the top 10 on a leaf, we still need to send a buffer of extra series to the aggregator just in case (see the top-k sketch below)
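On the histogram fix: the core idea of a t-digest-style mergeable histogram is that leaves ship centroids (mean, weight) instead of fixed bins, and the aggregator concatenates, sorts, and re-compresses. A stripped-down sketch of just that merge step (a real implementation should use a proper t-digest library):

```go
package main

import (
	"fmt"
	"sort"
)

type Centroid struct {
	Mean   float64
	Weight float64
}

// merge combines centroid sets from several leaves, then compresses the
// result down to maxCentroids by folding the closest adjacent pair together.
func merge(leaves [][]Centroid, maxCentroids int) []Centroid {
	var all []Centroid
	for _, l := range leaves {
		all = append(all, l...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].Mean < all[j].Mean })

	for len(all) > maxCentroids {
		best, bestGap := 0, all[1].Mean-all[0].Mean
		for i := 1; i < len(all)-1; i++ {
			if gap := all[i+1].Mean - all[i].Mean; gap < bestGap {
				best, bestGap = i, gap
			}
		}
		// Fold the pair into a weighted average, preserving total weight.
		a, b := all[best], all[best+1]
		w := a.Weight + b.Weight
		all[best] = Centroid{(a.Mean*a.Weight + b.Mean*b.Weight) / w, w}
		all = append(all[:best+1], all[best+2:]...)
	}
	return all
}

func main() {
	leaf1 := []Centroid{{1, 10}, {5, 20}, {9, 5}}
	leaf2 := []Centroid{{2, 8}, {7, 30}} // different bin edges than leaf1
	fmt.Println(merge([][]Centroid{leaf1, leaf2}, 3))
}
```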
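On count distinct: the reason a sketch fixes this is that fixed-size register arrays merge exactly across leaves (elementwise max), whereas len(Results) can't be combined without shipping every distinct value to the aggregator. Below is a stripped-down HLL-style toy showing the merge property; loglogbeta is this idea plus better bias correction.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"math/bits"
)

const p = 12     // 2^12 = 4096 registers, ~1.6% standard error
const m = 1 << p

type Sketch [m]uint8

func (s *Sketch) Add(value string) {
	h := fnv.New64a()
	h.Write([]byte(value))
	x := h.Sum64()
	idx := x >> (64 - p)                           // first p bits pick a register
	rank := uint8(bits.LeadingZeros64(x<<p|1)) + 1 // leading zeros of the rest
	if rank > s[idx] {
		s[idx] = rank
	}
}

// Merge is exact: the union's registers are the elementwise max, so leaves
// can each build a sketch locally and the aggregator just merges them.
func (s *Sketch) Merge(other *Sketch) {
	for i, v := range other {
		if v > s[i] {
			s[i] = v
		}
	}
}

// Estimate is the raw harmonic-mean estimator, with none of the small/large
// range corrections that loglogbeta adds.
func (s *Sketch) Estimate() float64 {
	sum := 0.0
	for _, v := range s {
		sum += math.Pow(2, -float64(v))
	}
	return 0.7213 / (1 + 1.079/m) * m * m / sum
}

func main() {
	var leaf1, leaf2 Sketch
	for i := 0; i < 5000; i++ {
		leaf1.Add(fmt.Sprint("user", i))
	}
	for i := 2500; i < 7500; i++ {
		leaf2.Add(fmt.Sprint("user", i))
	}
	leaf1.Merge(&leaf2)
	fmt.Printf("true 7500, estimated %.0f\n", leaf1.Estimate())
}
```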
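On top-k time series: a key that's globally top 10 can rank below 10 on every individual leaf, so each leaf has to over-fetch. A toy sketch of the over-fetch + re-rank step; the overfetch factor is a heuristic, not an exactness guarantee.

```go
package main

import (
	"fmt"
	"sort"
)

// topK returns the k keys with the largest totals.
func topK(totals map[string]int64, k int) []string {
	keys := make([]string, 0, len(totals))
	for key := range totals {
		keys = append(keys, key)
	}
	sort.Slice(keys, func(i, j int) bool { return totals[keys[i]] > totals[keys[j]] })
	if len(keys) > k {
		keys = keys[:k]
	}
	return keys
}

func main() {
	const k, overfetch = 2, 3

	// Per-leaf series totals; "d" is only rank 4 on leaf1 but wins globally.
	leaf1 := map[string]int64{"a": 100, "b": 90, "c": 85, "d": 84, "e": 10}
	leaf2 := map[string]int64{"d": 95, "c": 20, "a": 6, "b": 1, "e": 1}

	merged := map[string]int64{}
	for _, leaf := range []map[string]int64{leaf1, leaf2} {
		// Each leaf only ships its local top k*overfetch series.
		for _, key := range topK(leaf, k*overfetch) {
			merged[key] += leaf[key]
		}
	}
	fmt.Println(topK(merged, k)) // [d a]
}
```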
Steps we've taken towards distributed queries
- lower mem usage during aggregation
- switch to using loglogbeta for count distinct queries (definitely saves RAM)
- add encoder for writing querySpec to stdout as binary
- add aggregator for reading query specs from a dir and aggregating them (both sketched below)
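A sketch of what that encoder/aggregator pair looks like, using encoding/gob as the binary format; the QuerySpec shape and the *.spec file layout here are stand-ins, not sybil's real ones.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// QuerySpec is a simplified stand-in for the per-query result state.
type QuerySpec struct {
	Results map[string]int64
}

// writeSpec is the leaf side: binary-encode the spec to stdout, so whatever
// is driving the leaf (network wrapper, ssh pipe) can capture it to a file.
func writeSpec(spec *QuerySpec) error {
	return gob.NewEncoder(os.Stdout).Encode(spec)
}

// aggregateDir is the master side: decode every spec file in dir and merge
// the results into one.
func aggregateDir(dir string) (*QuerySpec, error) {
	merged := &QuerySpec{Results: map[string]int64{}}
	files, err := filepath.Glob(filepath.Join(dir, "*.spec"))
	if err != nil {
		return nil, err
	}
	for _, name := range files {
		f, err := os.Open(name)
		if err != nil {
			return nil, err
		}
		var spec QuerySpec
		err = gob.NewDecoder(f).Decode(&spec)
		f.Close()
		if err != nil {
			return nil, err
		}
		for k, v := range spec.Results {
			merged.Results[k] += v
		}
	}
	return merged, nil
}

func main() {
	merged, err := aggregateDir("./specs")
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.Results)
}
```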