Current partition key schema requires that queries must be run against all nodes

Open mattbostock opened this issue 6 years ago • 0 comments

The current partition key schema, combined with the lack of any centralised index, means that every query must be run against all nodes in the cluster.

The current schema can be represented as:

<salt>:<bucket_end_time_as_YYYYMMDD>:<metric_name>:[<label_name>,<label_name>...]

Since the label names are often not known at query time, and PromQL allows querying without the metric name, the current schema means that all nodes must be queried to ensure that all matching time series are retrieved.

The partition key schema should be improved to limit the number of nodes that must be queried. There is an inherent tension between limiting the number of nodes that need to be queried and balancing ingestion across as many nodes in the cluster as possible.

mattbostock, Nov 01 '17 02:11