PoC: store well-defined metrics as time-series data streams
In recent versions, Elasticsearch has introduced time-series data streams (TSDS), a type of data stream that is well suited to storing (and querying) metrics. TSDS reduces disk space usage, and in the future it is expected to provide improved metric aggregation functionality. TSDS also enables downsampling (rollup) of metrics, a feature that would let our users trade fidelity for cost and keep metrics over a longer period at a reasonable cost.
Let's investigate changing the internal metrics data stream to use index_mode: time_series. Metrics will be identified and marked with the time_series_metric attribute. Metric dimensions (e.g. service.name) will be identified and marked with the time_series_dimension attribute.
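To make the proposal above concrete, here is a minimal sketch of what such an index template body could look like, built as a Python dict. The index pattern, the example metric field name, and the helper name `build_tsds_template` are assumptions for illustration only, not the actual APM template; only `index.mode`, `index.routing_path`, `time_series_metric`, and `time_series_dimension` are the Elasticsearch settings and mapping parameters being discussed.

```python
import json


def build_tsds_template(dimensions, metrics):
    """Build a composable index template body for a time-series data stream.

    dimensions: list of keyword field names to mark as dimensions.
    metrics: dict of field name -> (field type, metric type), where the
             metric type is "gauge" or "counter".
    """
    properties = {"@timestamp": {"type": "date"}}
    for name in dimensions:
        # Dimensions identify a time series (e.g. service.name).
        properties[name] = {"type": "keyword", "time_series_dimension": True}
    for name, (field_type, metric_type) in metrics.items():
        properties[name] = {"type": field_type, "time_series_metric": metric_type}
    return {
        # Assumed pattern for the internal metrics data stream.
        "index_patterns": ["metrics-apm.internal-*"],
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "time_series",
                # Documents are routed by their dimension values.
                "index.routing_path": list(dimensions),
            },
            "mappings": {"properties": properties},
        },
    }


if __name__ == "__main__":
    template = build_tsds_template(
        dimensions=["service.name"],
        # Hypothetical metric field, for illustration only.
        metrics={"example.requests.count": ("long", "counter")},
    )
    print(json.dumps(template, indent=2))
```

The resulting JSON could be used as the body of a manual index template update when testing, assuming the real field list is substituted.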
We should investigate whether we can switch over to TSDS without affecting the UI, or if additional changes are required.
We should use Rally to identify any storage savings (or unexpected costs), ingest throughput degradation, and ideally query performance improvements.
This is currently blocked by https://github.com/elastic/kibana/issues/146804
@kruskall and I discussed this yesterday and agreed to manually update the ES index template so we can continue testing the performance and UI implications, and to look further into the relevant metric dimensions.
@kruskall could you investigate and add a summary on the following points:
We should investigate whether we can switch over to TSDS without affecting the UI, or if additional changes are required.
We should use Rally to identify any storage savings (or unexpected costs), ingest throughput degradation, and ideally query performance improvements.
We can then decide how to move forward with the PR https://github.com/elastic/apm-server/pull/9730
Related: https://github.com/elastic/elasticsearch/issues/93564
Adding more information, as most of the conversation happened in other channels:
I've opened a separate issue for the Rally problem: https://github.com/elastic/apm-server/issues/10206
All the Kibana blockers have been resolved, and the PR was updated to use most of the transaction metrics fields as dimensions. The total number of dimensions was around 30, and we ran into a problem: there is a hard limit of 16 dimensions.
16 is quite limiting, and even after accounting for fields that provide redundant information we had to make some sacrifices (https://github.com/elastic/apm-server/pull/9730/commits/e3f691b1f4070d005e1ad046f050711bfe1fd540). I don't think we can move to time-series with that number of dimensions.
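As a rough aid for this kind of audit, the sketch below walks a mappings `properties` dict, collects every field marked `time_series_dimension`, and compares the count against the limit of 16 mentioned above. The helper names and the sample mapping are hypothetical, and the limit is hard-coded from this thread rather than read from Elasticsearch.

```python
# Limit as reported in this thread; not queried from Elasticsearch.
DIMENSION_LIMIT = 16


def find_dimensions(properties, prefix=""):
    """Return the full paths of all fields marked as time-series dimensions."""
    names = []
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if field.get("time_series_dimension"):
            names.append(path)
        # Recurse into object fields, which nest their own "properties".
        names.extend(find_dimensions(field.get("properties", {}), prefix=path + "."))
    return names


def exceeds_limit(properties, limit=DIMENSION_LIMIT):
    """True if the mapping declares more dimensions than the limit allows."""
    return len(find_dimensions(properties)) > limit


if __name__ == "__main__":
    # Hypothetical mapping fragment, for illustration only.
    sample = {
        "service": {
            "properties": {
                "name": {"type": "keyword", "time_series_dimension": True},
                "version": {"type": "keyword"},
            }
        },
        "host.name": {"type": "keyword", "time_series_dimension": True},
    }
    print(find_dimensions(sample))
    print(exceeds_limit(sample))
```

Running it against the mappings from the PR would give a quick list of which fields count toward the limit.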
Moving this task to the backlog and removing the milestone. We can re-investigate when the ES issue with the dimension limit (https://github.com/elastic/elasticsearch/issues/93564) is resolved.
https://github.com/elastic/elasticsearch/issues/93564 has been solved. Is this issue ready to be tackled now, or are there other remaining blockers?