
PoC: store well-defined metrics as time-series data streams

Open axw opened this issue 3 years ago • 7 comments

In recent versions, Elasticsearch has introduced time-series data streams (TSDS), a type of data stream that is well suited to storing (and querying) metrics. TSDS reduces disk space usage, and in the future it is expected to provide improved metric aggregation functionality. TSDS also enables downsampling (rollup) of metrics, a feature that would let our users trade fidelity for cost and keep metrics over a longer period at a reasonable cost.

Let's investigate changing the internal metrics data stream to use index_mode: time_series. Metrics will be identified and marked with the time_series_metric attribute. Metric dimensions (e.g. service.name) will be identified and marked with the time_series_dimension attribute.
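As a rough illustration of the proposal above, here is a minimal Python sketch that builds an index template body with index_mode: time_series, dimension fields marked with time_series_dimension, and metric fields marked with time_series_metric. The index pattern, the metric field name, and the choice of dimensions are assumptions made for the example, not the actual apm-server template.

```python
import json


def build_tsds_template(dimensions, metrics):
    """Build an index template body for a time-series data stream.

    dimensions: list of keyword field names to mark as dimensions.
    metrics: dict mapping metric field name -> metric kind (e.g. "gauge").
    """
    properties = {"@timestamp": {"type": "date"}}
    for name in dimensions:
        # Dimensions identify a time series (e.g. service.name).
        properties[name] = {"type": "keyword", "time_series_dimension": True}
    for name, kind in metrics.items():
        # Metrics carry the measured values.
        properties[name] = {"type": "double", "time_series_metric": kind}
    return {
        # Hypothetical pattern, chosen for illustration only.
        "index_patterns": ["metrics-apm.example-*"],
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "time_series",
                # Documents are routed by their dimension values.
                "index.routing_path": list(dimensions),
            },
            "mappings": {"properties": properties},
        },
    }


template = build_tsds_template(
    dimensions=["service.name", "service.environment"],
    metrics={"example.metric.value": "gauge"},  # hypothetical metric field
)
print(json.dumps(template, indent=2))
```

The resulting JSON is the kind of body that would be sent to the index template API; which fields really belong in the dimension list is exactly what this issue needs to work out.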

We should investigate whether we can switch over to TSDS without affecting the UI, or if additional changes are required.

We should use Rally to identify any storage savings (or unexpected costs), ingest throughput degradation, and ideally query performance improvements.

axw avatar Nov 23 '22 03:11 axw

This is currently blocked by https://github.com/elastic/kibana/issues/146804

kruskall avatar Dec 16 '22 01:12 kruskall

@kruskall and I discussed this yesterday and agreed to manually update the ES index template so that we can continue testing performance and UI implications. We will also look further into which metric dimensions are relevant.

simitt avatar Dec 20 '22 13:12 simitt

@kruskall could you investigate and add a summary related to the following points?

We should investigate whether we can switch over to TSDS without affecting the UI, or if additional changes are required.

We should use Rally to identify any storage savings (or unexpected costs), ingest throughput degradation, and ideally query performance improvements.

We can then decide how to move forward with the PR https://github.com/elastic/apm-server/pull/9730

simitt avatar Jan 16 '23 16:01 simitt

Related: https://github.com/elastic/elasticsearch/issues/93564

simitt avatar Feb 08 '23 09:02 simitt

Adding more information, as most of the conversation happened in other channels:

I've opened a separate issue for the Rally issue: https://github.com/elastic/apm-server/issues/10206

All the Kibana blockers have been solved, and the PR was updated to use most of the fields of transaction metrics as dimensions. The total number of dimensions was around 30, and we bumped into an issue: there is a hard limit of 16 dimensions.

16 is quite limiting, and even accounting for fields that provide redundant information we had to make some sacrifices (https://github.com/elastic/apm-server/pull/9730/commits/e3f691b1f4070d005e1ad046f050711bfe1fd540). I don't think we can move to time-series with that number of dimensions.
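To make the limit concrete, here is a minimal Python sketch that counts the fields marked with time_series_dimension in a mapping and compares the count against the limit of 16 mentioned above. The field names (dim_0 to dim_29) are placeholders standing in for the roughly 30 transaction metric fields; they are not the real field list.

```python
DIMENSION_LIMIT = 16  # hard limit reported in this thread


def count_dimensions(properties):
    """Count fields marked as dimensions, including nested object fields."""
    total = 0
    for field in properties.values():
        if field.get("time_series_dimension"):
            total += 1
        # Recurse into object fields that declare their own properties.
        total += count_dimensions(field.get("properties", {}))
    return total


# Placeholder mapping with 30 keyword dimensions (hypothetical names).
example_properties = {
    f"dim_{i}": {"type": "keyword", "time_series_dimension": True}
    for i in range(30)
}

dimension_count = count_dimensions(example_properties)
over_limit_by = max(0, dimension_count - DIMENSION_LIMIT)
print(f"{dimension_count} dimensions, {over_limit_by} over the limit")
```

With about 30 candidate fields, roughly 14 would have to be dropped or merged to fit under the limit, which is the sacrifice described in the linked commit.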

kruskall avatar Feb 08 '23 13:02 kruskall

Moving this task to backlog and removing the milestone. We can re-investigate when the ES issue with the dimension limit (https://github.com/elastic/elasticsearch/issues/93564) is solved.

simitt avatar Feb 08 '23 14:02 simitt

https://github.com/elastic/elasticsearch/issues/93564 has been solved. Is the issue ready to be tackled now, or are there other remaining blockers?

StephanErb avatar Mar 14 '24 23:03 StephanErb