Support index lifecycle management for Elasticsearch sink

Open • jszwedko opened this issue 4 years ago • 9 comments

Support the same options that Logstash does for managing index lifecycles: https://github.com/logstash-plugins/logstash-output-elasticsearch/blob/master/docs/index.asciidoc#index-lifecycle-management

jszwedko • Jun 04 '21

I've found the ILM configuration in Logstash pretty bad, IMHO. It might be better to focus on data streams support?

spencergilbert • Jun 04 '21

@spencergilbert we plan to do both, but we can focus on data streams first. Do you have thoughts on what an improved version of index lifecycle management might look like? Or do you think it's not even worth it given data streams? We've had some users ask for ILM support.

jszwedko • Jun 04 '21

I think going all in on data streams is probably better; as I understand it, they handle a lot of the ILM work the client would otherwise need to implement.

Logstash ILM was painful when I used it because you can't (or at least couldn't, at the time) supply the rollover alias as a template based on log fields.

spencergilbert • Jun 04 '21

There is a "problem" with this: the index name is a template, so without a given event we cannot know what the indexes will be. This implies that we'd have to upsert the ILM policy and index template definitions when we receive an event. In that case, each time we receive events, we'd have to make 3 calls: 1 to create the ILM policy, 1 to create the template, and 1 to push the metrics. This would most probably kill the performance of the sink. We could consider a cache at the Vector level, but that could become tricky when several Vector instances run in parallel, and it would increase memory usage. Now, if we look at another sink, ClickHouse, it needs a migration to work. Maybe Elasticsearch would need a migration to work as well.

jdrouet • Jun 28 '21
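The cache idea from the comment above can be sketched roughly like this: resolve the templated index name per event, and upsert the ILM policy and index template only the first time this process sees a given name. It is written in Python purely for illustration; `render_index`, `IndexSetupCache`, the `{{ field }}` placeholder handling, and the callback-based upserts are hypothetical stand-ins, not Vector's actual code or API.

```python
import re
from typing import Callable, Dict, List, Set

# Matches "{{ field }}" placeholders (hypothetical, loosely modeled on Vector's template syntax).
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_index(template: str, event: Dict[str, str]) -> str:
    """Resolve a templated index name from a single event's fields."""
    return _PLACEHOLDER.sub(lambda m: str(event[m.group(1)]), template)


class IndexSetupCache:
    """Per-process record of index names whose policy and template were already upserted.

    Hypothetical sketch: the cache lives in one process only, so parallel
    instances would each perform their own first upsert.
    """

    def __init__(self, upsert_policy: Callable[[str], None],
                 upsert_template: Callable[[str], None]) -> None:
        self._ready: Set[str] = set()  # grows with the number of distinct index names
        self._upsert_policy = upsert_policy
        self._upsert_template = upsert_template

    def ensure(self, index: str) -> None:
        if index in self._ready:
            return  # already set up by this process: no extra round trips
        self._upsert_policy(index)
        self._upsert_template(index)
        self._ready.add(index)


if __name__ == "__main__":
    calls: List[str] = []
    cache = IndexSetupCache(
        upsert_policy=lambda idx: calls.append(f"PUT policy for {idx}"),
        upsert_template=lambda idx: calls.append(f"PUT template for {idx}"),
    )
    for ev in [{"app": "api"}, {"app": "api"}, {"app": "web"}]:
        idx = render_index("logs-{{ app }}", ev)
        cache.ensure(idx)
        calls.append(f"bulk index into {idx}")
    print("\n".join(calls))
```

With three events spread across two indices, this issues four setup calls instead of six. The trade-offs raised above still apply: the set of seen names grows with the number of distinct indices, and each parallel Vector instance fills its own cache, so the upserts must be safe to repeat.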

After some discussion, we've decided to punt on this for now, given that we have added data streams support, which seems to be the ordained path for getting observability data into Elasticsearch and handles index lifecycle management for you. We'll leave this open to collect additional use cases for ILM, though.

jszwedko • Jun 28 '21

Hi, hopefully this isn't too off-topic. After reading the above, it looks like we need to get into data streams. Can somebody explain how data streams "handle index lifecycle management for you"? According to the docs, it's still necessary to set up ILM. What have I missed?

antgel • Apr 11 '22

In particular, I think data streams avoid the hassle of bootstrapping an initial index and managing a rollover alias yourself, as described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index-lifecycle-management.html#manage-time-series-data-without-data-streams

spencergilbert • Apr 11 '22

@jszwedko @spencergilbert, is support for this feature being worked on?

pgvishnuram • Sep 21 '22

Not currently. We'd be happy to review a proposal for it, though! Many users seem to have moved on to data streams for telemetry data.

jszwedko • Sep 21 '22

@spencergilbert @jszwedko, how can one control which data_stream name, index template, or ILM policy Vector is going to use?

From my understanding, I need to create the index template and ILM policy beforehand, but I am not sure how to set up Vector so that it works with my custom ILM policy and index template.

DekelDevunet • Nov 14 '23
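As a rough sketch of the up-front setup asked about in the last two questions: with data streams you still create an ILM policy and an index template beforehand, but the template only needs `"data_stream": {}` plus an `index.lifecycle.name` setting. Elasticsearch then creates the stream on the first matching write and rolls its backing indices over itself, so there is no rollover alias to bootstrap. The Python below only builds the two REST requests and sends nothing. The dataset name `myapp`, the policy name, the retention values, and the helper `data_stream_setup` are hypothetical, and it assumes Vector's Elasticsearch sink has been configured to write to a data stream named something like `logs-myapp-default`.

```python
import json
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, path, JSON body)


def data_stream_setup(dataset: str, rollover_age: str = "7d", retain: str = "30d") -> List[Request]:
    """Return the ILM policy and data-stream index template requests for one dataset.

    Names follow Elastic's "{type}-{dataset}-{namespace}" convention; the
    dataset name and retention values are hypothetical.
    """
    policy_name = f"logs-{dataset}-policy"
    policy = {
        "policy": {
            "phases": {
                # Roll the current backing index over once it reaches the given age.
                "hot": {"actions": {"rollover": {"max_age": rollover_age}}},
                # Delete backing indices once they are `retain` old, counted from rollover.
                "delete": {"min_age": retain, "actions": {"delete": {}}},
            }
        }
    }
    template = {
        "index_patterns": [f"logs-{dataset}-*"],
        "data_stream": {},  # matching writes create a data stream; no rollover alias needed
        "priority": 500,  # above the built-in "logs-*-*" template (priority 100 in recent versions)
        "template": {"settings": {"index.lifecycle.name": policy_name}},
    }
    return [
        ("PUT", f"/_ilm/policy/{policy_name}", policy),
        ("PUT", f"/_index_template/logs-{dataset}", template),
    ]


if __name__ == "__main__":
    for method, path, body in data_stream_setup("myapp"):
        print(method, path)
        print(json.dumps(body, indent=2))
```

The printed requests can be sent with any HTTP client, or pasted into Kibana Dev Tools, before Vector starts writing.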