
Auto follow patterns should not auto follow internal hidden indices / data streams

Open martijnvg opened this issue 3 years ago • 5 comments

Today, when an auto follow pattern in CCR follows (almost) everything, internal data streams / indices can also be auto followed from the remote cluster, for example the slm, ilm or watcher history. I don't think that these data streams / indices should be auto followed, because each cluster has its own history for each of those components. Auto following them would only make the ilm/slm/watcher history in the follower cluster harder to understand, since it would then contain history from the remote clusters as well as from the local cluster.

There is hard-coded logic in auto follow patterns to never auto follow system data streams and indices. I think we should have something similar for internal hidden data streams / indices.

Maybe we just determine the list of hidden indices/data streams to exclude based on the internal IndexTemplateRegistry instances? I also wonder whether in general hidden indices/data streams should be replicated? Maybe we can add a parameter to auto follow patterns that controls whether hidden data streams/indices are replicated (and default to false)?
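A rough sketch of how such a parameter could behave, purely as an illustration and not Elasticsearch code. `IndexInfo`, `should_auto_follow` and `follow_hidden` are hypothetical names, the index names in the usage note are only examples, and `fnmatchcase` is just an approximation of the wildcard matching used by leader index patterns:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical, simplified view of an index or data stream on the remote cluster."""
    name: str
    is_system: bool = False
    is_hidden: bool = False


def should_auto_follow(index, leader_patterns, follow_hidden=False):
    """Decide whether an auto follow pattern should pick up `index`.

    `follow_hidden` stands in for the proposed parameter (assumed name),
    defaulting to False as suggested above.
    """
    # System indices / data streams are never auto followed (the existing rule).
    if index.is_system:
        return False
    # Proposed rule: skip hidden indices unless the pattern opts in.
    if index.is_hidden and not follow_hidden:
        return False
    # Otherwise fall back to plain wildcard matching on the name.
    return any(fnmatchcase(index.name, pattern) for pattern in leader_patterns)
```

With a catch-all pattern `["*"]`, a regular index such as `logs-app-default` would still be followed, a hidden history index would be skipped unless `follow_hidden=True` is passed, and a system index would be skipped either way.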

martijnvg, Dec 15 '21 10:12

Pinging @elastic/es-data-management (Team:Data Management)

elasticmachine, Dec 15 '21 10:12

Pinging @elastic/es-distributed (Team:Distributed)

elasticmachine, Dec 15 '21 10:12

> I don't think that these data streams / indices should be auto followed, because each cluster has its own history for each of those components.

I think this reasoning is sound for almost all cases, but I've had users ask if we would consider supporting CCR for system indices to better support the disaster recovery use case, and I suspect the same folks would be using (or want to use) this functionality to replicate hidden indices as well, even though that data is "just" informational.

> Maybe we just determine the list of hidden indices/data streams to exclude based on the internal IndexTemplateRegistry instances? I also wonder whether in general hidden indices/data streams should be replicated?

I'd advocate for "hidden indices don't get replicated" over "Elastic's hidden indices don't get replicated" just for simplicity's sake - anything that involves magical lists of things in code is difficult to troubleshoot.

> Maybe we can add a parameter to auto follow patterns that controls whether hidden data streams/indices are replicated (and default to false)?

I'd prefer doing this over saying that hidden indices never get replicated, as I think this covers the DR case I mentioned above pretty well while simplifying the model for most users. This also gives us a migration path: Add the new parameter & deprecate the default, then eventually switch the default when we can.
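To make the migration path concrete, here is a minimal sketch of the "add the parameter, deprecate the default" step. It assumes a hypothetical helper `resolve_follow_hidden` and is not taken from the Elasticsearch code base: while the parameter is unset, today's behaviour (hidden indices are followed) is kept and a deprecation warning is emitted; an explicit value is always respected.

```python
import warnings
from typing import Optional

# Today's behaviour, kept while the parameter is unset: hidden indices are followed.
LEGACY_DEFAULT_FOLLOW_HIDDEN = True


def resolve_follow_hidden(configured: Optional[bool]) -> bool:
    """Return the effective setting for a pattern (hypothetical helper)."""
    if configured is None:
        # Unset: keep the old default for now, but tell the user it will change.
        warnings.warn(
            "auto follow patterns will stop following hidden indices by default; "
            "set the parameter explicitly to keep the current behaviour",
            DeprecationWarning,
            stacklevel=2,
        )
        return LEGACY_DEFAULT_FOLLOW_HIDDEN
    # Explicit value: use it as given, no warning.
    return configured
```

Switching the default later would then only mean flipping `LEGACY_DEFAULT_FOLLOW_HIDDEN` and dropping the warning.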

gwbrown, Mar 22 '22 14:03

Thanks @gwbrown for sharing your thoughts here. I agree with them.

> I'd advocate for "hidden indices don't get replicated" over "Elastic's hidden indices don't get replicated" just for simplicity's sake

👍

> This also gives us a migration path: Add the new parameter & deprecate the default, then eventually switch the default when we can.

👍

martijnvg, Mar 24 '22 10:03

We discussed this in the @elastic/es-distributed meeting and agree with the approach proposed above.

Leaf-Lin, Mar 29 '22 13:03