Discussion: Tracking changes to index templates, component templates and ingest pipelines
Description
The "Logs+" initiative in Observability tries to make the experience around logs in the Elastic stack as seamless as possible.
An important part of this is detecting and mitigating ingestion issues. Most of the time, ingestion issues start because something in the system changed. This can either be a change on the collection side or on the Elasticsearch side (mappings or ingest pipelines were rearranged, Fleet integration packages got updated, ...)
When investigating an issue in this area, it would be very helpful to be able to understand what changes were made when things started to go south. There already is a very important building block for this - via the _ignored field and the failure store, it's possible to reconstruct when things started to act up.
The other important part is correlating the occurring errors with changes to the system - in a visual way, this is what I'm trying to get to:
It's already possible to plot the errors over time; what's challenging is giving the user access to the annotations - the changes to the configuration of the system. Having access to this information and correlating both signals should significantly speed up time-to-resolution in many cases. It would also make it possible to automate, or at least simplify, getting back to a working system by rolling back the applied changes.
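For reference, the error side of that correlation is already queryable today. A minimal sketch (assuming a standard `logs-*` data stream with an `@timestamp` field) that buckets documents with ignored fields over time:

```
POST logs-*/_search
{
  "size": 0,
  "query": {
    "exists": { "field": "_ignored" }
  },
  "aggs": {
    "ignored_docs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h"
      }
    }
  }
}
```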
Some rough ideas / thoughts:
- For each data stream, there could be a hidden `.changes` index which is written to each time an index template matching the stream, a component template referenced in that index template, or an ingest pipeline referenced in it is updated
- The change documents would need to contain (see the sketch after this list):
- timestamp of the change
- delta of the change (what part of the configuration got updated how)
- metadata about the change (who triggered it)
- This isn't really something that can live on the Kibana layer - Kibana could track changes made through Fleet automation, but it would miss changes that target the Elasticsearch APIs directly, which can be quite common depending on the user's setup
- There are permission and storage concerns - who can access this information and how long should it live?
- This is slightly distinct from the whole "stack monitoring" use case, as it's ultimately about the soundness of the configuration, not operational concerns - for example even on serverless this kind of information would be relevant to users
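To make the first point a bit more concrete, here is a purely hypothetical sketch of what a single change document in such a `.changes` index could look like (all field names and values are made up for illustration; this is not an existing schema):

```
{
  "@timestamp": "2025-01-15T10:42:00Z",
  "change": {
    "target_type": "component_template",
    "target_name": "logs@custom",
    "delta": {
      "path": "template.mappings.properties.message.type",
      "from": "text",
      "to": "keyword"
    }
  },
  "user": {
    "name": "some.admin"
  }
}
```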
Any thoughts @ruflin @dakrone @felixbarny ?
Pinging @elastic/es-data-management (Team:Data Management)
My ideal scenario would be that Elasticsearch versions its assets like ingest pipelines and index templates, which would not only allow tracking changes (including historically) but also allow rolling back changes. As this will be a massive effort, we should start simpler.
A few constraints:
- A single template / ingest pipeline can affect many data streams
- Component templates especially are reused; changing one can affect all data streams that reference it
- Upgrade of Elasticsearch cluster is also a change that can affect things
- Rollover can have an effect, as this is when the templates apply
- Mappings / settings can also be changed directly on the data stream itself
The above sounds a lot like an audit log. How much of this is captured today in audit logs? Instead of having this per data stream, could we have a global data stream for it with all the changes? If we have all the changes, it would allow Kibana to stitch together the different changes and show them where relevant. For example, if `logs@custom` changes, the change would show up in all data streams that have rolled over since the change (which reminds me of https://github.com/elastic/elasticsearch/issues/75031).
To start simple, only admin users would have access to the full changelog.
Besides having the audit log, ideally the system would also automatically update meta information on the asset itself (like ingest pipelines) around `created`, `last_changed` and `changed_by`.
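Ingest pipelines already accept a free-form `_meta` object today, so one way to picture this (purely a sketch - today `_meta` only contains whatever the caller supplies, whereas the proposal is that Elasticsearch maintains these fields itself; the pipeline name and values are made up) would be:

```
PUT _ingest/pipeline/my-logs-pipeline
{
  "description": "Example pipeline",
  "processors": [
    { "set": { "field": "event.kind", "value": "event" } }
  ],
  "_meta": {
    "created": "2025-01-10T08:00:00Z",
    "last_changed": "2025-01-15T10:42:00Z",
    "changed_by": "some.admin"
  }
}
```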
If we have all the changes, it would allow Kibana to stitch together the different changes and show them where relevant
I can imagine this part getting complicated over time, but I agree that piping the audit log into a separate data stream seems like a good way to get started here.
The above sounds a lot like an audit log. How much of this is captured today in audit logs?
We already do have an audit log in ES, and it can be configured to emit request bodies; however, I would argue that its purpose is separate from the intended use case for this. I think this new concept is more of a changelog and less tied to security/auditing and the permissions granted for a particular API.
Instead of having this per data stream, could we have a global data stream for it with all the changes?
To me it makes more sense to have this be global also. It could likely go into its own data stream.
the system would automatically update meta information around `created`, `last_changed` and `changed_by`.
We do have these for some of our configuration items (like ILM policies), and we can expand that list fairly easily. There's a little bit of a discussion around leaking username information in a `changed_by` field, but we could have a separate discussion about that.
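For ILM policies, for example, this kind of metadata is already visible in the GET response today (the policy name below is just an example):

```
# ILM policies already carry a version counter and a modification timestamp
GET _ilm/policy/my-policy

# Response (abbreviated) - note "version" and "modified_date":
# {
#   "my-policy": {
#     "version": 3,
#     "modified_date": "2025-01-15T10:42:00.000Z",
#     "policy": { ... }
#   }
# }
```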
We do have these for some of our configuration items (like ILM policies), and we can expand that list fairly easily. There's a little bit of a discussion around leaking username information in a changed_by field, but we could have a separate discussion about that.
I could see this as low-hanging fruit to get started, as it would at least allow us to indicate the most recent change (no history).
I could see this as low-hanging fruit to get started, as it would at least allow us to indicate the most recent change (no history).
Agreed, this is a good starting point. Together with the rollover timestamps of individual data streams, this can probably go quite far in terms of providing visibility.
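A rough sketch of how those two signals could be lined up today (the data stream name is just an example): the rollover history of a data stream is visible through its backing indices and their creation dates, which is what a single last-changed timestamp on a template or pipeline could be correlated against.

```
# List the backing indices of a data stream
GET _data_stream/logs-myapp-default

# The creation date (epoch millis) of each backing index is effectively its rollover timestamp
GET .ds-logs-myapp-default-*/_settings/index.creation_date?expand_wildcards=open,hidden
```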
@dakrone
We do have these for some of our configuration items (like ILM policies), and we can expand that list fairly easily. There's a little bit of a discussion around leaking username information in a changed_by field, but we could have a separate discussion about that
Would it be worth it opening a separate more implementation focused issue around that?
Would it be worth it opening a separate more implementation focused issue around that?
Yes, that would be useful, to keep this one a bit more focused.
Split out this first step into https://github.com/elastic/elasticsearch/issues/108754