homeassistant-elasticsearch

Support indexing via data streams

Open legrego opened this issue 4 years ago • 11 comments

The information that we publish to Elasticsearch is a great fit for Data Streams.

We should investigate what it would take to add support for data streams.

Data streams were introduced in version 7.9 (I believe), so we couldn't rely on them unconditionally unless we dropped support for older cluster versions, which I'm hesitant to do.

  • [ ] Move off of legacy index template APIs
  • [ ] Replace ILM settings with DLM: https://www.elastic.co/guide/en/elasticsearch/reference/master/data-stream-lifecycle.html
  • [ ] Migrate index alias to data stream: https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-migrate-to-data-stream.html
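A rough sketch of what the three checklist items could translate to, as far as I understand the current Elasticsearch APIs: a composable index template (`PUT _index_template/...`) with `data_stream` enabled and a data stream lifecycle block, followed by a call to the migrate API from the link above. The template name, index pattern, alias name, priority, and retention value are all placeholders I made up for illustration, and nothing is sent to a cluster here.

```python
import json

# All names and values below are assumptions for illustration only.
TEMPLATE_NAME = "home-assistant-events"        # hypothetical template name
INDEX_PATTERN = "metrics-home_assistant.*-*"   # hypothetical index pattern
LEGACY_ALIAS = "hass-events-alias"             # hypothetical; use the real write alias


def index_template_body(retention: str = "365d") -> dict:
    """Build a composable (non-legacy) index template that creates data streams."""
    return {
        "index_patterns": [INDEX_PATTERN],
        # An empty object is enough to mark matching indices as data streams.
        "data_stream": {},
        # Assumed value: high enough to win over the built-in metrics-*-* templates.
        "priority": 500,
        # Data stream lifecycle (DLM) in place of an ILM policy reference.
        "template": {"lifecycle": {"data_retention": retention}},
    }


def planned_requests() -> list:
    """Return (method, path, body) tuples in the order they would be issued."""
    return [
        # Replaces the legacy PUT _template/<name> call.
        ("PUT", f"/_index_template/{TEMPLATE_NAME}", index_template_body()),
        # The alias needs a write index and a matching data stream template first.
        ("POST", f"/_data_stream/_migrate/{LEGACY_ALIAS}", None),
    ]


for method, path, body in planned_requests():
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

The lifecycle block only works on clusters that ship data stream lifecycle, so older clusters would presumably still need the ILM path.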

legrego avatar Jan 14 '21 13:01 legrego

++ on going for data streams and jumping directly to the new naming scheme: https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme (yes, I'm biased). I guess this would all fall under metrics as the type, and we should decide on the dataset name(s). Is it all a single dataset, or should there be multiple different ones?

ruflin avatar Jan 20 '21 14:01 ruflin

@ruflin I think metrics makes sense for this component. At this point, I think a single dataset is all we'd need.

What do you think about any of these options?

  • metrics-hass-default
  • metrics-hass.events-default
  • metrics-hass-events

legrego avatar Feb 06 '21 15:02 legrego

++ on metrics. I like hass.events as it keeps room for other hass.* data. hass-events would be using events as the namespace which is unexpected.
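To make the difference concrete, here is a small Python sketch that splits a name according to the `{type}-{dataset}-{namespace}` scheme from the blog post. The helper and class names are made up for this example; the only rule it relies on is that the three parts are separated by hyphens, so the dataset and namespace cannot contain one.

```python
from typing import NamedTuple


class DataStreamName(NamedTuple):
    """The three parts of a name in the data stream naming scheme."""
    type: str
    dataset: str
    namespace: str


def parse_data_stream_name(name: str) -> DataStreamName:
    """Split '<type>-<dataset>-<namespace>' into its parts (hypothetical helper)."""
    parts = name.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <type>-<dataset>-<namespace> name: {name!r}")
    return DataStreamName(*parts)


for candidate in ("metrics-hass-default",
                  "metrics-hass.events-default",
                  "metrics-hass-events"):
    print(candidate, "->", parse_data_stream_name(candidate))
```

With the third option, `events` ends up in the namespace slot, which is normally where something like `default` or an environment name would go.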

One thing I'm stumbling over is the hass prefix instead of home_assistant or ha. I'm still new to Home Assistant and I just run it in a Docker container without hass as the OS. I'm a bit confused about what is what and what the correct naming is, so I wonder whether hass is the correct prefix for all the ways Home Assistant is run?

ruflin avatar Feb 15 '21 09:02 ruflin

One thing I'm stumbling over is the hass prefix instead of home_assistant or ha. I'm still new to Home Assistant and I just run it in a Docker container without hass as the OS. I'm a bit confused about what is what and what the correct naming is, so I wonder whether hass is the correct prefix for all the ways Home Assistant is run?

I picked hass arbitrarily in the past, so I'm open to renaming this. home_assistant seems like a decent name. So would this make the new proposal metrics-home_assistant.events-default?

legrego avatar Feb 22 '21 15:02 legrego

@legrego LGTM 👍

ruflin avatar Feb 24 '21 08:02 ruflin

I'm coming back to this as I just stumbled over the alias setup etc. today. I wonder if we could introduce a new config (which becomes the default for new installations) but keep the old setups working? Instead of Home Assistant installing the templates manually, I would switch over to an integration package that contains the templates etc. (see https://github.com/ruflin/ruflin-integration-package for inspiration). All the Elasticsearch integration would do is push the zip file for installation.
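For a sense of how small the "push the zip file" step could be, here is a minimal sketch that only builds the HTTP request and does not send it. It assumes the Kibana Fleet package upload endpoint (`POST /api/fleet/epm/packages` with an `application/zip` body and a `kbn-xsrf` header) and API key authentication; both should be checked against the Kibana version in use. The function name and example URL are made up.

```python
import urllib.request


def build_package_upload_request(kibana_url: str, zip_bytes: bytes,
                                 api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that uploads an integration package.

    Assumption: the Fleet upload endpoint and headers below match the target
    Kibana version. The helper name is hypothetical.
    """
    return urllib.request.Request(
        url=kibana_url.rstrip("/") + "/api/fleet/epm/packages",
        data=zip_bytes,
        method="POST",
        headers={
            "Content-Type": "application/zip",
            # Kibana rejects non-GET API calls without an xsrf header.
            "kbn-xsrf": "true",
            "Authorization": f"ApiKey {api_key}",
        },
    )


request = build_package_upload_request(
    "https://kibana.example:5601/", b"PK\x03\x04", "example-api-key"
)
print(request.get_method(), request.full_url)
```

Actually sending it would be a single `urllib.request.urlopen(request)` call, plus error handling for the case where the package is already installed.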

The default data stream would be metrics-home_assistant.events-default. Some of the things we ship could also be more similar to log events? On my end, I still need to dig deeper into the code to fully understand what is shipped from where.

@legrego WDYT about the high level approach above?

ruflin avatar Jul 18 '23 12:07 ruflin

I'm coming back to this as I just stumbled over the alias setup etc. today. I wonder if we could introduce a new config (which becomes the default for new installations) but keep the old setups working? Instead of Home Assistant installing the templates manually, I would switch over to an integration package that contains the templates etc. (see https://github.com/ruflin/ruflin-integration-package for inspiration). All the Elasticsearch integration would do is push the zip file for installation.

The default data stream would be metrics-home_assistant.events-default.

@ruflin I like this approach quite a bit. I'll see if I can find some time to play with the integration package concept and get a working POC.

Some of the things we ship could also be more similar to log events? On my end, I still need to dig deeper into the code to fully understand what is shipped from where.

I'd love to see if there's a way for us to hook into Home Assistant's logger. Tapping into that would give us a proper dataset for logs-home_assistant.???-default
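Since Home Assistant uses Python's standard `logging` module, one way to experiment with this might be a custom `logging.Handler`. The sketch below just buffers each record as a dict; the class name, the logger name, and the choice of ECS-style field names are my assumptions, and a real version would hand the documents to the existing publishing code instead of keeping them in a list.

```python
import logging
from datetime import datetime, timezone


class BufferingDocumentHandler(logging.Handler):
    """Turn log records into dicts that could later be indexed (hypothetical)."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.documents = []

    def emit(self, record: logging.LogRecord) -> None:
        # Field names follow ECS conventions; this mapping is an assumption.
        self.documents.append({
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "log.level": record.levelname.lower(),
            "log.logger": record.name,
            "message": record.getMessage(),
        })


handler = BufferingDocumentHandler(level=logging.WARNING)
# Illustrative logger name; attaching to the root logger would capture everything.
logger = logging.getLogger("homeassistant.example")
logger.addHandler(handler)
logger.warning("Sensor %s is unavailable", "sensor.kitchen")
print(handler.documents)
```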

legrego avatar Aug 12 '23 17:08 legrego

I would switch over to an integration package having the templates etc. inside (see https://github.com/ruflin/ruflin-integration-package for inspiration). All the elasticsearch integration would do is push the zip file for installation.

@ruflin Can you help me understand the benefits of this approach over installing the templates (etc.) manually? A couple I can see:

  • Take advantage of the package specification, which assists in validating correctness.
  • I can call one API instead of multiple, making version checks/upgrades easier down the line.

Are there others? I really don't love the idea of asking users for both their Kibana & ES endpoints in order to complete setup. You mentioned that we could streamline this a bit for Cloud, but that would add yet another flow (== more complexity).

Do you know if there are plans to expose an ES API for package installation in the future? That would make this approach much more palatable to me.

legrego avatar Sep 29 '23 20:09 legrego

The package installation takes care of all the edge cases, rollovers etc. and allows you to package additional assets like dashboards into it. It also means you get things like ECS templates directly out of the box. There are lots of small additional things that happen during installation to optimise things.

Do you know if there are plans to expose an ES API for package installation in the future?

I would love to have one but I doubt it will happen any time soon.

asking users for both their Kibana & ES endpoints

I hear you. I wonder if we could turn it around: only ask for the Kibana endpoint and then ask Kibana for the ES endpoint, assuming this is possible?

ruflin avatar Oct 03 '23 06:10 ruflin

The package installation takes care of all the edge cases, rollovers etc. and allows you to package additional assets like dashboards into it. It also means you get things like ECS templates directly out of the box. There are lots of small additional things that happen during installation to optimise things.

Thanks!

I hear you. I wonder if we could turn it around: only ask for the Kibana endpoint and then ask Kibana for the ES endpoint, assuming this is possible?

I was wondering this as well. I don't think there's a way to reliably get this information from Kibana.

legrego avatar Oct 04 '23 15:10 legrego

I was wondering this as well. I don't think there's a way to reliably get this information from Kibana.

We could add it ;-)

ruflin avatar Oct 05 '23 11:10 ruflin