
Publish documents with domain-specific mappings

Open legrego opened this issue 4 years ago • 6 comments

Currently, all of the events that we publish share a hass.attributes property, which acts as a dumping ground for any and all attributes that are associated with the entity.

This mostly works, but we start running into trouble when entities from different domains publish an attribute with the same name, but with different data types.

For example, consider the two entities below: one with the domain foo, the other with the domain bar. Both publish a last_updated attribute, but foo treats it as a proper timestamp, while bar treats it as a human-readable relative time:

{
  "domain": "foo",
  "entity_id": 123,
  "attributes": {
    "last_updated": "2020-11-25T14:53:00"
  }
}
{
  "domain": "bar",
  "entity_id": 456,
  "attributes": {
    "last_updated": "moments ago"
  }
}

The common field type between these two would end up being text, since we can't treat these relative times as timestamps. This diminishes the value of the field, as it limits what you can do in terms of queries and aggregations.
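To make the conflict concrete, here is a rough Python sketch of how dynamic mapping would treat the two values. This is illustration only (a hypothetical helper, not part of the component): real Elasticsearch dynamic date detection is more involved, but the outcome is the same — the two values cannot share a date mapping.

```python
from datetime import datetime

def infer_field_type(value: str) -> str:
    """Roughly mimic dynamic mapping: a value that parses as an
    ISO-8601 timestamp maps to "date"; anything else falls back to "text"."""
    try:
        datetime.fromisoformat(value)
        return "date"
    except ValueError:
        return "text"

infer_field_type("2020-11-25T14:53:00")  # foo's last_updated → "date"
infer_field_type("moments ago")          # bar's last_updated → "text"
```

Since both values land in the same field, the mapping has to fall back to the lowest common denominator.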

I think we should explore moving the attributes from hass.attributes.* to hass.attributes.{DOMAIN}.*, so that we can have more explicit mappings for each domain. This won't solve the problem entirely, but it would make it more manageable. It won't solve the problem entirely because different integrations can provide entities within the same domain, and attributes are not strictly typed within a domain. (e.g. a sensor can come from many places: mqtt, http, zwave, etc).

Using the above example, this proposal would change the mappings to:

{
  "domain": "foo",
  "entity_id": 123,
  "attributes": {
    "foo": {
      "last_updated": "2020-11-25T14:53:00"
    }
  }
}
{
  "domain": "bar",
  "entity_id": 456,
  "attributes": {
    "bar": {
      "last_updated": "moments ago"
    }
  }
}
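The restructuring itself would be mechanical; as a sketch (hypothetical helper, not part of the component):

```python
def namespace_attributes(doc: dict) -> dict:
    """Nest a document's attributes under its domain,
    i.e. hass.attributes.* -> hass.attributes.{DOMAIN}.*."""
    return {**doc, "attributes": {doc["domain"]: doc["attributes"]}}

namespace_attributes(
    {"domain": "bar", "entity_id": 456, "attributes": {"last_updated": "moments ago"}}
)
# → {"domain": "bar", "entity_id": 456,
#    "attributes": {"bar": {"last_updated": "moments ago"}}}
```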

This is a rather aggressive change, so I'm interested in hearing feedback about this proposal before taking it on. Feel free to propose alternate mapping strategies as well.

legrego avatar Nov 25 '20 20:11 legrego

The only way I can think of to remove the problem completely is to use hass.attributes.{DOMAIN}.{ENTITY_ID}.* What do you think?

dsztykman avatar Dec 21 '20 09:12 dsztykman

The only way I can think of to remove the problem completely is to use hass.attributes.{DOMAIN}.{ENTITY_ID}.*

++ I think this is close. entity_id is technically the concatenation of domain and object_id, so {DOMAIN}.{ENTITY_ID} would be redundant. What do you think about hass.{ENTITY_ID}.attributes.*? This would signify that the attributes belong to the entity, instead of the entity belonging to attributes.
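As a sketch of that shape (hypothetical helper; entity_id here is the full domain.object_id string):

```python
def entity_keyed_document(entity_id: str, attributes: dict) -> dict:
    """Sketch of the hass.{ENTITY_ID}.attributes.* layout: the attributes
    object belongs to the entity rather than the other way around."""
    return {"hass": {entity_id: {"attributes": attributes}}}

entity_keyed_document("foo.example", {"last_updated": "2020-11-25T14:53:00"})
# → {"hass": {"foo.example": {"attributes": {"last_updated": "2020-11-25T14:53:00"}}}}
```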

legrego avatar Jan 05 '21 01:01 legrego

Sounds good to me; I think it makes sense to have the attributes belong to the entity. In general it means we'll never be able to share visualisations, because we have different entity_ids, but I don't see any other solution...

dsztykman avatar Jan 05 '21 08:01 dsztykman

In general it means we'll never be able to share visualisation because we have different entity_id, but I don't see any other solution...

Yeah which is sad, but that should be easy enough to tweak on the exported visualization if necessary.

The other complication would be if you wanted to graph multiple sensors of the same type without having to worry about their entity ids: say, a graph of temperature from multiple sensors across the home. The hass.value property would still be fixed in place, but if there were attributes that you wanted to include, then those would be within the {ENTITY_ID}.attributes.* object.

legrego avatar Jan 05 '21 12:01 legrego

Could we leverage runtime fields (available in upcoming releases) for this? Instead of defining mappings for attributes in advance, we could instead store them at hass.attributes.* as they are today, but without creating mappings for the individual fields at index time:

{
  "hass": {
    "type": "object",
    "properties": {
      "attributes": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

If this works the way I think it will, this would give consumers the flexibility to query the fields as they see fit, without this component having to dictate a specific mapping at index time.
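A consumer could then type an attribute at query time with a search-request runtime field (available from Elasticsearch 7.11). A hedged sketch, expressed as a Python search body; the field name and the Painless script are illustrative and untested:

```python
# Sketch: declare a runtime "date" field over the unmapped
# hass.attributes object at search time, then query it.
# The Painless source below is illustrative only.
SEARCH_BODY = {
    "runtime_mappings": {
        "attributes_last_updated": {
            "type": "date",
            "script": {
                "source": (
                    "def v = params._source.hass.attributes.last_updated; "
                    "if (v != null) emit(Instant.parse(v).toEpochMilli());"
                )
            },
        }
    },
    "query": {"range": {"attributes_last_updated": {"gte": "now-1d"}}},
}
```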

legrego avatar Jan 18 '21 20:01 legrego

I'm now thinking that mapping hass.attributes as flattened would make the most sense.
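For reference, that would collapse the whole attributes object into one mapped field. A sketch, expressed as a Python dict of the mapping body:

```python
# Sketch: map hass.attributes as a single "flattened" field so every
# attribute is indexed (as keywords) under one mapping, avoiding
# per-field type conflicts at the cost of typed queries/aggregations.
FLATTENED_MAPPING = {
    "hass": {
        "properties": {
            "attributes": {"type": "flattened"}
        }
    }
}
```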

legrego avatar Sep 29 '23 19:09 legrego

Another option here would be to put each domain into its own Index / Datastream. This would allow each domain to use its domain-specific mapping while allowing users to define their own Kibana Data Views that either try to work with all domains (and thus are at the mercy of conflicts) and/or specific domains.

This could be a change that follows the introduction of data streams, and would result in data streams like metrics-homeassistant.events.bar-default and metrics-homeassistant.events.foo-default.
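The naming scheme could be built along these lines (hypothetical helper mirroring the names above; the prefix and suffix defaults are assumptions):

```python
def data_stream_name(domain: str,
                     prefix: str = "metrics-homeassistant.events",
                     suffix: str = "default") -> str:
    """Build the per-domain data stream name,
    e.g. metrics-homeassistant.events.bar-default."""
    return f"{prefix}.{domain}-{suffix}"

data_stream_name("foo")  # → "metrics-homeassistant.events.foo-default"
```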

strawgate avatar Mar 05 '24 01:03 strawgate

Another option here would be to put each domain into its own Index / Datastream. This would allow each domain to use its domain-specific mapping while allowing users to define their own Kibana Data Views that either try to work with all domains (and thus are at the mercy of conflicts) and/or specific domains.

Can you compare/contrast this approach to transitioning hass.attributes to a flattened field type? Creating a data stream for each domain seems quite a bit more complex, and I want to make sure we'd get enough benefit from the additional complexity.

One of the use cases of this component is long-term analysis of sensor data, so ideally whatever approach we take will come with a migration to allow users to continue this long-term analysis without having to manually re-index or drop their historical data.

legrego avatar Mar 05 '24 12:03 legrego

Can you compare/contrast this approach to transitioning hass.attributes to a flattened field type? Creating a data stream for each domain seems quite a bit more complex, and I want to make sure we'd get enough benefit from the additional complexity.

I believe the big downside of the flattened type is that it lacks support in Kibana (https://github.com/elastic/kibana/issues/25820), but I'm not totally sure.

As for the complexity, the datastream-per-domain approach is how other data sources do it -- for example, the system integration uses metrics-system.network, metrics-system.cpu, and metrics-system.memory. Thanks to the modern index templates, we would just set the pattern on the template to "metrics-homeassistant.*.events-default*" and, on indexing:

    destination_data_stream = self.datastream_prefix + "." + state.domain + "-" + self.datastream_suffix
    return {
        "_op_type": "create",
        "_index": destination_data_stream,
        "_source": document,
    }

One of the use cases of this component is long-term analysis of sensor data, so ideally whatever approach we take will come with a migration to allow users to continue this long-term analysis without having to manually re-index or drop their historical data.

I believe that domain-specific datastreams will not impact this, as the user can simply make a dataview across all the datastreams with metrics-homeassistant.*. If they want compatibility with their old indices, they should be able to set the dataview to metrics-homeassistant.*,hass-events*, which should combine both datasets into a single view. I will add a task to test this in my PR.

strawgate avatar Mar 05 '24 12:03 strawgate

@strawgate Thanks, I wasn't aware that Kibana lacked support for the flattened type. I also didn't know that other data sources already work this way. Given that, I'm happy to give this approach a go.

legrego avatar Mar 05 '24 13:03 legrego