core icon indicating copy to clipboard operation
core copied to clipboard

Add as_mutable for Jinja templating

Open depoll opened this issue 2 years ago • 5 comments

Breaking change

Proposed change

Breaking out #88575

Here, I've added the ability to create a mutable copy of a list or dictionary in jinja that you can directly modify (rather than rebuilding the list/dictionary element-by-element, which can turn an O(1) operation into an O(n) or O(n^2) operation). This requires two additions:

  • as_mutable filter so that one can efficiently mutate a list or dictionary without rebuilding it from scratch in a loop
  • Enabling the 'Expression Statement' do extension to make it possible to mutate lists/dicts.

All of this enables the following code:

{% set mutableDict = {} | as_mutable %}
{% do mutableDict.update({'foo': 'bar'}) %}
{% set mutableList = [] | as_mutable %}
{% do mutableList.append("baz") %}

{{ mutableDict }}
{{ mutableList }}

To produce the following output:

{'foo': 'bar'}
['baz']

I'm happy to update the documentation as well but wanted to first clear that this would be a welcome change.

Type of change

  • [] Dependency upgrade
  • [ ] Bugfix (non-breaking change which fixes an issue)
  • [ ] New integration (thank you!)
  • [x] New feature (which adds functionality to an existing integration)
  • [ ] Deprecation (breaking change to happen in the future)
  • [ ] Breaking change (fix/feature causing existing functionality to break)
  • [ ] Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request: https://github.com/home-assistant/home-assistant.io/pull/26437

Checklist

  • [x] The code change is tested and works locally.
  • [x] Local tests pass. Your PR cannot be merged unless tests pass
  • [x] There is no commented out code in this PR.
  • [x] I have followed the development checklist
  • [x] The code has been formatted using Black (black --fast homeassistant tests)
  • [x] Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • [ ] The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • [ ] New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • [ ] For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • [ ] Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

depoll avatar Feb 22 '23 18:02 depoll

Added docs.

depoll avatar Mar 03 '23 03:03 depoll

Happy to keep iterating. A few thoughts:

  • I considered copy-on-write, but I think it gets really confusing really quickly. Suppose I make a change to a nested dictionary that resulted from some other API -- either the containing dictionary needs to also be copied (even though we're not writing to it directly) or the next time I access that element, the changes will be lost. Then, if we do transparently copy, under what circumstances do those changes get reused. E.g. if I modify states.light.foo.attributes.some_dictionary_attribute, does the change stick if I access it again? What if I use state_attr() to get the same value? And change the result of that? IMO, there is a reasonable case to be made for every combination of answers to those questions, but it makes the APIs very difficult to predict.
  • Unfortunately, without as_mutable, script variables will also have this problem, since you might have a variable that's a dictionary or list, and that won't be mutable by default. If you mutate them in-place (e.g. using copy-on-write), that won't get written back to the script variable, which may confuse folks as well.
  • Totally open to letting the {} and [] literals be automatically mutable if I can figure out how. That at least gets rid of the {} | as_mutable thing. On the other hand, as_mutable is still useful for dealing with dictionaries that come back from other APIs. In principle you could probably do {}.update(immutable_dict), but I'm not sure that's a better experience.

So -- that's how I landed on being explicit about making things mutable -- everywhere else it quickly gets confusing and difficult to predict which things will be automatically mutable and what changes may stick. I do like the idea of making user-defined lists and dicts automatically mutable.

depoll avatar Mar 13 '23 17:03 depoll

@emontnemery Do you have any idea how to override the {} and [] behaviors in jinja? I'm having trouble finding a good hook, but will keep looking. Since they're not operators AFAICT (they're a literal syntax built into the language), I'm not sure how to adjust it.

depoll avatar Mar 13 '23 17:03 depoll

After a little digging, I couldn't find any way to adjust the behavior of literal creation (I think it's just compiling down to raw Python literals, which also don't look overridable). Happy to keep going back and forth -- marking as Ready for review so it ends up back on your radar, but if that's the wrong thing to do since I haven't actually changed anything, apologies in advance :)

depoll avatar Mar 13 '23 17:03 depoll

@emontnemery no rush, just thinking it would be nice to get this in with all the other Jinja changes I made in this cycle. Any thoughts on what I mentioned above? Seems easier to follow a world where everything is immutable unless explicitly marked otherwise then one where it's unclear where changes will persist, but definitely interested in any ideas you have.

depoll avatar Mar 20 '23 06:03 depoll

@depoll thanks for looking into the copy on modify as well as overriding operators 👍

The concern with this PR is still that as_mutable will be difficult to find and understand, so it will end up not being used a lot. Can you give a few realistic examples of use cases where as_mutable is used, as well as the same functionality but without having access to as_mutable to better illustrate how it's helpful?

emontnemery avatar Apr 04 '23 06:04 emontnemery

Here's an example straight out of my setup. I have a template sensor that I use to aggregate BLE info I'm gathering on a bunch of ESPHome devices (the built in presence tools don't work very well for my use cases). It receives events and updates its state:

ble:
  template:
    - trigger:
        - platform: event
          event_type: esphome.beacon_detected
          id: detected
        - platform: time_pattern
          minutes: "*"
          id: cleanup
      sensor:
        name: BLE Beacons
        unique_id: ble_beacons
        state: "{{ this.attributes.beacons | length }}"
        attributes:
          beacons: >-
            {%- macro cleanup(beacons) -%}
              {%- set ns = namespace(beacons=[]) -%}
              {%- for b in beacons -%}
                {%- if b.last_seen | as_datetime >= now() - timedelta(seconds=10) -%}
                  {%- set ns.beacons = ns.beacons + [b] -%}
                {%- endif -%}
              {%- endfor -%}
              {{ ns.beacons }}
            {%- endmacro -%}
            {%- set old_beacons = this.attributes.beacons or [] -%}
            {%- set old_beacons = old_beacons | from_json if old_beacons is string else old_beacons -%}
            {%- if trigger.id == "detected" -%}
              {%- set event_data_id = [trigger.event.data.uuid | lower, trigger.event.data.major | int, trigger.event.data.minor | int] | join('-') -%}
              {%- set ns = namespace(sensor=None, canonical_id=event_data_id, sensors=[]) -%}
              {%- set ns.sensors = states.sensor | selectattr('state', 'in', ['Transmitting', 'Stopped']) | list -%}
              {%- for it in states.input_text | selectattr('state', 'match', 'ibeacon:') -%}
                {%- set ns.sensors = ns.sensors + [{
                  "entity_id": it.entity_id,
                  "attributes": {
                    "id": it.state['ibeacon:' | length:]
                  }
                }] -%}
              {%- endfor -%}
              {%- for s in ns.sensors -%}
                {%- set id_parts = (s.attributes.id | lower).split('_') -%}
                {%- set id = [id_parts[0], id_parts[1] | int, id_parts[2] | int] | join('-') -%}
                {%- if event_data_id == id -%}
                  {%- set ns.sensor = s -%}
                  {%- set ns.canonical_id = s.attributes.id -%}
                {%- endif -%}
              {%- endfor -%}
              
              {%- if ns.sensor is not none -%}
                {%- set cur_beacon = {
                  "id": ns.canonical_id,
                  "rssi": trigger.event.data.rssi,
                  "distance": trigger.event.data.distance,
                  "source": trigger.event.data.source,
                  "source_id": trigger.event.data.device_id,
                  "area": area_id(trigger.event.data.device_id),
                  "sensor": ns.sensor.entity_id,
                  "last_seen": now() | string
                } -%}
                {%- set ns = namespace(beacons=[], found=false) -%}
                {%- for b in old_beacons -%}
                  {%- if b.source == cur_beacon.source and b.id == cur_beacon.id -%}
                    {%- set ns.beacons = ns.beacons + [cur_beacon] -%}
                    {%- set ns.found = true -%}
                  {%- else -%}
                    {%- set ns.beacons = ns.beacons + [b] -%}
                  {%- endif -%}
                {%- endfor -%}
                {%- if not ns.found -%}
                  {%- set ns.beacons = ns.beacons + [cur_beacon] -%}
                {%- endif -%}
                {{ cleanup(ns.beacons) }}
              {%- else -%}
                {{ cleanup(old_beacons) }}
              {%- endif -%}
            {%- else -%}
              {{ cleanup(old_beacons) }}
            {%- endif -%}

as_mutable produces lots of opportunities for optimization here as on every event (which happen a lot and make up like 50% of all of my instance's CPU usage per the profiler) I have to rebuild lists.

But let's just take the last for loop as an example. Here, I'm doing crazy amounts of copying immutable lists to rebuild them item-by-item to potentially replace a single entry. This could instead be rewritten:

{%- set ns = namespace(beacons=old_beacons | as_mutable, found=false) -%}
{%- for b in old_beacons -%}
  {%- if b.source == cur_beacon.source and b.id == cur_beacon.id -%}
    {%- do ns.beacons.__setitem__(loop.index0, cur_beacon) -%}
    {%- set ns.found = true -%}
    {%- break -%}
  {%- endif -%}
{%- endfor -%}
{%- if not ns.found -%}
  {%- do ns.beacons.append(cur_beacon) -%}
{%- endif -%}

This turns an O(n^2) operation (because of repeated copying of lists) into an O(n) operation (because there is a single copy which is then modified in place).

I have a number of similar examples with dicts where e.g. removing an item requires rebuilding the dict item-by-item, making copies at every iteration, when a single pop() call would do it in O(1). An example includes keeping track of things coming and going (e.g. if I have enter/exit events for devices in a room and want to keep track of some data about who is there). as_mutable allows this:

{% set ns = namespace(new_dict={}) %}
{% for k, v in dict.items() %}
  {% if k != to_delete %}
    {% set ns.new_dict = dict(ns.new_dict, **{k: v}) %}
  {% endif %}
{% endfor %}

To become this:

{% set new_dict = dict | as_mutable %}
{% do new_dict.pop(to_delete) %}

depoll avatar Apr 05 '23 04:04 depoll

You should really write your beacon template as a custom integration in Python and don't use jinja2 for this.

balloob avatar Apr 13 '23 13:04 balloob

@balloob fair enough, though (at least for me) a custom integration is fairly heavyweight to get started with by comparison to a template sensor and vastly more painful to debug with live data (because every change requires restarting HA instead of just reloading templates).

Anyway, the use case exists in far smaller settings -- I have a number of bookkeeping cases where I'm getting an event and doing simple accumulation in a dict based on those events. Both adding and removing items are more painful and resource-intensive than they need to be.

depoll avatar Apr 13 '23 16:04 depoll

We've discussed this functionality extensively a couple of times with a couple of core developers, and we have decided not to accept this change/feature.

It is too confusing to use and explain, and the given use case is very specific (and shouldn't really be a template).

Nevertheless, thanks for being willing to contribute @depoll 👍

../Frenck

frenck avatar May 22 '23 17:05 frenck