core Fix Bayesian sensor to use negative observations

Breaking change

Bayesian sensor has been updated:

prob_given_false is now a required configuration variable as there was no mathematical rationale for the previous default value.
numeric_state, template, and state entries with only one to_state configured will also now update the prior probability accordingly if the observation is false. Those who have used duplicate, mirrored state configurations as a work-around will have their functionality preserved. However for numeric_state and template entries this will cause duplication of the Bayesian updating.

Proposed change

This change will be breaking, the previous iterations of this integration only used an observation to update the prior if the observation evaluated to True and ignored it if the observation was False or None/Unavailable.

Many users may have tweaked their probabilities to get the desired behaviour, or made duplicate, mirrored, entries (one for each binary sensor state - the advocated work-around in this community post) in their bayesian config.yml in order to get the desired behaviour. This fix will ~likely change the behaviour of their bayesian sensors - by duplicating the effect of each observation.~ will preserve these configurations in commit 0b366b5ddca5ed2485232f25bc51d5dc224cee36 for 'state' entries but may break configurations for 'numeric_state' and 'template' entries, to fix simply remove the mirrored entry.

According to the documentation page this integration is only used by approx 600 active users.

Fixes #67478 The Bayesian sensor purports to use bayes' rule to combine sensor information in order to create a composite, probabilistic, sensor. However as currently configured it does so in a one-sided fashion that is mathematically and practically incorrect.

A worded example:

I want to know if my internet is likely to be working for the devices connected to my wifi. I have a variety of HA sensors that are correlated; can I ping google.com, can I ping 1.1.1.1, can I ping my local DNS servers, can I ping my local router. Any one of these devices can, in varying regularity, fail to respond to pings, however if a number at the same time fail to respond then the issue is probably my connection and not the devices/services.

As currently configured Bayesian 'observations' only affect the probability if the sensor returns the configured to_state state and ignores any other state. However, according to bayes' rule, if I CAN connect to 1.1.1.1, that tells me something about my connection and if I CANNOT that also tells me something - both scenarios should 'update my priors'. If the sensor in question fails or returns unavailable - then fine, ignore it.

Click to expand - Current functionality: One-sided bayesian updating

binary_sensor:
  - platform: ping
    name: Internet_Connection_Google
    host: 8.8.8.8
    count: 3
    scan_interval: 30
  - platform: ping
    name: Internet_Connection_Cloudflared
    host: 1.1.1.1
    count: 3
    scan_interval: 30
  - platform: ping
    name: Wireguard
    host: 10.10.0.1
    count: 3
    scan_interval: 30
  - platform: ping
    name: router
    host: 192.168.0.1
    count: 3
    scan_interval: 30
  - platform: ping
    name: pihole
    host: 192.168.0.28
    count: 3
    scan_interval: 30
    # bayesian sensore
  - platform: "bayesian"
    name: "internet_working" # maybe sensible to switch these in seperate errors - DNS, Switch, Gl-inet
    prior: 0.99 # internet has high up time
    probability_threshold: 0.95 # by defualt assume the internet is up
    observations:
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_cloudflared"
      prob_given_true: 0.997 # cloudflare is rarely unavailible when the internet is up
      prob_given_false: 0.01 # it is highly likely that cloudflare will be unreachable when the internet is down - unless it is a dns issue
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_google"
      prob_given_true: 0.994
      prob_given_false: 0.01
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.router"
      prob_given_true: 0.995
      prob_given_false: 0.4 # slight majority of time internet is down router will be as well
      to_state: "on"
    - platform: "template" # evaluates to true when both the greeni and wireguard are down
      prob_given_true: 0.0001 # if the internet is working then it is very unlikely both are down
      prob_given_false: 0.10 # if the internet is not working then there is 10% chance they are both unreachable
      value_template: >-
        {% if is_state('binary_sensor.pihole', 'off')
           and is_state('binary_sensor.wireguard', 'off') %}
           true
        {% else %}
           false
        {% endif %}

Creates this when pings are returned successfully:

And this when I pull out my WAN cable - note how the observations 'disappear':

Click to expand - PR functionality: Two-sided bayesian updating

Same yml as above

binary_sensor:
  - platform: ping
    name: Internet_Connection_Google
    host: 8.8.8.8
    count: 3
    scan_interval: 30
  - platform: ping
    name: Internet_Connection_Cloudflared
    host: 1.1.1.1
    count: 3
    scan_interval: 30
  - platform: ping
    name: Wireguard
    host: 10.10.0.1
    count: 3
    scan_interval: 30
  - platform: ping
    name: router
    host: 192.168.0.1
    count: 3
    scan_interval: 30
  - platform: ping
    name: pihole
    host: 192.168.0.28
    count: 3
    scan_interval: 30
    # bayesian sensore
  - platform: "bayesian"
    name: "internet_working" # maybe sensible to switch these in seperate errors - DNS, Switch, Gl-inet
    prior: 0.99 # internet has high up time
    probability_threshold: 0.95 # by defualt assume the internet is up
    observations:
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_cloudflared"
      prob_given_true: 0.997 # cloudflare is rarely unavailible when the internet is up
      prob_given_false: 0.01 # it is highly likely that cloudflare will be unreachable when the internet is down - unless it is a dns issue
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_google"
      prob_given_true: 0.994
      prob_given_false: 0.01
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.router"
      prob_given_true: 0.995
      prob_given_false: 0.4 # slight majority of time internet is down router will be as well
      to_state: "on"
    - platform: "template" # evaluates to true when both the greeni and wireguard are down
      prob_given_true: 0.0001 # if the internet is working then it is very unlikely both are down
      prob_given_false: 0.10 # if the internet is not working then there is 10% chance they are both unreachable
      value_template: >-
        {% if is_state('binary_sensor.pihole', 'off')
           and is_state('binary_sensor.wireguard', 'off') %}
           true
        {% else %}
           false
        {% endif %}

Creates this when pings are returned successfully:

And this when not: (Note how the state change is detected and the probabilities update appropriately)

Type of change

[ ] Dependency upgrade
[x] Bugfix (non-breaking change which fixes an issue)
[ ] New integration (thank you!)
[ ] New feature (which adds functionality to an existing integration)
[x] Breaking change (fix/feature causing existing functionality to break)
[ ] Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #67478
This PR is related to issue:
Link to documentation pull request: https://github.com/home-assistant/home-assistant.io/pull/21899

Checklist

[x] The code change is tested and works locally.
[x] Local tests pass. Your PR cannot be merged unless tests pass
[x] There is no commented out code in this PR.
[x] I have followed the development checklist
[x] The code has been formatted using Black (black --fast homeassistant tests)
[x] ~Tests have been added to verify that the new code works.~ No new tests required

If user exposed functionality or configuration variables are added/changed:

[x] ~Documentation added/updated for www.home-assistant.io~ None changed

If the code communicates with devices, web services, or third-party tools:

[x] The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
[x] ~New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.~ No new deps
[x] ~For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.~ No change to deps
[x] ~Untested files have been added to .coveragerc.~ No new files

The integration reached or maintains the following Integration Quality Scale:

[x] No score or internal

To help with the load of incoming pull requests:

[x] I have reviewed two other open pull requests in this repository.

Mar 04 '22 12:03 HarvsG

In c03552b and e547429 I choose not to completely ignore unavailable entries as it could frustrate debugging. Now they will be displayed as such

  - platform: mqtt
    state_topic: "does/state/notexist"
    name: "notexist"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    device_class: motion
  - platform: "bayesian"
    name: "internet_working" # maybe sensible to switch these in seperate errors - DNS, Switch, Gl-inet
    prior: 0.99 # internet has high up time
    probability_threshold: 0.95 # by defualt assume the internet is up
    observations:
    - platform: "state"
      entity_id: "binary_sensor.notexist"
      prob_given_true: 0.000000001 # cloudflare is rarely unavailible when the internet is up
      prob_given_false: 0.99999999 # it is highly likely that cloudflare will be unreachable when the internet is down - unless it is a dns issue
      to_state: "on"
    - platform: "state"
      entity_id: "binary_sensor.internet_connection_cloudflared"
...

Will show the unobserved in dev tools now

And put info in the log:

Mar 04 '22 13:03 HarvsG

I'm not sure what the policy is for writing code to aid migration/mitigate breaking changes. This would be one such way to do so.

I assume many people used duplicate, mirrored entries to get around this bug. It is advocated for in this post https://community.home-assistant.io/t/how-bayes-sensors-work-from-a-statistics-professor-with-working-google-sheets/143177

    def _is_mirrored_entry(self, value, key):
        possible_dup = []
        for entry in self._observations:
            if entry.get(key, None) is not None:
                if entry[key] == value:
                    possible_dup.append(entry)
        if len(possible_dup) != 2:
            return False
        true_sum = possible_dup[0]["prob_given_true"] + possible_dup[1]["prob_given_true"]
        if true_sum > 0.95 and true_sum <1.05:
            _LOGGER.warning(
                "Two entries found for '%s' and their prob_given_true sum to approx 1, duplicate entries for inverse states is no long required",
                value,
                )
            return True
        return False

Calling the function as such

    def _build_observations_by_entity(self):
        """
        Build and return data structure of the form below.

        {
            "sensor.sensor1": [{"id": 0, ...}, {"id": 1, ...}],
            "sensor.sensor2": [{"id": 2, ...}],
            ...
        }

        Each "observation" must be recognized uniquely, and it should be possible
        for all relevant observations to be looked up via their `entity_id`.
        """

        observations_by_entity = {}
        seen = set()
        for ind, obs in enumerate(self._observations):
            if obs.get("entity_id", None) in seen:
                print("checking for mirrored entry")
                if self._is_mirrored_entry(obs["entity_id"], "entity_id"):
                    continue
            elif obs.get("entity_id", None) is not None:
                seen.add(obs["entity_id"])

            obs["id"] = ind

This detects and ignores mirrors using some fuzzy logic, it would make the change less breaking but it is not pretty

Mar 04 '22 18:03 HarvsG

@frenck I see the smash label - not sure if I should feel fear or excitement

Some builds failing - is that in issue at my end or the CI stack, looks as if it is not a problem with my code?

Run echo "Failed to restore Python virtual environment from cache"
  echo "Failed to restore Python virtual environment from cache"
  exit 1
  shell: sh -e {0}
  env:
    CACHE_VERSION: 9
    PIP_CACHE_VERSION: 3
    HA_SHORT_VERSION: [2](https://github.com/home-assistant/core/runs/5643792261?check_suite_focus=true#step:5:2)022.4
    DEFAULT_PYTHON: [3](https://github.com/home-assistant/core/runs/5643792261?check_suite_focus=true#step:5:3).[9](https://github.com/home-assistant/core/runs/5643792261?check_suite_focus=true#step:5:9)
    PRE_COMMIT_CACHE: ~/.cache/pre-commit
    PIP_CACHE: /tmp/pip-cache
    SQLALCHEMY_WARN_20: 1
    PYTHONASYNCIODEBUG: 1
Failed to restore Python virtual environment from cache
Error: Process completed with exit code 1.

Mar 30 '22 09:03 HarvsG

see the smash label - not sure if I should feel fear or excitement

Neither :)

Mar 30 '22 12:03 frenck

@HarvsG I've restarted CI now. You can also trigger it yourself by closing, then reopening the PR.

Mar 30 '22 18:03 emontnemery

0b366b5 Makes changes no longer breaking. If a user defines multiple entries for a single entity, the previous behaviour is preserved it will only affect the posterior probability if the 'state' matches the 'to_state', this gives the added advantage that users can use entities that have more than two states by specifying probabilities for each state

Apr 01 '22 15:04 HarvsG

All tests now passing.

N.B: In order to make the target probabilities for the tests I used online, independent, Bayes Theorem calculators rather than just plugging the output back in as the solution.

Apr 01 '22 18:04 HarvsG

@frenck Could you kindly move the PR to 'needs review' or 'review in progress' on the projects board please. I closed and re-opened as per @emontnemery 's suggestion to force a re-run of tests. https://github.com/home-assistant/core/projects/4#card-78597739

Apr 04 '22 13:04 HarvsG

Tagging reviewers @dgomes and @emontnemery

Apr 19 '22 10:04 HarvsG

Bumping as has been another week @dgomes @emontnemery @frenck

Apr 28 '22 17:04 HarvsG

Bumping as has been another 3 weeks @dgomes @emontnemery @frenck

May 23 '22 18:05 HarvsG

@HarvsG Will you have time to adjust the tests so there's full coverage anytime soon?

Jun 29 '22 07:06 emontnemery

@HarvsG need help finishing this PR ?

Aug 30 '22 17:08 dgomes

@HarvsG need help finishing this PR ?

~Yup just the changes to the tests. I have not done much work on tests before and a bit pressed for time at the moment~

Now has 100% Code coverage

tagging @dgomes and @emontnemery for re-review.

Sep 05 '22 10:09 HarvsG

@dgomes I have done all your suggestions. I have tried adding myself as a codeowner but pre-commit checks disallow it:

hassfest.................................................................Failed
- hook id: hassfest
- files were modified by this hook

Sep 05 '22 15:09 HarvsG

Sorry to have lead you in the wrong direction, you add yourself through the manifest file

Sep 05 '22 15:09 dgomes

Sorry to have lead you in the wrong direction, you add yourself through the manifest file

Turns out I had to ad the modified CODEOWNERS file as well as the committed manifest

Edit: @dgomes, all done now, could you start the workflow?

Sep 05 '22 15:09 HarvsG

@dgomes Are you happy I have resolved your suggestions and queries? Build ~should now pass~ now passes as well.

Sep 07 '22 19:09 HarvsG

Great. I notice another reviewer is listed but I don't know if they are required. So apologies for (possibly) requesting a review unnecessarily.

Sep 09 '22 18:09 HarvsG

Anything else I need to do before merge?

Sep 21 '22 09:09 HarvsG

We are missing emontnemery approval

This might take sometime, but don't worry

Sep 21 '22 09:09 dgomes

I trust @dgomes on this one 👍

Sep 26 '22 02:09 balloob

Thanks for the merge. And thank you all for your time on this one. My first contrib to core 😄

Sep 26 '22 09:09 HarvsG

It was a good one :) Congratulations!

Now you must stick around for part 2: handling user submitted issues 😅

Sep 26 '22 10:09 dgomes