core icon indicating copy to clipboard operation
core copied to clipboard

[Recorder, MariaDB?] StatisticsTask: float division by zero

Open jhansche opened this issue 1 year ago • 1 comments

The problem

Since migrating to MariaDB for the Recorder db, I've started seeing this error in logs. So far I've seen it twice, at 1:40am and 10am.

What version of Home Assistant Core has the issue?

2023.2.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Supervised

Integration causing the issue

Recorder

Link to integration documentation on our website

https://www.home-assistant.io/integrations/recorder

Diagnostics information

No response

Example YAML snippet

recorder:
  db_url: mysql://...
  purge_keep_days: 30
  commit_interval: 20

Anything in the logs that might be useful for us?

Logger: homeassistant.components.recorder.core
Source: components/sensor/recorder.py:127
Integration: Recorder (documentation, issues)
First occurred: 10:00:10 AM (1 occurrences)
Last logged: 10:00:10 AM

Error while processing event StatisticsTask(start=datetime.datetime(2023, 2, 27, 14, 55, tzinfo=datetime.timezone.utc), fire_events=True): float division by zero
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 671, in _run_event_loop
    self._process_one_task_or_recover(task)
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 684, in _process_one_task_or_recover
    return task.run(self)
  File "/usr/src/homeassistant/homeassistant/components/recorder/tasks.py", line 143, in run
    if statistics.compile_statistics(instance, self.start, self.fire_events):
  File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 600, in wrapper
    return job(instance, *args, **kwargs)
  File "/usr/src/homeassistant/homeassistant/components/recorder/statistics.py", line 701, in compile_statistics
    compiled: PlatformCompiledStatistics = platform.compile_statistics(
  File "/usr/src/homeassistant/homeassistant/components/sensor/recorder.py", line 382, in compile_statistics
    compiled = _compile_statistics(hass, session, start, end)
  File "/usr/src/homeassistant/homeassistant/components/sensor/recorder.py", line 522, in _compile_statistics
    stat["mean"] = _time_weighted_average(fstates, start, end)
  File "/usr/src/homeassistant/homeassistant/components/sensor/recorder.py", line 127, in _time_weighted_average
    return accumulated / (end - start).total_seconds()
ZeroDivisionError: float division by zero


### Additional information

I've only seen one other reference to this error in [this community thread](https://community.home-assistant.io/t/i-have-a-problem-with-energy-dashboard-after-moving-to-mariadb/539228). Like that user, I started seeing it after migrating to MariaDB, which I only did last night, so I don't know if it worked in previous versions of Home Assistant.

jhansche avatar Feb 27 '23 17:02 jhansche

Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of recorder can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign recorder Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


recorder documentation recorder source (message by IssueLinks)

home-assistant[bot] avatar Feb 27 '23 17:02 home-assistant[bot]

I'm starting to believe this may have been caused by a botched mysql migration, but I do not have evidence yet to back that up. I was also noticing that the logbook and history seemed to be lumping dozens of state changes into a single timestamp, instead of displaying the correct time at which the states changed.

I used sqlite-to-mysql to import the schema+data from my db file into mariadb, but it seems that some of the schema does not get converted correctly. For example, event_type was imported as varchar(32) instead of varchar(64), which caused a different error in the logs. Other data types were missed as well (datetime vs datetime(6), float vs double, and so on), as well as the missing COLLATE utf8mb4_unicode_ci.

I've updated the schema of my tables to match what HA creates when encountering a fresh empty mysql database. Since making the changes, I'm no longer seeing the logbook events being lumped together into a single point in time.

The overlapping timestamps may also explain why (end - start) would be 0, leading to this division by zero error.

I think we can close this task as user error - though in the future it would be great to have:

  1. A system health check / repair item that can inspect the db schema and report if something appears to be incompatible, and possibly repair it in place.
  2. An easier process for migrating from sqlite to mysql (or postgres etc), especially when the user wants to use the MariaDB Add-on, since it's extremely difficult, tedious, and at times seems impossible to do the migration properly, because you have to have the schema created by Home Assistant for it to work correctly (the pypi tool does not create the schema correctly), but we also don't want to lose data, and the safest way to migrate the data is by closing the sqlite db file first, which usually means shutting down Home Assistant, but if you shut down HA then your MariaDB Add-on also goes down, and it's a long list of paradoxes. A simple migration wizard or script or whatever, especially an online migration (where the recorder continues working during the process), would make this whole process a breeze.

jhansche avatar Mar 01 '23 17:03 jhansche

Incorrect schema with float vs double seems like the likely cause here. There is already a stats schema health check that happens on startup but it can't catch every problem.

bdraco avatar Mar 02 '23 00:03 bdraco

Here is a full diff of the schema that I used to alter my incorrect schema. Most of the float -> double columns were the _ts named columns, so I agree that that was the most likely cause. schema.diff.txt

jhansche avatar Mar 02 '23 00:03 jhansche

Having the same issue, but it only has an effect every 25 hours - It causes me to have a blank hour in energy. Is there a fix for this? Also moved from sqlite to MariaDB

AlexLamond avatar Mar 02 '23 18:03 AlexLamond

It sounds like the most problematic columns to fix are the float timestamps, and possibly the datetime columns:

ALTER TABLE events MODIFY COLUMN time_fired_ts double DEFAULT NULL,
  MODIFY COLUMN time_fired datetime(6);
ALTER TABLE states MODIFY COLUMN last_changed_ts double DEFAULT NULL,
  MODIFY COLUMN last_updated_ts double DEFAULT_NULL,
  MODIFY COLUMN last_changed datetime(6),
  MODIFY COLUMN last_updated datetime(6);

There were other differences as well, but these should fix the timestamp problems.

jhansche avatar Mar 02 '23 19:03 jhansche

@jhansche Accoriding to your diff, here are all statements for setting the default schema. ha_default_schema.sql.txt

der-berni avatar Mar 07 '23 00:03 der-berni

https://github.com/home-assistant/core/blob/dev/homeassistant/components/recorder/migration.py#L552 I think this should be elif

Commit: https://github.com/home-assistant/core/commit/170a13302c8fdf2285615126bbf80b946a9a666b#diff-b13fb7a9fcbe98090c51f229ca8305a0690ae42a633346b177574ee43b3a31c8

dcoder42 avatar Mar 21 '23 11:03 dcoder42

https://github.com/home-assistant/core/blob/dev/homeassistant/components/recorder/migration.py#L552

I think this should be elif

Commit: https://github.com/home-assistant/core/commit/170a13302c8fdf2285615126bbf80b946a9a666b#diff-b13fb7a9fcbe98090c51f229ca8305a0690ae42a633346b177574ee43b3a31c8

Nicely spotted 👍 https://github.com/home-assistant/core/pull/90058

Unrelated to this issue though since that's only in dev

bdraco avatar Mar 21 '23 16:03 bdraco

Back to this issue. I am assuming statistics compile fails for the time block with the duplicate timestamp because they happen in the same time period and even after fixing the schema the problem will still happen because the data in the database is missing the microseconds

bdraco avatar Mar 21 '23 17:03 bdraco