heartbeat icon indicating copy to clipboard operation
heartbeat copied to clipboard

Undiagnosed rare bug will set `heartbeatStats.LongestMissingBeat` to a much higher number than is accurate

Open 5HT2 opened this issue 3 years ago • 10 comments

Note that this bug does not affect uptime (nor device.LongestMissingBeat). This is only an issue with where time.Now() is used, I think.

5HT2 avatar Apr 03 '22 06:04 5HT2

Bumping priority, I can't tell why but this issue seemingly still happens after rebooting heartbeat. It appears to always set around ~447 hours. I don't know if this number has any significance (447 - 12 (my initial hours) = 435 hours).

This seems to come back even after the system time has been set and heartbeat has been restarted.

5HT2 avatar Apr 09 '22 00:04 5HT2

Oh, it's ALWAYS 1610833 seconds

5HT2 avatar Apr 09 '22 00:04 5HT2

I don't think it's caused by malformed data as my devices look fine image

5HT2 avatar Apr 09 '22 00:04 5HT2

I suspect this might have just been caused by erroneous data in the DB, I fixed it manually and will monitor for further changes.

5HT2 avatar Apr 09 '22 02:04 5HT2

Can't reproduce. This issue has not happened since April 8th.

5HT2 avatar Dec 18 '22 10:12 5HT2

Reopening this as some sort of weird Redis behavior. Here's the appendonly.aof from the last time this happened on production, which is fairly recently:

{"last_beat_formatted":"2 minutes, 4 seconds","total_devices_formatted":"3","total_visits_formatted":"903,733","total_uptime_formatted":"11,773 hours, 27 minutes, 8 seconds"{"device_name":"mango","timestamp":1673974698}
*4
$14
JSON.ARRAPPEND
$5
beats
$1
.
$46
{"device_name":"mango","timestamp":1673974698}
*4
$8
JSON.SET
$5
stats
$1
.
$303
{"last_beat_formatted":"30 seconds","total_devices_formatted":"3","total_visits_formatted":"900,588","total_uptime_formatted":"11,599 hours, 47 minutes, 8 seconds","total_beats_formatted":"376,612","total_visits":900588,"total_uptime_milli":41759228479,"total_beats":376612,"longest_missing_beat":63127}
*4
$8
JSON.SET
$7
devices
$1
.
$399
[{"device_name":"lychee","last_beat":{"device_name":"lychee","timestamp":1673928714},"total_beats":4951,"longest_missing_beat":219481},{"device_name":"mango","last_beat":{"device_name":"mango","timestamp":1673974698},"total_beats":15873,"longest_missing_beat":40847},{"device_name":"blueberry","last_beat":{"device_name":"blueberry","timestamp":1671482520},"total_beats":1,"longest_missing_beat":0}]
*4
$8
JSON.SET
$9
last_beat
$1
.
$46
{"device_name":"mango","timestamp":1673974758}
*4
$14

5HT2 avatar Jan 24 '23 22:01 5HT2

This time it is 692 hours (exactly 2492358 seconds in the DB). Unrelated to system timezone updates, I believe this issue might either be correlated to restarting Docker or adding a new device??? idfk I will solve this later, it happens so rarely that switching to the Rust rewrite might just be the way to go

5HT2 avatar Jan 24 '23 22:01 5HT2

Happened again. 1681153 seconds, or 466 hours. This HAS to be fixed ASAP because of canary / dead man switch reasons.

{"last_beat_formatted":"11 seconds","total_devices_formatted":"3","total_visits_formatted":"912,240","total_uptime_formatted":"12,354 hours, 39 minutes, 8 seconds","total_beats_formatted":"397,665","total_visits":912240,"total_uptime_milli":44476748495,"total_beats":397665,"longest_missing_beat":63127}
*4
$8
JSON.SET
$7
devices
$1
.
$407
[{"device_name":"lychee","last_beat":{"device_name":"lychee","timestamp":1676693440},"total_beats":10839,"longest_missing_beat":219481},{"device_name":"mango","last_beat":{"device_name":"mango","timestamp":1676693414},"total_beats":31029,"longest_missing_beat":47177},{"device_name":"blueberry","last_beat":{"device_name":"blueberry","timestamp":1675012321},"total_beats":10,"longest_missing_beat":2499120}]
*4
$8
JSON.SET
$9
last_beat
$1
.
$46
{"device_name":"mango","timestamp":1676693474}
*4
$14
JSON.ARRAPPEND
$5
beats
$1
.
$46
{"device_name":"mango","timestamp":1676693474}
*4
$8
JSON.SET
$9
last_beat
$1
.
$47
{"device_name":"lychee","timestamp":1676693500}
*4
$14
JSON.ARRAPPEND
$5
beats
$1
.
$47
{"device_name":"lychee","timestamp":1676693500}
*4
$8
JSON.SET
$5
stats
$1
.
$305
{"last_beat_formatted":"11 seconds","total_devices_formatted":"3","total_visits_formatted":"912,241","total_uptime_formatted":"12,354 hours, 40 minutes, 8 seconds","total_beats_formatted":"397,667","total_visits":912241,"total_uptime_milli":44476808495,"total_beats":397667,"longest_missing_beat":1681153}

5HT2 avatar Feb 19 '23 00:02 5HT2

For future reference, if this happens again before I fix it, this is handy.

curl -X POST -H "Auth: $HEARTBEAT_AUTH" -i https://frog.icu/api/update/stats -d "$(curl -s https://frog.icu/api/stats | jq '.longest_missing_beat = 63127')"

5HT2 avatar Feb 19 '23 00:02 5HT2

This is happening more frequently, and I need to fix it.

{"last_beat_formatted":"28 seconds","total_devices_formatted":"3","total_visits_formatted":"919,081","total_uptime_formatted":"12,806 hours, 35 minutes, 8 seconds","total_beats_formatted":"412,428","total_visits":919081,"total_uptime_milli":46103708499,"total_beats":412428,"longest_missing_beat":1618410}

5HT2 avatar Mar 14 '23 00:03 5HT2