[Workaround] v1.0.192 crashes with error "out of time for read"

Open adnovea opened this issue 1 year ago • 3 comments

I have a CZ-TAW1B module and use the Ethernet link to monitor my Aquarea. I have tried different versions: v1.0.166, v1.1.191 and the latest one, v1.0.192. The latter two versions crash the same way after a variable period of time (a few minutes) with the same error message, "out of time for read":

71 C8 01 10 55 95 52 49 00 55 00 01 00 00 00 00 00 00 00 00 59 15 14 55 55 15 55 55 55 19 00 00 00 00 00 00 00 00 99 9C 85 80 B4 71 71 97 99 00 00 00 00 00 00 00 00 00 00 00 80 85 15 8A 85 85 D0 7B 78 1F 7E 1F 1F 79 79 8D 8D B7 A3 7B 8F B7 A3 7B 8F 98 85 80 8F 8A 94 9E 8F 8A 94 9E 85 8F 8A 11 3D 78 C1 0B 7E 7C 1F 7C 7E 00 00 00 55 55 55 21 73 15 55 05 09 11 65 00 00 00 00 00 00 00 00 C2 D3 0C 33 65 B2 D3 0B 94 65 95 00 00 8D 8B 8A 32 32 A5 AA 32 32 32 99 A5 8B 8B 96 8C 8B 61 8C 61 8E 38 01 01 01 00 00 22 00 01 01 01 01 79 79 01 01 43 02 00 2F 03 00 16 00 00 01 00 00 06 01 01 01 01 01 01 01 02 00 00 5A
Checksum and header received ok!
Total reads : 279.000000 and total good reads : 277.000000 (99.28 %)
received TOP50 Discharge_Temp: 11
Publikuje do  panasonic_heat_pump/main/Discharge_Temp warosc 11
read ma status true
sent bytes: 110 with checksum: 18
71 6C 01 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Received 203 bytes data

71 C8 01 10 55 95 52 49 00 55 00 01 00 00 00 00 00 00 00 00 59 15 14 55 55 15 55 55 55 19 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Checksum received false!
read ma status false
sent bytes: 110 with checksum: 18
71 6C 01 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
out of time for read :(
sent bytes: 110 with checksum: 18
71 6C 01 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
out of time for read :(
sent bytes: 110 with checksum: 18
71 6C 01 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Once the failure occurs, it repeats indefinitely.
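
The "sent bytes: 110 with checksum: 18" lines in the log can be cross-checked by hand. The following is a minimal sketch, assuming the checksum is the byte that brings the sum of all frame bytes to 0 modulo 256. That rule is an assumption not confirmed in this thread, although it reproduces the logged value. The query frame is 71 6C 01 10 followed by zeros, so only the first four bytes contribute. `frame_checksum` is a hypothetical helper name, not part of GoHeishaMon.

```sh
#!/bin/sh
# frame_checksum (hypothetical helper): print the byte that makes the sum of
# all frame bytes equal to 0 modulo 256.
# Argument: space-separated hex bytes, without the trailing checksum byte.
frame_checksum() {
  sum=0
  for b in $1; do
    sum=$(( (sum + 0x$b) & 0xFF ))
  done
  echo $(( (256 - sum) & 0xFF ))
}

# Non-zero bytes of the query frame from the log; the zero bytes add nothing.
frame_checksum "71 6C 01 10"   # prints 18
```

Under the same assumption, the truncated 203-byte frame that triggers "Checksum received false!" would fail the check because its trailing bytes are all zero.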

Here is a ~5-hour chronology of the crashes:

Thu Nov  9 11:29:42 CET 2023 Start GoHeishaMon
Thu Nov  9 11:30:58 CET 2023 Crashed
Thu Nov  9 11:37:43 CET 2023 Crashed
Thu Nov  9 11:44:01 CET 2023 Crashed
Thu Nov  9 11:50:27 CET 2023 Crashed
Thu Nov  9 12:03:11 CET 2023 Crashed
Thu Nov  9 12:07:25 CET 2023 Crashed
Thu Nov  9 12:17:05 CET 2023 Crashed
Thu Nov  9 12:19:19 CET 2023 Crashed
Thu Nov  9 12:24:53 CET 2023 Crashed
Thu Nov  9 12:30:14 CET 2023 Crashed
Thu Nov  9 12:33:00 CET 2023 Crashed
Thu Nov  9 12:42:20 CET 2023 Crashed
Thu Nov  9 12:48:07 CET 2023 Crashed
Thu Nov  9 12:51:53 CET 2023 Crashed
Thu Nov  9 12:56:51 CET 2023 Crashed
Thu Nov  9 12:58:34 CET 2023 Crashed
Thu Nov  9 13:05:40 CET 2023 Crashed
Thu Nov  9 13:20:07 CET 2023 Crashed
Thu Nov  9 13:21:39 CET 2023 Crashed
Thu Nov  9 13:28:31 CET 2023 Crashed
Thu Nov  9 13:37:08 CET 2023 Crashed
Thu Nov  9 13:40:48 CET 2023 Crashed
Thu Nov  9 13:45:53 CET 2023 Crashed
Thu Nov  9 13:48:17 CET 2023 Crashed
Thu Nov  9 14:08:54 CET 2023 Crashed
Thu Nov  9 14:17:38 CET 2023 Crashed
Thu Nov  9 14:26:07 CET 2023 Crashed
Thu Nov  9 14:37:23 CET 2023 Crashed
Thu Nov  9 14:50:44 CET 2023 Crashed
Thu Nov  9 14:54:55 CET 2023 Crashed
Thu Nov  9 14:57:09 CET 2023 Crashed
Thu Nov  9 14:57:53 CET 2023 Crashed
Thu Nov  9 15:07:08 CET 2023 Crashed
Thu Nov  9 15:08:19 CET 2023 Crashed
Thu Nov  9 15:41:12 CET 2023 Crashed
Thu Nov  9 15:46:54 CET 2023 Crashed
Thu Nov  9 16:00:41 CET 2023 Crashed
Thu Nov  9 16:03:58 CET 2023 Crashed
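
To see how long GoHeishaMon ran between restarts, the timestamps above can be turned into intervals. This is a rough sketch, assuming the log keeps the `date` output format shown here and that all lines fall on the same day. `crash_intervals` is a hypothetical helper name.

```sh
#!/bin/sh
# crash_intervals (hypothetical helper): read `date`-style lines on stdin and
# print the number of seconds between consecutive lines.
# Assumes all timestamps are from the same day.
crash_intervals() {
  awk '{
    split($4, t, ":")                       # field 4 is HH:MM:SS
    now = t[1] * 3600 + t[2] * 60 + t[3]    # seconds since midnight
    if (NR > 1) print now - prev
    prev = now
  }'
}

# First three lines of the chronology above.
printf '%s\n' \
  "Thu Nov  9 11:29:42 CET 2023 Start GoHeishaMon" \
  "Thu Nov  9 11:30:58 CET 2023 Crashed" \
  "Thu Nov  9 11:37:43 CET 2023 Crashed" | crash_intervals
# prints 76, then 405
```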

adnovea, Nov 09 '23 09:11

For a week now, I have had a working workaround to manage the read failures. It's not perfect, but it works!

I wrote a script /etc/gh/daemon.sh that kills and restarts GHM when the reading fails :

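#!/bin/sh
# Watchdog: restart GoHeishaMon whenever it logs a read timeout.
echo -e "\n`date` Start GoHeishaMon"
while true; do
  # Kill any instance that is still running (pidof may return several PIDs).
  pids=`pidof GoHeishaMon_MIPSUPX`
  [ -n "$pids" ] && kill -9 $pids
  sleep 10
  # Run GHM until the timeout message shows up (grep -q exits on the first match).
  /usr/bin/GoHeishaMon_MIPSUPX | grep -q "out of time for read"
# echo -e "`date` Crashed\n"
done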

and run the script at startup using the Local Startup file from LuCI :

# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

echo "START WATCHDOG   =====" > /dev/ttyS0
echo 300 > /proc/sys/kernel/panic
echo 0 > /proc/sys/kernel/panic_on_oops

(/usr/bin/check_buttons.sh > /dev/null 2>&1) &
/etc/gh/nextboot.sh
echo "" > /etc/gh/nextboot.sh

echo "START GoHeishaMon APL=====" > /dev/ttyS0
#/usr/bin/GoHeishaMon_MIPSUPX > /dev/ttyS0

/etc/gh/daemon.sh > /dev/ttyS0

#/usr/bin/a2wmain > /dev/ttyS0
#exit 0
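
One side effect of piping into `grep -q` in daemon.sh is that grep prints nothing, so the GoHeishaMon log no longer reaches /dev/ttyS0. A possible variant, sketched here under the assumption that keeping the log visible is wanted, copies each line through and stops at the timeout message. `echo_until_timeout` is a hypothetical helper name, and the sample input is a stand-in for the real GoHeishaMon output.

```sh
#!/bin/sh
# echo_until_timeout (hypothetical helper): copy stdin to stdout line by line
# and return as soon as a line contains the timeout message.
echo_until_timeout() {
  while IFS= read -r line; do
    printf '%s\n' "$line"
    case $line in
      *"out of time for read"*) return 0 ;;
    esac
  done
  return 1   # input ended without the timeout message
}

# Stand-in for the GoHeishaMon output, for illustration only.
printf '%s\n' \
  "Checksum and header received ok!" \
  "out of time for read :(" \
  "sent bytes: 110 with checksum: 18" | echo_until_timeout
```

In daemon.sh this would replace `grep -q "out of time for read"` in the pipeline. The rest of the loop would stay the same.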

Over the course of the day, I saw the free space decreasing. I was afraid of memory leaks and a crash, but it seems there is some housekeeping daemon that clears the logs and recovers the free space every day.

image

adnovea, Nov 15 '23 15:11

The "real" workaround is to check why reading the serial port took more than 5 seconds or, if you don't want to do that, to increase the timeout value yourself and build a new image.

vzamanillo, Feb 28 '24 18:02

Dear vzamanillo, thanks for your answer. Unfortunately, as explained above, reading works fine for a variable period of time (from a couple of seconds to 5 minutes or more), then the "Checksum received false!" message appears and the software must be killed and restarted. I was not able to find out the reason for the crash.

Building a new image requires installing an OpenWRT development toolchain, which is beyond my skills. For the time being, my "forced workaround" works with H.A. Maybe someone with more software experience will be able to tackle this issue in the future.

adnovea, Feb 29 '24 07:02