
Enviro - Fails to go to sleep / wake up after error? - Due to time between readings?

Open dave-ct opened this issue 2 years ago • 8 comments

Running on the latest version, 0.0.9. I'm getting an occasional hang on the Urban device after an issue with MQTT uploads, but the real problem seems to be that it is not going back to sleep or waking up correctly. It can be recovered with the reset button.

@ZodiusInfuser Possible issue: the reading frequency here is set to 2 minutes, and you can see it woke up at 10:02:04. By the time it gets the upload failure it has been running for close to 2 minutes, and it tries to set the wake alarm for 2 minutes after the wake-up, which is 10:04. But when setting the RTC, the log shows the time is already 10:03:59, so it is possible that 10:04 has already passed by the time it shuts down, and that may cause the issue? The issue shows on the board as a flashing red light with the white light on. Note I pressed the reset button at 12:52 in the log below. I did not try the poke button this time but will next time.

2022-12-16 10:02:04 [debug    / 115kB] > performing startup
2022-12-16 10:02:04 [info     / 122kB]   - wake reason: rtc_alarm
2022-12-16 10:02:04 [debug    / 120kB]   - turn on activity led
2022-12-16 10:02:04 [debug    / 118kB] > 99 blocks free out of 212
2022-12-16 10:02:04 [debug    / 116kB] > taking new reading
2022-12-16 10:02:04 [info     / 111kB]   - seconds since last reading: 120
2022-12-16 10:02:04 [debug    / 108kB]   - starting sensor
2022-12-16 10:02:04 [debug    / 106kB]   - wait 5 seconds for airflow
2022-12-16 10:02:10 [debug    /  87kB]   - taking pms5003i reading
2022-12-16 10:02:10 [debug    /  85kB]   - taking microphone reading
2022-12-16 10:02:10 [debug    / 116kB] > caching reading for upload
2022-12-16 10:02:10 [info     / 111kB] > 3 cache file(s) need uploading
2022-12-16 10:02:10 [info     / 108kB] > connecting to wifi network '<removed>'
2022-12-16 10:02:17 [info     /  86kB]   - ip address:  192.168.0.236
2022-12-16 10:02:17 [info     / 116kB] > uploading cached readings to MQTT broker: <removed>
2022-12-16 10:02:22 [info     / 108kB]   - uploaded 2022-12-16T09_58_10Z.json
2022-12-16 10:03:58 [debug    /  49kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/mqtt.py", line 34, in upload_reading
  File "enviro/mqttsimple.py", line 110, in connect
IndexError: bytes index out of range

2022-12-16 10:03:58 [error    /  74kB]   ! failed to upload '2022-12-16T10_00_10Z.json' to mqtt
2022-12-16 10:03:58 [error    /  72kB] ! reading upload failed
2022-12-16 10:03:58 [info     /  70kB] > going to sleep
2022-12-16 10:03:59 [debug    /  68kB]   - clearing and disabling previous alarm
2022-12-16 10:03:59 [info     / 118kB]   - setting alarm to wake at 10:04am
2022-12-16 12:52:30 [debug    / 115kB] > performing startup

From the code, it looks like we log the entry "setting alarm to wake at 10:04am" and then set the alarm, so the alarm time could be in the past or the same as the current time. As we don't get the log entry "- shutting down", there could be an error in the sleep function if the minute and hour are the same as the current time when we set it?
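To make the timing concrete, here is a minimal sketch of the rounding arithmetic, fed with the values from the log above. The helper names `next_alarm` and `seconds_until` are hypothetical, and the hour rollover is an assumption added for completeness; only the floor-and-add step mirrors the firmware's sleep code.

```python
import math

def next_alarm(hour, minute, reading_frequency):
    """Round down to the current reading slot, then add one slot."""
    minute = math.floor(minute / reading_frequency) * reading_frequency
    minute += reading_frequency
    # Assumed rollover handling: carry extra minutes into the hour.
    while minute >= 60:
        minute -= 60
        hour += 1
    hour %= 24
    return hour, minute

def seconds_until(now, alarm):
    """Seconds from now=(h, m, s) to alarm=(h, m) at second 0, same day only."""
    h, m, s = now
    ah, am = alarm
    return (ah * 3600 + am * 60) - (h * 3600 + m * 60 + s)

now = (10, 3, 59)  # RTC time when the alarm was being set, per the log
alarm = next_alarm(now[0], now[1], reading_frequency=2)
margin = seconds_until(now, alarm)
print(alarm, margin)
```

With these inputs the alarm lands on 10:04, only one second after the current time, so any delay between computing the alarm and arming it could leave the alarm in the past.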

dave-ct avatar Dec 16 '22 13:12 dave-ct

Oh, that's quite the edge case! It seems like we need to include a 10-15 second buffer on the time, so that in this case it will miss the 10:04 reading but take the 10:06 reading instead.

What you can do in the immediate term is change the sleep function in __init__.py so that:

  # set alarm to wake us up for next reading
  dt = rtc.datetime()
  hour, minute = dt[3:5]

  # calculate how many minutes into the day we are
  if time_override is not None:
    minute += time_override
  else:
    minute = math.floor(minute / config.reading_frequency) * config.reading_frequency
    minute += config.reading_frequency

becomes

  # set alarm to wake us up for next reading
  dt = rtc.datetime()
  hour, minute, second = dt[3:6]

  # calculate how many minutes into the day we are
  if time_override is not None:
    minute += time_override
  else:
    if second > 55:
      minute += 1
    minute = math.floor(minute / config.reading_frequency) * config.reading_frequency
    minute += config.reading_frequency

FYI, I've been doing several quality-of-life fixes over in https://github.com/pimoroni/enviro/tree/patch/adafruit_io_fixes, which includes some RTC stuff.
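As a rough illustration of the 10-15 second buffer idea, the sketch below picks the next reading slot that is at least `buffer_seconds` away. The function name `next_alarm_with_buffer` and the `buffer_seconds` parameter are hypothetical and not part of the firmware. It also aligns slots to midnight rather than to the hour, which gives the same result only when the reading frequency divides 60 evenly.

```python
def next_alarm_with_buffer(hour, minute, second, reading_frequency, buffer_seconds=15):
    """Return (hour, minute) of the next slot at least buffer_seconds away."""
    now = hour * 3600 + minute * 60 + second
    slot = reading_frequency * 60  # slot length in seconds
    # Push "now" forward by the buffer, then move to the following slot boundary.
    target = ((now + buffer_seconds) // slot + 1) * slot
    target %= 24 * 3600  # wrap past midnight
    return target // 3600, (target % 3600) // 60

print(next_alarm_with_buffer(10, 3, 59, 2))   # the case from the log
print(next_alarm_with_buffer(10, 2, 20, 2))   # plenty of time left in the slot
```

For the 10:03:59 case this skips the 10:04 slot and wakes at 10:06, which is the behaviour described above. The posted patch achieves a similar effect with a fixed 5 second margin (`second > 55`).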

ZodiusInfuser avatar Dec 19 '22 16:12 ZodiusInfuser

@ZodiusInfuser Thanks for this, have included it for the last 5 days and no issues.

dave-ct avatar Dec 29 '22 12:12 dave-ct

Big thumbs up for the adafruit_io_fixes branch - running this for the last few days on Enviro+ Urban without incident. Previously it would often crap out in <24 hours, but now it just keeps on going (🤞 just in case!)

martin-hamilton avatar Jan 06 '23 23:01 martin-hamilton

@martin-hamilton Are things still running for you with the fixes from that branch?

ZodiusInfuser avatar Jan 20 '23 18:01 ZodiusInfuser

Looks like I spoke too soon - Enviro+ still crapping out. I think it may be more down to humidity ingress than code or hardware problems. Am going to try protecting it with bubble wrap and silica gel beads in case this helps - with a few strategically placed holes for the sensors, of course! I am using the Pimoroni Stephenson Screen, but the Enviro+ has been getting noticeably damp 😳

martin-hamilton avatar Jan 20 '23 19:01 martin-hamilton

Just a quick follow-up - thought I would leave the Enviro+ indoors for a bit to see how it fared, and it is still crapping out after a few hours. I tried putting a few debugging statements in to see if I could spot a pattern, and noticed that the NTP time sync and the RTC write don't always work. I'll experiment with adding a bit of "retry with a delay between attempts" logic to see if that helps any
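A minimal sketch of what "retry with a delay between attempts" could look like. The `retry` helper and its parameters are hypothetical; `operation` stands in for whatever function performs the NTP sync or RTC write, and is assumed to return a truthy value on success or raise `OSError` on a network failure.

```python
import time

def retry(operation, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call operation() until it returns a truthy value or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if operation():
                return True
        except OSError:
            # Assumed failure mode: treat network errors as a failed attempt.
            pass
        if attempt < attempts:
            sleep(delay_s)  # wait before trying again
    return False

# Demo with a stand-in operation that fails twice, then succeeds.
results = iter([False, False, True])
delays = []
ok = retry(lambda: next(results), attempts=3, delay_s=2, sleep=delays.append)
print(ok, delays)
```

Passing `sleep` in as a parameter keeps the helper easy to exercise off-device; on the board the default `time.sleep` would be used.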

martin-hamilton avatar Jan 30 '23 11:01 martin-hamilton

Today I hit a similar issue on my Indoor. From the log:

2023-01-31 14:30:37 [debug    /  84kB]   - an exception occurred when uploading. Traceback (most recent call last):
  File "enviro/destinations/adafruit_io.py", line 31, in upload_reading
  File "urequests.py", line 184, in post
  File "urequests.py", line 91, in request
OSError: [Errno 103] ECONNABORTED

2023-01-31 14:30:37 [error    /  81kB]   ! failed to upload '2023-01-31T14_30_11Z.json' to adafruit_io
2023-01-31 14:30:37 [error    /  79kB] ! reading upload failed
2023-01-31 14:30:37 [info     /  77kB] > going to sleep
2023-01-31 14:30:37 [debug    /  75kB]   - clearing and disabling previous alarm
2023-01-31 14:30:37 [info     /  73kB]   - setting alarm to wake at 14:35pm
2023-01-31 23:03:52 [info     / 113kB] > performing startup
2023-01-31 23:03:52 [debug    / 111kB]   - running Enviro 0.0.9, MicroPython 9dfabcd-dirty on 2022-11-18
2023-01-31 23:03:52 [info     / 121kB]   - wake reason: rtc_alarm
2023-01-31 23:03:52 [debug    / 119kB]   - turn on activity led

But normally shutdown should look like this:

2023-01-31 14:25:17 [info     /  96kB] > going to sleep
2023-01-31 14:25:17 [debug    /  94kB]   - clearing and disabling previous alarm
2023-01-31 14:25:17 [info     /  92kB]   - setting alarm to wake at 14:30pm
2023-01-31 14:25:17 [info     /  90kB]   - shutting down
2023-01-31 14:25:17 [debug    /  88kB]   - on usb power (so can't shutdown). Halt and wait for alarm or user reset instead
2023-01-31 14:30:00 [debug    /  86kB]   - reset

As in the failed log above, it looks like writing to the RTC hung, so the board never shut down.

Maybe the Enviro firmware should set an alarm that reboots the board if the update takes more than 2-3 minutes? Unfortunately I don't think you can use the normal watchdog timer because that is only ~8s on RP2040 and connecting to WiFi and uploads might legitimately take longer than that.
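One way to get a longer timeout out of a short hardware watchdog is to keep feeding it from a periodic tick only until an overall deadline passes, then stop feeding and let it reset the board. This is a different technique from the RTC alarm suggested above, and the sketch below is untested on hardware. The `DeadlineWatchdog` class is hypothetical; `feed` and `now_ms` are injected so the logic runs anywhere, and on a device they would presumably be a watchdog's feed method and a millisecond tick counter. Counter wraparound is not handled, and whether a periodic callback keeps firing during the kind of hang shown in the logs is an open question.

```python
class DeadlineWatchdog:
    """Feed a short watchdog only until a longer overall deadline expires."""

    def __init__(self, feed, now_ms, deadline_ms):
        self._feed = feed
        self._now_ms = now_ms
        # Absolute expiry time; plain addition, so no wraparound handling.
        self._expires = now_ms() + deadline_ms

    def tick(self):
        """Call more often than the hardware timeout. Returns True if fed."""
        if self._now_ms() < self._expires:
            self._feed()
            return True
        # Deadline passed: stop feeding so the hardware watchdog can fire.
        return False

# Simulation: a 3 minute deadline, ticked every 5 seconds for 200 seconds.
clock = [0]
feeds = []
wd = DeadlineWatchdog(feed=lambda: feeds.append(clock[0]),
                      now_ms=lambda: clock[0],
                      deadline_ms=180000)
still_feeding = []
for t in range(0, 200001, 5000):
    clock[0] = t
    still_feeding.append(wd.tick())
print(len(feeds), still_feeding[-1])
```

In the simulation the watchdog is fed up to 175 seconds and left alone from 180 seconds onward, so a board stuck past the 3 minute mark would be reset within one hardware timeout.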

MichaelBell avatar Jan 31 '23 23:01 MichaelBell

I've now found issue #119, which looks the same as my error at least. Interesting proposed fix there, using PIO from MicroPython to avoid having to change the C++.

MichaelBell avatar Feb 01 '23 00:02 MichaelBell