ModularSensors
Digi WiFi S6b becomes unstable
I'm using the Digi XBee XB2B-WFWT/XB2B-WFUT, or WiFi S6B hybrid, based off a 0.27.0 baseline.
After running for some time it starts to become unstable: after sleeping and being woken up, it doesn't connect to the local WiFi network. Occasionally it seems to stop responding to +++ commands. Sometimes these problems take two days to show, and other times it's within an hour.
It was working on the 0.25.0 baseline. I have introduced changes to reduce and then remove the polling of the MetaData in case this is causing the issue, but the problem still shows.
I'm documenting this in case anybody else is seeing anything similar and this could be a discussion point.
It is a relatively "soft" issue, as it doesn't show straight away and there are a lot of complex elements in the network. It has been happening against two different WiFi gateways I have. It is POSTing data to the MMW.
After a Mayfly reset, it reliably connects to the local WiFi network and gets the NIST time successfully; it's only later that it isn't able to connect again.
It has been happening across three different Mayfly test systems (each with a WiFi S6B) I was putting on a 0.27.0 stability test.
I have introduced changes to make sure the WiFi S6B hybrid is HARD reset when it shows this, but that hasn't appeared to make any difference.
I have moved to the 0.27.5 baseline to be compatible with the latest release, but there isn't much added functionality between 0.27.0 and 0.27.5.
I was thinking my next step is to use the "LTE Bee Adapter Rev 1b" board, modified to power down when reset is active.
I've seen this happening, but I haven't had time to troubleshoot it yet.
Ok thanks - good to know (negative logic) - that I'm on to something :( I'm focusing on my SDI-12/LT500 issues 1st.
The interface code forces the S6B to do a write to its persistent store every time the S6B wakes. Some persistent stores (e.g. EEPROM technologies) often have limited writes, that is, read many times, write a few times. I asked a question about this: https://www.digi.com/support/forum/77803/wifi-s6b-wr-use-sparingly?show=78105#a78105
The WiFi S6B manual, in the "AT Commands" section on WR (Write), says:
"Use the WR command sparingly to preserve flash"
Ref Pg197
XBee Wi-Fi RF Module S6B User Guide 90002180 RevU 2019Aug
I'm using a library https://github.com/vshymanskyy/TinyGSM
and when it comes out of sleep it sets a few parameters, ATAP0 ATGT64 ATCT64, followed by an ATWR.
What does "Use the WR command sparingly to preserve flash" mean?
If the values of a register are already set, is the ATWR smart enough to see that and not make changes in the flash?
many thanks
and the answer came today by "mvut Veteran of the Digi Community "
It means that you should only issue the WR if the values you are setting have not already been written to flash. IE, you should read the values and ONLY if they are different should you set the value and write it.
So I'm just identifying this as part of the issue, as to why the Digi WiFi S6B hybrid has become unstable, and for my purposes essentially unusable as a radio.
I have used the Digi WiFi in various versions since 2010, and not seen this before. I had been hoping to use this configuration for a local monitoring station of a WiFi portal, but have delayed it until I can make it work reliably.
I had never noticed that warning before in the XBee manual. That needs to be fixed in TinyGSM.
I've modified TinyGSM to read most parameters from the XBee before attempting to write so it does not have to write to flash when no actual change has been made. It's not perfect, though. For some fields (like passwords) the current value cannot be read back from the XBee so we have no choice but to write it each time.
Hey thanks. Will check it out.
Hi @SRGDamia1 could you point at what you did in TinyGSM - I can't find any updates, and I'm still having issues. Thanks
I created the function changeSettingIfNeeded
(https://github.com/vshymanskyy/TinyGSM/blob/a5a2ce34538955bb43d89df09fad4e30242be9c1/src/TinyGsmClientXBee.h#L1510) and use that everywhere to check if there are actually changes to be made before writing anything to flash. Unfortunately, it won't work for everything. Some settings, like the WiFi password, cannot be read back from the XBee. So there's no way of knowing if you're making a change or not, and you have to play safe and write the "new" value to flash. ModularSensors does check if the internet is connected before trying to modify the connection settings (and password), but if the XBee cannot connect fast enough, the password will end up being re-written to flash. There should not be any more flash writes for the IP address if it doesn't change, or for any other non-password-like settings.
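The read-before-write idea described above can be sketched in plain C++. This is only an illustration, not the TinyGSM implementation: `MockXBee`, `setIfDifferent`, `applyWakeSettings` and `demoWakeCycles` are hypothetical names, and the mock keeps register values in memory instead of talking to a radio. The AP/GT/CT values are the ones quoted in the forum question. A setting that cannot be read back, such as the password, cannot use this check.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an XBee in command mode; not a TinyGSM class.
struct MockXBee {
    std::map<std::string, std::string> registers;  // current register values
    int flashWrites = 0;                           // number of ATWR commands issued

    // Returns the current value of a register, or "" if it was never set.
    std::string query(const std::string& cmd) const {
        auto it = registers.find(cmd);
        return it == registers.end() ? std::string() : it->second;
    }
    void set(const std::string& cmd, const std::string& value) { registers[cmd] = value; }
    void writeToFlash() { ++flashWrites; }  // models ATWR
};

// Sets a register only when its current value differs; returns true if changed.
bool setIfDifferent(MockXBee& modem, const std::string& cmd, const std::string& value) {
    if (modem.query(cmd) == value) return false;
    modem.set(cmd, value);
    return true;
}

// Applies the wake-up settings and issues one ATWR only if something changed.
bool applyWakeSettings(MockXBee& modem) {
    bool changed = false;
    changed |= setIfDifferent(modem, "AP", "0");
    changed |= setIfDifferent(modem, "GT", "64");
    changed |= setIfDifferent(modem, "CT", "64");
    if (changed) modem.writeToFlash();
    return changed;
}

// Usage: the first wake writes to flash once, later wakes leave flash alone.
void demoWakeCycles() {
    MockXBee modem;
    applyWakeSettings(modem);
    applyWakeSettings(modem);
    assert(modem.flashWrites == 1);
}
```

With this shape, the number of flash writes depends on how often settings actually change rather than on how often the radio wakes.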
Unfortunately, if your S6B's flash has already become unstable because of excessive writing, no changes here will help with that. I doubt anything at all could be done to remedy it.
Thanks for the info. OK, that's what I thought: it is using changeSettingIfNeeded(). Hmm, flash is interesting for multiple writes; it is usually write wear, but it could be something else after running for a period.
I did test with my older XBee S6B, and still saw the same instability, and didn't have the time to dig into it. I will move to a new XBee S6B, and also track the updating, which I didn't have time to do. I'm hoping to get some time to work on this soon :). Again, many thanks for doing changeSettingIfNeeded(). It is the right thing to do to not be updating every communication cycle, which is what I think was happening. Writing security once per reset sequence I would think would be fine.
I ran the tests with a brand new XBee S6B. No different. I reverted back to an older S6B, and for some accelerated testing was running it at a 2-minute sleep interval. Having an "unreliable link" (ideally one whose reliability I can manage) is good for my protocol "reliable delivery" testing to MMW. At the 2-minute sleep interval, it suddenly started delivering to MMW. As part of some stability testing this weekend, I had it running at a 10-minute interval. It worked for the first few MMW POSTs, and then failed for the next 30 hrs. Then at 3am this morning, when my network router rebooted, it started working again and has been very reliable since then. So it does seem like it's something to do with the TCP/IP link being expected to be there when it comes out of sleep. I have been trying a TinyGSM lib adaptation to reset the destination (tearing down) when going to sleep, and then re-establishing it when waking, but haven't managed to navigate through TinyGSM that well yet. I have had to shelve it to deal with a more urgent field reliability issue.
Some recent testing with an update time set at every two minutes resulted in some successful POSTs. This suggests it is the TCP/IP link from the XBee to the destination MMW that is decaying. That is, when the XBee WiFi goes to sleep, it doesn't tear down the connection. Then when it wakes it isn't doing enough to re-establish the connection. As suggested in the code, it needs to be torn down before it sleeps, perhaps by setting the destination to 127.0.0.1. However, there is also some caching of the target IP# in the TinyGSM code, and I haven't quite figured out where and when the cache is referenced.
I have a potential user/Environmental Scientist (Biologist) who has a stream location with WiFi access, to connect to a depth gage sensor. I've been thinking that maybe the way to create a clean TCP/IP link is to be sure it's torn down from DigiXBeeWifi::disconnectInternet() by changing the IP to localhost 127.0.0.1. After some further investigation, and watching the stream of AT commands, I see there is an attempt at teardown after every POST in TinyGsmClientXbee:modemStop() by changing the timeout to 0 ("TM0") and then back to the default. It also forces a write update as well. The issue is partly what the model is for accessing the server; that discussion is identified in https://github.com/ODM2/ODM2DataSharingPortal/issues/485 The model I'm attempting is to initialize the XBee S6B WiFi once (after Mayfly reset), and do an ATWR. Then afterwards, coming out of XBee sleep, set up a link to MMW, attempt to validate it, do all the POSTs while waiting for responses, and then be clear about tearing it down.
The communication model incorporates an application layer "delayed delivery" with an internal "reliable delivery". That is, multiple new readings are generated between each attempt at connecting over WiFi to MMW. There is then an attempt to POST all of them. Sometimes it appears the first POST fails with a 5 second timeout, but then the second attempted POST within the same TCP/IP link setup succeeds.
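A minimal sketch of that "delayed delivery" queue, assuming a hypothetical `post` callback that returns the HTTP status code (201 on success) or 0 on timeout. The names `flushQueue` and `maxAttemptsPerReading` are illustrative and not taken from the fork; the retry limit of two simply mirrors the observation that a second POST on the same link often succeeds.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>

// Tries to POST every queued reading, oldest first, within one link setup.
// Each reading gets up to maxAttemptsPerReading tries. On the first reading
// that still fails, the loop stops and the remaining readings stay queued
// for the next wake cycle. Returns the number of readings delivered.
int flushQueue(std::deque<std::string>& queue,
               const std::function<int(const std::string&)>& post,
               int maxAttemptsPerReading = 2) {
    assert(maxAttemptsPerReading > 0);
    int delivered = 0;
    while (!queue.empty()) {
        bool ok = false;
        for (int attempt = 0; attempt < maxAttemptsPerReading && !ok; ++attempt) {
            ok = (post(queue.front()) == 201);  // 201 = record created
        }
        if (!ok) break;     // keep this and all later readings for next time
        queue.pop_front();  // delivered, so it can be dropped from the queue
        ++delivered;
    }
    return delivered;
}
```

Stopping at the first undelivered reading keeps the records in order, at the cost of holding back later readings until the link recovers.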
I've got a fix for this and have had it testing successfully for the last couple of days. There are two parts: 1) a TinyGSM update and 2) DigiXbeeWiFi. Would this be of interest for a PR for EnviroDIY/ModularSensors? If for TinyGSM, then EnviroDIY/TinyGSM or vshymanskyy/TinyGSM? More details at https://github.com/neilh10/ModularSensors/issues/21
I'm (finally!) looking at this. Could you explain what's going on with your fix? What are the caller ID offsets?
Hey, welcome back! And happy Thanksgiving. I'm trying to get this comment in before an evening meal.
Compare against https://github.com/neilh10/TinyGSM/blob/rel1/src/TinyGsmClientXBee.h and ignore all the waitResponse() calls in my private branch, as this is debugging and wouldn't be included in a PR.
It seems the issue is that the S6B isn't tearing down the TCP/IP connection on sleep. So what I do is change the TCP/IP connection to local before sleeping. I think that should have been enough to solve the problem, but it isn't, so then I do a software reset. I also tried some longer guard times for responses.
Anyway if you want a working S6B (not clear about that as I know there is a lot going on) then tell me which repo it should be against and I'll put a tested version against it.
I realize there are a lot of outstanding PRs, and finite time in the day, so I thought to myself I'll wait until there is bandwidth to deal with it before doing all the work on my part.
Just to reference the above comment: on merging, it's broken the curated version of TinyGSM for the Digi WiFi module that I have. It's taken me two hours to track it down.
The new code updated in LoggerModemMacros.h is doing some highly unusual reprogramming, it seems to cope with Espressif ESP32 challenges, and is possibly having knock-on effects on all other modems.
IMHO the real issue is that there are a lot of communication modules supported and there is no curated list of modules that work on any release.
The matrix of parts that works for any project is a big challenge. There is no standard way for ModularSensors to identify regression tests so that users of ModularSensors can contribute to the testing process.
The Digi WiFi module works for me as it has a variety of RF connectors for adding high gain antennas. WiFi at 2.4 GHz is attenuated outdoors by moisture (leaves) and often benefits from a simple antenna extension for a signal boost to go further.
Solution: https://github.com/neilh10/ModularSensors/issues/125
The core of the solution is to force the TCP/IP link to close after each POST, before the WiFi device is put to sleep to save power.
The WiFi acc/pwd are also only programmed at startup. This is done by having the device driver know the state of the modem, and only programming it once on power up. Thereafter it's cached locally.
There is other debugging and housekeeping also included, and I haven't wanted to change it from the core of what I have tested over 2 years.
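As a rough illustration of that driver-side state tracking, the sketch below records the steps a driver would take instead of sending AT commands. `WifiModemSketch` and all of its members are hypothetical names, not code from the fork or from ModularSensors; it only shows credentials being programmed once per power-up and the link being closed after every POST, so the device never sleeps with a link open.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical driver sketch: it records steps instead of sending AT commands.
class WifiModemSketch {
public:
    // Called once after a board reset or power-up: forget all cached state.
    void powerUp() {
        credentialsProgrammed_ = false;
        linkOpen_ = false;
    }

    // Called on every wake from sleep; programs credentials only the first time.
    void wake() {
        if (!credentialsProgrammed_) {
            steps_.push_back("program-credentials");
            ++credentialWrites_;
            credentialsProgrammed_ = true;
        }
        steps_.push_back("wake");
    }

    // Opens the link, sends one POST, and always closes the link afterwards.
    void post(const std::string& body) {
        steps_.push_back("open-link");
        linkOpen_ = true;
        steps_.push_back("post:" + body);
        steps_.push_back("close-link");
        linkOpen_ = false;
    }

    // Sleeping with an open link is the failure mode this sketch guards against.
    void sleep() {
        assert(!linkOpen_);
        steps_.push_back("sleep");
    }

    int credentialWrites() const { return credentialWrites_; }
    const std::vector<std::string>& steps() const { return steps_; }

private:
    bool credentialsProgrammed_ = false;
    bool linkOpen_ = false;
    int credentialWrites_ = 0;
    std::vector<std::string> steps_;
};
```

Running two wake/post/sleep cycles after a single `powerUp()` should leave `credentialWrites()` at 1, which is the property that keeps password writes to flash down to once per power-up.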
The solution I have is based on two systems that have been working in the field.
Both of these systems have had other problems (disconnected solar and the virtual failure); however, when these problems were fixed, the upload over WiFi using reliable delivery/batch queue algorithms on my fork (https://github.com/neilh10/ModularSensors/issues/1) has worked for 100% of outstanding records.
https://monitormywatershed.org/sites/nh_LCC45/ has been transmitting reliably since at least Jan14/2022. https://monitormywatershed.org/sites/TUCA_Sa01/ transmits to a Comcast WiFi point, the strongest signal I've seen from a WiFi SSID. It has currently stopped transmitting due to a solar wire pulled out May 28th, probably by a deer. Before that it had https://github.com/ODM2/ODM2DataSharingPortal/issues/658, which was out for some 5 weeks. When it was fixed, it uploaded the fastest of all the systems.
I'm generating two PRs - a) reference enviroDIY (develop) - 0.34.0 - and testing it locally, and submitting the files including two test setups. The Mayfly version should come up as 0.34.1-iss347a - https://github.com/EnviroDIY/ModularSensors/pull/441 b) reference TinyGSM 0.11.5 - https://github.com/vshymanskyy/TinyGSM/pull/731
The test setup only uses local sensors. As this is a verification test setup, I also describe and track equipment. The software runs on real equipment: a Mayfly 1.1 Rev A, S/n unreadable, however initially programmed in EEPROM as sn[ MAYFLY22150 ]. XBee WiFi internet comms are with a Digi XBee Wi-Fi, Mac/Sn 409D8F65B4, HwVer 2730, FwVer 2026. It has a 2.2A LiP Adafruit battery. The WiFi network access is a Synology RT2600ac.
For building I'm using the latest Pio on VSC. For working files in src - there is an alpha development environment that I configure in folder ModularSensors\a\DRWI_SIM7080LTE
Both the following tests manage the visibility of the AT cmd stream through the platformio.ini, thanks to the amazing StreamDebugger.h https://github.com/vshymanskyy/StreamDebugger
The platformio.ini is set up for development against the local source ModularSensors\src. Change to the desired destination for TinyGSM.
For files referencing to the lib ModularSensors use folder ModularSensors\sensors_test\DRWI_SIM7080LTE. Change to desired destination
For testing, I've let it run a couple of hours at two minute sampling and verified it gets a '201'. This isn't a long term test.
Then I've turned off the WiFi signal, let it try a couple of times to find it, then turned it back on, and it's continued transmitting, losing of course any attempted readings, as Reliable Delivery is not implemented yet.
Hope I haven't missed anything. Happy to answer any questions. I'm out tomorrow, the 22nd, at https://www.sensorsconverge.com/sensorsconvergecom/expo-highlights
Debug listing files: tty230619-1633_mainWiFi-beforeUpdates.txt
Sara added this to (develop) as part of https://github.com/EnviroDIY/ModularSensors/pull/445 I have extensively tested this in my fork; however, it has been modified on acceptance into (main), and I can't guarantee the traceability of my testing.
A weekend of testing and no level2 modem driver issues. Plan on posting testing data here https://github.com/ODM2/ODM2DataSharingPortal/issues/661