ModularSensors
Field debug support
Discussion: When a logger is placed in the field, solar powered, and has been running for days or possibly weeks, and an operational issue comes up (it stops performing as expected, possibly because of a bug), what debugging techniques can be used to zero in on the issue? Or what new debug features might help diagnose field issues as they come up?
New feature suggestion: runtime debugging is needed (and it's something I could start on if there is interest).
I define a field issue as one where something isn't working right at a specific riparian (or other remote) site with limited physical access (and power). Physical access may be limited because, as part of a team, the site can only be visited every couple of weeks. The person on site may be someone unfamiliar with ModularSensors but with technical computer skills, who can connect a terminal and follow some simple instructions, possibly over a phone link.
Current context: For software functional debugging, typically a compile-time flag MS_xxx is defined so that, when the program is downloaded, it prints out the program context. This is normally done as part of program development in a safe setting with plenty of available power. This works for testing in the office and in local test conditions at the controller. However, this is about monitoring the real world, which is usually not local, and where unexpected events happen that need to be discovered and adapted to.
Example: A Mayfly ModularSensors logger with Verizon LTE is deployed to a field location as part of a work party studying physical riparian parameters. It's an upgrade of a study that has been ongoing for 5 years. After two months the Mayfly logger stops connecting to the internet. As the owner of the code (MS with my modifications) I go to the site. However, I have to give the landowner one week's notice to access the site, drive two hours to get there, and hike 15 minutes in to the site.
I plug into the logger using a special FTDI cable that ensures the Mayfly doesn't reset, with TeraTerm running and timestamped logging enabled.
After waiting for some time, I see it attempt to POST to MMW, and see that it has effectively lost its "modem":
[2021-07-13 11:47:01.288] Connecting to the Internet with <<<< no modem
[2021-07-13 11:47:08.320]
[2021-07-13 11:47:08.320] pubDQTR Sending data to [ 0 ]
[2021-07-13 11:47:08.649] POST /api/data-stream/ HTTP/1.1
[2021-07-13 11:47:08.649] Host: data.envirodiy.org
..
[2021-07-13 11:47:08.649] {"sampling_feature":.....}
..
[2021-07-13 11:47:16.640] WATCHDOG ISR barksUntilReset 149 <--WatchDogAVR
[2021-07-13 11:47:16.702] -- Response Code -- 504 waited 5011 mS Timeout 5000
After a reboot the modem reappears, but how can this be figured out without rebooting?
[2021-07-13 11:50:11.182] Connecting to the Internet with Digi XBee3 Cellular LTE-M
[2021-07-13 11:50:19.506] WATCHDOG ISR barksUntilReset 149 <--WatchDogAVR
[2021-07-13 11:50:21.881] ... Watchdog low barksUntilReset 149 expected 150
[2021-07-13 11:50:21.881]
[2021-07-13 11:50:21.881] pubDQTR Sending data to [ 0 ] data.envirodiy.org
[2021-07-13 11:50:23.084] POST /api/data-stream/ HTTP/1.1
[2021-07-13 11:50:23.084] Host: data.envirodiy.org
...
[2021-07-13 11:50:23.084] {"sampling_feature":"..}
[2021-07-13 11:50:26.398] -- Response Code -- 201 waited 1688 mS Timeout 5000
This site works for a couple of days and then stops. Some background: I've done extensive accelerated testing in a safe location on a system with the same components, and yet I've never seen this issue. Since it stops delivering data, it's a "Critical" issue for me. The whole purpose of this logger is to make the sensor readings remotely visible.
Today I'm digging through a backlogged mountain of issues. Have you come up with a better field code debugging solution? It's something I've really struggled with, because so much of the time, if you allow the board to restart, the issue is magically, at least temporarily, fixed, but that also erases any evidence of what caused it to stop in the first place.
Hey, thanks for commenting on it. It is a difficult issue, as debugging consumes code space, some RAM, and some real-time processing.
The purpose is for a knowledgeable tester, doing stability monitoring, to be able to detect input conditions and outputs for specific modules.
My thought is for each module to have a dedicated byte of RAM, with the debug conditions keyed on bits set in that per-module debug byte. That would give up to 8 types of debug conditions per module that could be switched on and off.
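A minimal sketch of that idea, assuming one debug byte per module with each bit gating one kind of debug output. The module IDs, bit meanings, and function names are hypothetical and only for illustration; none of them exist in ModularSensors.

```cpp
#include <cstdint>

// Hypothetical module IDs; a real build would list its own modules.
enum ModuleId : uint8_t { MOD_LOGGER = 0, MOD_MODEM, MOD_PUBLISHER, MOD_COUNT };

// Hypothetical meanings for the bits of each module's debug byte.
enum DebugBit : uint8_t {
    DBG_INPUTS  = 1u << 0,  // report input conditions
    DBG_OUTPUTS = 1u << 1,  // report outputs
    DBG_STATE   = 1u << 2,  // report state changes
    DBG_TIMING  = 1u << 3   // report timing; bits 4..7 are left free
};

// One byte of RAM per module, all debug output off at start-up.
static uint8_t debugMask[MOD_COUNT] = {0};

// Switch the given debug bits on for one module.
inline void debugEnable(ModuleId m, uint8_t bits) {
    debugMask[m] = static_cast<uint8_t>(debugMask[m] | bits);
}

// Switch the given debug bits off for one module.
inline void debugDisable(ModuleId m, uint8_t bits) {
    debugMask[m] = static_cast<uint8_t>(debugMask[m] & ~bits);
}

// True if any of the given debug bits is set for the module.
inline bool debugOn(ModuleId m, uint8_t bits) {
    return (debugMask[m] & bits) != 0;
}
```

A module would then guard each debug print with something like `if (debugOn(MOD_MODEM, DBG_STATE)) { ... }`, so the runtime cost when debugging is off is one RAM read and a bit test per guarded statement, while the code-space cost of the print statements remains.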
It implies the tester needs a terminal UART interface, and there is a hardware issue there that I think has caused some nasty problems for me: https://github.com/EnviroDIY/EnviroDIY_Mayfly_Logger/issues/28
I'm solving this with an external plug on the FTDI connector, with a 1 Mohm resistor that pulls the UART RXD low.
I use an FTDI connector that is neutered so it doesn't cause a reset when plugging in/out: https://github.com/neilh10/ModularSensors/wiki/Test-Equipment-FTDI-cable
I have a Serial Cmd/UART processing module that makes it pretty easy to design a simple custom interface, and that puts all the code into the user examples/.cpp: https://github.com/neilh10/ModularSensors/blob/release1/examples/tu_xx01/src/tu_serialCmd.h
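As a rough sketch of how a terminal command could drive runtime debug settings, the handler below parses one line of text and stores a debug mask for a module. This is not the interface in tu_serialCmd.h; the command syntax `dbg <module> <hexmask>`, the array `moduleDebugMask`, and the function `handleDebugCommand` are assumptions made up for this example.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical number of modules that have a debug byte.
constexpr unsigned kModuleCount = 4;

// One debug byte per module (hypothetical storage), all off at start-up.
static uint8_t moduleDebugMask[kModuleCount] = {0};

// Handles one received line such as "dbg 1 05":
// module index 1, debug mask 0x05 (hexadecimal).
// Returns true if the line was a valid command and the mask was stored.
bool handleDebugCommand(const char* line) {
    unsigned module = 0;
    unsigned mask = 0;
    // Both fields must parse, otherwise the line is not a debug command.
    if (std::sscanf(line, "dbg %u %x", &module, &mask) != 2) return false;
    // Reject unknown modules and masks that do not fit in one byte.
    if (module >= kModuleCount || mask > 0xFF) return false;
    moduleDebugMask[module] = static_cast<uint8_t>(mask);
    return true;
}
```

Someone on site could then be told over the phone to type a line like `dbg 1 05` into the terminal. On a small AVR, `sscanf` is relatively expensive in code space, so a hand-written parser may be a better fit there.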
It does mean debug is compiled into the whole load, which is partly what is needed. Generally a "user interface", in this case debug, can use 20% or more of code space. I've not done any work to see how much space is used when turning on debug for modules.
That's an overview of an option that I've used in the past, and it has been pretty successful for a lot of module-level logic bugs.
This is an unlikely retrofit, as it would need to be designed in from the beginning. On my fork's field and test systems, where I've made reliability improvements, the system operates sufficiently well, though I occasionally have a system hang and haven't managed to figure it out.