ESP8266 reboots randomly after a while
Basic Infos
- [X] This issue complies with the issue POLICY doc.
- [X] I have read the documentation at readthedocs and the issue is not addressed there.
- [ ] I have tested that the issue is present in current master branch (aka latest git).
- [X] I have searched the issue tracker for a similar issue.
- [ ] If there is a stack dump, I have decoded it.
- [ ] I have filled out all fields below.
Platform
- Hardware: [ESP8266 - ESP-WROOM-02]
- Core Version: [2.7.4]
- Development Env: [Arduino IDE]
- Operating System: [Ubuntu]
Settings in IDE
- Module: [|Nodemcu]
- Flash Mode: [qio|dio|other]
- Flash Size: [4MB/1MB]
- lwip Variant: [v1.4]
- Reset Method: [ck|nodemcu]
- Flash Frequency: [40Mhz]
- CPU Frequency: [80Mhz|160MHz]
- Upload Using: [SERIAL]
- Upload Speed: [115200|other] (serial upload only)
Problem Description
I am working on an ESP8266 board (ESP-WROOM-02) in WiFi AP mode. It works fine most of the time: the device can connect to it and send/receive data without problems. However, it reboots randomly if left running for a while; sometimes it reboots within 30 minutes, sometimes after several hours. Multiple boards have been tested, and all of them have the same problem.
The following two lines are shown on the serial port right before the reboot, which then follows immediately: "tx rts error 0x16" "mac 1370"
I am using the Arduino IDE with ESP8266 SDK v2.7.4 (NONOS SDK). I have spent a lot of time trying to identify the issue, but I have no clue so far.
I suspect it has something to do with the NONOS SDK, but there is no source code that would let me track down where and how the issue occurs.
NOTES: I must use ESP8266 SDK v2.7.4 (or lower) and cannot use anything newer, since I need the lwip v1.4 compile-from-source option; higher SDK versions don't have this option, only lwip v2.0, which is not suitable for our project.
Please help to resolve this issue. Thanks in advance.
> I suspect it has something to do with the NONOS SDK, but there is no source code that would let me track down where and how the issue occurs.
It is bundled as a precompiled lib, see tools/sdk/lib/NONOS_... Note that you could switch between versions, see IDE menu or PIO build flags documentation
> NOTES: I must use ESP8266 SDK v2.7.4 (or lower) and cannot use anything newer, since I need the lwip v1.4 compile-from-source option; higher SDK versions don't have this option, only lwip v2.0, which is not suitable for our project.
https://github.com/d-a-v/esp82xx-nonos-linklayer ? You may be running into some lwip1.4 bug in the networking stack. While we roughly match the version shipped with the original SDK, it may not be the exact same thing.
> It is bundled as a precompiled lib, see tools/sdk/lib/NONOS_... Note that you could switch between versions, see IDE menu or PIO build flags documentation
Thanks. I did try different versions of the NONOS SDK; there is no difference, and all of them have the same issue.
> https://github.com/d-a-v/esp82xx-nonos-linklayer ? You may be running into some lwip1.4 bug in the networking stack. While we roughly match the version shipped with the original SDK, it may not be the exact same thing.
Since I have the lwip1.4 source code, I put some debug code in it to try to catch the reboot problem. However, there is no issue in lwip at all when the reboot happens, which is why I suspect it is coming from the NONOS SDK.
> since I have lwip1.4 source code
Just to repeat, it is an lwip2 source builder. If you use the makefile, the lwip2 source will be in repo/lwip2-src and it will replace the original files at tools/lib (but sorry if I misunderstood the reasoning :)
Also note that we usually dump some stack and exception info as a postmortem message. But you would need to enable serial to debug this way, or use our custom_crash_callback. The exception address will usually point to the offending function. Also see https://github.com/esp8266/Arduino/tree/master/libraries/esp8266/examples/HwdtStackDump which is only available in our 3.x.x versions
Hi Max, there is actually NO exception when the reboot occurs, only two lines of messages output on the serial port: "tx rts error 0x16" "mac 1370"
I tried the HwdtStackDump you shared; I still got no exception, only the same message as above before the reboot. Is it possible to check the source code of the NONOS SDK to see where "tx rts error..." is printed? Thanks a lot.
Just going over something simpler than trying to debug the SDK. We don't have access to its source, so the only way is to go about decompiling its blobs. So we don't really have much to go on to see some pattern.
The SDK message just means something is wrong with the WiFi TX (I'd guess), which could mean literally anything. Try changing the connecting client hardware? Or the channel used? Watching raw WiFi traffic might also help, since we do see some RTS/CTS WiFi issue (maybe related to what's happening, maybe not)
Also, note that hwdt postmortem is explicitly enabled with a build flag / IDE menu option. And you'd see Hardware WDT Stack Dump - enabled around the time app loads
Erasing flash at the very end sometimes helps. There is also an IDE option for flash erase, named 'Sketch + WiFi settings' or 'All flash contents'. Using different versions of the SDK may have a weird effect on the settings sector, or the sector may hold corrupted data
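Since no exception dump shows up, it may also help to log the reset reason on every boot, to tell a hardware watchdog reset from a software watchdog reset or a plain restart. Below is a minimal sketch, not a definitive implementation. The numeric codes follow enum rst_reason in the NONOS SDK's user_interface.h; the function name resetReasonName is made up for illustration. On the device the code would come from ESP.getResetInfoPtr()->reason (the core also offers ESP.getResetReason() as a ready-made string).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps the numeric reset reason reported by the
// NONOS SDK (enum rst_reason in user_interface.h) to a readable name.
// On the device, pass ESP.getResetInfoPtr()->reason.
inline std::string resetReasonName(uint32_t reason) {
    switch (reason) {
        case 0: return "power-on";            // REASON_DEFAULT_RST
        case 1: return "hardware watchdog";   // REASON_WDT_RST
        case 2: return "exception";           // REASON_EXCEPTION_RST
        case 3: return "software watchdog";   // REASON_SOFT_WDT_RST
        case 4: return "software restart";    // REASON_SOFT_RESTART
        case 5: return "deep-sleep wake";     // REASON_DEEP_SLEEP_AWAKE
        case 6: return "external reset";      // REASON_EXT_SYS_RST
        default: return "unknown";
    }
}
```

Printing this once in setup(), e.g. Serial.println(resetReasonName(ESP.getResetInfoPtr()->reason).c_str()), would show after each random reboot which kind of reset the chip thinks it went through.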
Just a few things that come to mind when seeing these...
- Are you running out of memory?
- Are you reading the internal ADC?
- Is it possible your code may run for quite some time without calling delay() in between?
- Does this also happen if you continuously ping the node from a random host in your network?
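The memory and delay() questions above could be checked with a small bookkeeping helper like the one sketched here. This is only an illustration under assumptions: LoopHealth and its members are hypothetical names, and on the device the two inputs would come from millis() and ESP.getFreeHeap(). It records the lowest free heap seen and the longest gap between two consecutive samples.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: tracks the lowest free-heap reading and the
// longest time between two consecutive calls to sample().
struct LoopHealth {
    uint32_t minFreeHeap = std::numeric_limits<uint32_t>::max();
    uint32_t maxGapMs = 0;
    uint32_t lastMs = 0;
    bool started = false;

    // nowMs: current millis() value; freeHeap: current ESP.getFreeHeap() value.
    void sample(uint32_t nowMs, uint32_t freeHeap) {
        if (started) {
            // Unsigned subtraction stays correct across millis() wrap-around.
            uint32_t gap = nowMs - lastMs;
            if (gap > maxGapMs) maxGapMs = gap;
        }
        started = true;
        lastMs = nowMs;
        if (freeHeap < minFreeHeap) minFreeHeap = freeHeap;
    }
};
```

Calling health.sample(millis(), ESP.getFreeHeap()) at the top of loop() and printing both values every so often should show whether the heap keeps shrinking or whether some iteration runs for seconds without returning to the core.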