
Version 4.0-beta3 randomly reboots

Open darranwil opened this issue 2 years ago • 22 comments


4.0-beta3

wemos d1 mini pro with external antenna

precompiled

Windows 10 V21H1

Firefox 92.01

Access Point

Random reboots. Wired to a MAX485. Tested for a solid 48 hours on the previous version 3.2 with a multicast sACN stream, no reboots. Version 4.0-beta3 randomly reboots with no data being sent to it. Sometimes after 20 mins, sometimes after 5 hours...

darranwil avatar Oct 05 '21 22:10 darranwil

There's an issue we're trying to track down related to the webserver library that we're using. Did this occur during web interaction or while just running / processing data?

forkineye avatar Oct 06 '21 12:10 forkineye

It rebooted several times, seemingly at random, while I had the page open to monitor uptime over several hours. Sometimes it lasted 20 mins, sometimes 45, sometimes several hours. Ran without the browser open for almost 8 hours and it rebooted only twice. Never sent any sACN or MQTT during the uptime test.

darranwil avatar Oct 06 '21 19:10 darranwil

Try with no browser connected to the device and then after a day check the uptime. The problem is that the WebServer used to present the status page has issues. Check the device every once in a while. I had mine up for 6 days and had it playing data in response to an FPP show player. But my web page was closed the entire time.

MartinMueller2003 avatar Oct 07 '21 15:10 MartinMueller2003

As stated in my previous post, I did that. I've flashed several chips (wemos D1 Minis), browser closed, and same result. Sometimes 20 mins, sometimes 2-3, sometimes 4-6 hours before reboot.

darranwil avatar Oct 22 '21 04:10 darranwil

Just FYI since you seem aware of this. Just got some ESPixelSticks V3 to play with. They run fine on FW 3.2. Trying out V4 Beta 3 and am having the same WebServer problems: very slow or no response sometimes, the WebSocket randomly loses connection, and random reboots. Attached is some serial output I captured. Looks like watchdog timeouts, but I have also seen it throw an exception and spit out a hex dump. I haven't captured that yet.

SerialLogOutput.txt

aggie81 avatar Oct 25 '21 01:10 aggie81

Spoke too soon. Just caught an exception. This occurred as I changed the input from E131 to DDP and clicked on save config. This log is same as above plus the exception.

SerialLogOutput2.txt

aggie81 avatar Oct 25 '21 01:10 aggie81

Downloaded and flashed after "Switch ESPAsyncWebServer to yubox fork". Still random reboots with browser closed.

darranwil avatar Nov 09 '21 22:11 darranwil

From the stack traces, there seems to be something occurring inside LWIP. Can you give this version a try? It's compiled with LWIP2 configured to "High Bandwidth, No Features" - https://github.com/forkineye/ESPixelStick/actions/runs/1441917656. https://arduino-esp8266.readthedocs.io/en/latest/ideoptions.html#lwip-variant
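A minimal sketch, not part of the original comment, of how a build could report which lwIP variant it was compiled with. It assumes the ESP8266 Arduino core passes `TCP_MSS` and `LWIP_FEATURES` as build-time macros (536 vs. 1460 for the Lower Memory and Higher Bandwidth variants, and 0 for the "no features" builds); `describeLwipVariant` and `currentLwipVariant` are hypothetical helper names, and the fallback defines exist only so the code also compiles off-device.

```cpp
#include <string>

// Fallbacks so the sketch compiles off-device. On the ESP8266 Arduino core
// these macros are ASSUMED to be supplied by the selected lwIP variant.
#ifndef TCP_MSS
#define TCP_MSS 536
#endif
#ifndef LWIP_FEATURES
#define LWIP_FEATURES 1
#endif

// Hypothetical helper: turn the two build-time values into a readable label.
std::string describeLwipVariant(int mss, int features) {
    std::string label;
    if (mss == 1460) {
        label = "Higher Bandwidth";
    } else if (mss == 536) {
        label = "Lower Memory";
    } else {
        label = "Unknown MSS " + std::to_string(mss);
    }
    if (features == 0) {
        label += " (no features)";
    }
    return label;
}

// Label for the variant this translation unit was compiled against.
std::string currentLwipVariant() {
    return describeLwipVariant(TCP_MSS, LWIP_FEATURES);
}
```

Printing `currentLwipVariant()` once at boot would let a tester confirm from the serial log that the image under test really is the "High Bandwidth, No Features" build.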

forkineye avatar Nov 09 '21 23:11 forkineye

FYI: Using High Bandwidth with no features has been very stable for me. Can we close this?

MartinMueller2003 avatar Dec 22 '21 15:12 MartinMueller2003

My apologies to everyone for dropping out for so long. Covid long hauler, twice now, and this time it has put me down for weeks at a time. I just installed the latest Beta4 and will let it run 24/7 in Artnet/DMX output mode connected to a fixture.

darranwil avatar Dec 24 '21 06:12 darranwil

Welcome back and I hope you don't get it again. Looking forward to closing this issue.

MartinMueller2003 avatar Dec 25 '21 16:12 MartinMueller2003

I have had the latest version running on an ESP32 board for the past 5 days WITH the browser open to the status page. It is running in an FPP remote player mode. It has been up and running for 6 days and has played 710 songs. NO REBOOTS. No funny issues just keeps on going. I do have an intentional issue (one of the fseq files for a song is not on the SD card) and this has not caused any issues.

MartinMueller2003 avatar Dec 26 '21 20:12 MartinMueller2003

@MartinMueller2003 I'm leaving this open for now, as I believe there are still some issues related to LWIP and the ESPAsyncWebserver / ESPAsyncTCP implementation underneath. LWIP 2.1.3 is currently being integrated into the ESP8266 core and I'd like to do more testing with it first. thanks, -shelby

forkineye avatar Dec 28 '21 15:12 forkineye

Still randomly rebooting for me with browser closed and no Artnet data being sent to it. Seems to be about every 8-12 hours. The only thing I've noticed is the heap size seems to go up and down but never runs out.
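A minimal sketch, not part of the original comment, of one way to put numbers on "goes up and down but never runs out". `HeapWatermark` is a hypothetical helper that tracks the lowest and highest free-heap samples it is given. On the device the samples would presumably come from `ESP.getFreeHeap()`; feeding a second instance from `ESP.getMaxFreeBlockSize()` on the ESP8266 core might also be worthwhile, on the assumption that a fragmented heap can fail a large allocation even when total free heap looks healthy.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper: records the extremes of a series of free-heap samples.
struct HeapWatermark {
    uint32_t lowest = std::numeric_limits<uint32_t>::max();
    uint32_t highest = 0;
    uint32_t samples = 0;

    // Feed one sample, e.g. the value returned by ESP.getFreeHeap().
    void record(uint32_t freeBytes) {
        lowest = std::min(lowest, freeBytes);
        highest = std::max(highest, freeBytes);
        ++samples;
    }

    // Difference between the best and worst sample seen so far.
    uint32_t swing() const {
        return samples ? highest - lowest : 0;
    }
};
```

Logging `lowest` to the serial port every minute or so would show how close the heap gets to exhaustion shortly before a reboot.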

darranwil avatar Dec 28 '21 15:12 darranwil

Just got back to trying ESPixelStick v4.0-beta4 on ESPixelStick Ver 3 HW. It seems to be running better than beta3, but beta4 still randomly reboots, usually when accessing the Web UI and changing parameters on the Device Setup page. Once set up (DDP mode), I have been able to send sequence data directly from xLights to a 150 pixel string for at least 4 hours now with no reboots, with the Web UI open at the Home page to monitor uptime. As long as I am not changing parameters in the Web UI, it seems to be fairly stable so far just receiving DDP data.

aggie81 avatar Jan 23 '22 19:01 aggie81

The ESP8266 has far less RAM available than the ESP32 implementation. This causes the Web UI to occasionally starve the rest of the system, and that causes crashes.

MartinMueller2003 avatar Jan 24 '22 01:01 MartinMueller2003

Can there be a work around for V4 and ESP8266 HW? Ver 3.2 and the latest WLED 0.12.0 both seem to run solid on the ESP8266 in my testing.

aggie81 avatar Jan 24 '22 01:01 aggie81

We've been trying to get to the core of the issue. It started happening during the refactor from 3.2 to 4.0 and is what has been keeping 4.0 from becoming a "stable" release. It is related to the webserver library and the underlying AsyncTCP code that is being used.

forkineye avatar Jan 24 '22 12:01 forkineye

Shelby / Martin - Have you tried removing the Arduino JSON Library and all references to it? When I was developing this project (https://github.com/onewithhammer/ESP8266-MyWidget-Demo) I was getting similar results with exceptions and was pulling my hair out until I removed this library and all references.

See my notes from the project: I originally tried to send / receive JSON messages using the popular Arduino JSON Library ArduinoJson but I couldn't make it stable. I kept getting exceptions happening in various places while stress testing (calling GET heap repeatedly), so I eventually removed the ArduinoJson library and references. I converted all Web Services messages to send/receive text messages. I also converted files to save as text files (cfg.txt) instead of JSON.

This may not be the issue but I can tell you I struggled to get my project stable until I removed this library.
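A minimal sketch, not taken from the linked project, of what a plain-text message format can look like in place of JSON. The newline-separated `key=value` layout and the function name `parseTextMessage` are assumptions made for illustration; the actual format used in that project is not described in this thread.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical format: one "key=value" pair per line.
// Lines without '=' (or with an empty key) are skipped.
std::map<std::string, std::string> parseTextMessage(const std::string& msg) {
    std::map<std::string, std::string> out;
    std::istringstream in(msg);
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate CRLF line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0) {
            continue;
        }
        out[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return out;
}
```

This host-side version uses `std::map` and `std::string` for clarity; on a RAM-constrained ESP8266 a fixed-size buffer that is scanned in place would likely be the better fit.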

onewithhammer avatar Feb 19 '22 17:02 onewithhammer

Reading and displaying config information does not use any ArduinoJson functions. It reads the files directly from the SD card into a buffer and sends that information. ArduinoJson is used to build and send status for the home page and to process configuration updates. In other words, very few interactions are done in JSON. I most often see the crash when moving from one page to another, and the issue gets worse as RAM is used up. Analysis of the crashes shows most of them are in the TCP processing stage where the system is trying to allocate buffers. I have taken great care to make sure ArduinoJson has released all resources prior to interacting with the web server.

MartinMueller2003 avatar Feb 19 '22 18:02 MartinMueller2003

Any idea of when a stable release will be available that addresses this issue?

onewithhammer avatar Feb 19 '22 18:02 onewithhammer

The ESP32 has more RAM and I do not see these crashes in my system. The ESP8266 RAM is on the edge and sometimes gets in trouble.

MartinMueller2003 avatar Feb 19 '22 22:02 MartinMueller2003

Is anyone seeing this on my latest builds on the ESP32 platform? I know this is a RAM issue with the ESP8266 but I have not seen it in a long time on my ESP32 versions.

MartinMueller2003 avatar Oct 05 '22 11:10 MartinMueller2003

@MartinMueller2003 I'd be happy to test on the ESP32. Is there a drop in replacement ESP32 for v3 PixelStick? e.g. ESP32 D1 Mini

akennerly avatar Oct 12 '22 03:10 akennerly

Yes, the ESP32 D1 Mini is a drop-in replacement. Just grab the repo and build for the mini, or grab the images I keep on Google Drive.

MartinMueller2003 avatar Oct 12 '22 10:10 MartinMueller2003

@MartinMueller2003 I installed the CI build ESPixelStick v4.0-ci3138152858 (Sep 27 2022 - 18:51:29)

Since my ESP32 doesn't appear to have PSRAM, I used the "D1 Mini Mhetesp32minikit" build in ESPixelStick Flash Tool. I chose that build after searching through other Issues/Discussions. If there is a better choice or if I should compile myself I will. I was able to at least get the ESP to complete a boot but the attached log will show that there is a periodically logged error because of a missing wired ethernet port.

SerialOutput-10192022.txt

I haven't done any other testing yet.

The serial log output is just this periodic error: "esp_eth: esp_eth_ioctl(348): ethernet driver handle can't be null"

I'll configure this ESP32 as an FPP Remote to test if I don't need to switch builds or compile the firmware to more closely match my ESP32.

This is the ESP32 that was purchased: https://www.amazon.com/dp/B09C5RDZ8G

akennerly avatar Oct 19 '22 20:10 akennerly

The correct image would be for the d1_mini32 board

MartinMueller2003 avatar Oct 19 '22 20:10 MartinMueller2003

@MartinMueller2003 That build results in a reset loop.

SerialOutput-10202022.txt

akennerly avatar Oct 20 '22 06:10 akennerly

Hmm. I will take a look later today

MartinMueller2003 avatar Oct 20 '22 10:10 MartinMueller2003

@MartinMueller2003 Any luck in seeing where the issue might be?

I realize you have your own life and this is unpaid volunteer development. I'd just like to get at least one ESPPS stable for the holidays.

Thanks

akennerly avatar Oct 27 '22 15:10 akennerly