WLED ESP8266 audioreactive UDP sync ported from MoonModules/WLED

Tries to resolve #3960

However, it seems to be unstable on RGB(W) PWM strips at higher clock speeds, causing frequent reboots

May 08 '24 13:05 gaaat98

As I might not have an ESP8266 to hand when doing the code review, if you could please include a screenshot of the Audio Reactive settings, including the sync dropdown being expanded please

May 08 '24 19:05 netmindz

I have left the default ones

immagine

May 08 '24 20:05 gaaat98

Screenshot_20240509-133320_Samsung Internet

@blazoncek question as I'm not very experienced with 8266 - sync receive polls for new UDP multicast packets each 20ms. This works well on esp32.

Maybe 8266 can have a problem with it? Do you think that adding 'yield()' before/after UDP calls might help?

Or maybe it's the frequent alloc/dealloc of fftBuff[packetsize] that causes stress on 8266 due to heap fragmentation? In fact a static buffer of 88 bytes (max xpected packet size) would be enough.

May 09 '24 11:05 softhack007

I think @willmmiles or @pbolduc will know much more about AsyncUDP/TCP than me.

fftBuff is allocated on stack as far as I can see. That shouldn't be a problem. @gaaat98 please provide crash dump using exception decoder with your WDT crash.

May 09 '24 12:05 blazoncek

Excuse the dumb question, but how do I enable stack trace printing? The only thing I see on serial while resets occur is this:

Loop delayed more than 3ms.
Loop delayed more than 3ms.

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 3424, room 16
tail 0
chksum 0x2e
load 0x3fff20b8, len 40, room 8
tail 0
chksum 0x2b
csum 0x2b
v000e0460
~ld
���'�{��n<�$�l b�|{�l�o��o�l ��{�d�l

�
---WLED 0.15.0-b3 2405030 INIT---
esp8266 @ 80MHz.
Core: 3.1.2
FLASH: 2 MB
heap 29072
PIN ALLOC: Pin 1 successfully allocated by 0x89 (137)
Registering usermods ...
heap 28984

May 09 '24 14:05 gaaat98

-D WLED_DEBUG

(which you have) and

monitor_filters = esp8266_exception_decoder

in your PIO environment.

But perhaps 8266 does not support WDT exceptions and does not print stack dump. It looks to me that it may be possible that there is some loop that iterates across too many iterations (like -1 becoming 65535 when doing int16_t conversion to uint16_t).

May 09 '24 16:05 blazoncek

One of the biggest issues is the Espressif networking stack has problems with high UDP traffic rates. It has been a while since I looked at the networking. I found this issue https://github.com/espressif/arduino-esp32/issues/4104 which talks about errors with the parsePacket function. I need a reminder where the WifiUDP library comes from in WLED.

May 10 '24 16:05 pbolduc

I need a reminder where the WifiUDP library comes from in WLED.

WifiUDP is part of the Arduino framework on ESP8266 and ESP32. (The E131 receiver elsewhere in the project is using the ESPAsyncUDP library, which is not part of the framework.)

My quick read of the WifiUDP framework code on ESP8266 suggests it's reasonably well written. By contrast, the ESP32 version is awful, heap allocating and freeing full 1460 byte packet buffer every time it polls to see if there's a new packet(!!).

I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet. It's neat to see the AR effects working on an 8266 again, though it's noticably laggy compared to the other ESP8266 running as a DDP segment beside it.

@gaaat98 would you mind posting your config, in case there's something non-obvious I've got configured differently?

May 10 '24 23:05 willmmiles

I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet

@willmmiles I think the problem is is happening with analog (PWM) strips only - may also happening with bitbang ws2812 drivers. So I suspects it's a timing issue, or simply the 8266 is overloaded when doing high freq PWM (interrupt driven) + UDP polling + WiFi + other stuff.

Besides UDP polling, audioreactive runs a linear filter (no loops) then simply copies received data to the variables used in effects. So audioreactive on 8266 does "almost nothing" in addition to UDP.

May 11 '24 08:05 softhack007

@willmmiles Here's the configuration on my athom controller (esp8285), on this board the crashes happen after a few updates when the clock speed is set to values higher than 'Slowest' (the effect I tested was DJ Light but I assume also the others will do the same).

This is the config of a ESP8266 board I used for testing, here the crashes are a bit less frequent but still happen. I noticed that if I open the /liveview page on a browser the number of reboots noticeably increases, which suggests that the boards crash due to overload.

These are both configured as PWM strips, I also have another athom controller connected to SK6812 strip with UDP sync enabled that shows no such issues

May 11 '24 08:05 gaaat98

@gaaat98 please get a plain D1 mini board, configure it as an Athom controller and monitor serial output. It is important if the crashes are caused by watchdog or something else.

May 11 '24 12:05 blazoncek

@blazoncek That's basically what I did to obtain this log, the only difference is that I used a NodeMCU ESP8266 instead of D1 Mini, is that relevant?

Loop delayed more than 3ms.
Loop delayed more than 3ms.

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x4010f000, len 3424, room 16
tail 0
chksum 0x2e
load 0x3fff20b8, len 40, room 8
tail 0
chksum 0x2b
csum 0x2b
v000e0460
~ld
���'�{��n<�$�l b�|{�l�o��o�l ��{�d�l

�
---WLED 0.15.0-b3 2405030 INIT---
esp8266 @ 80MHz.
Core: 3.1.2
FLASH: 2 MB
heap 29072
PIN ALLOC: Pin 1 successfully allocated by 0x89 (137)
Registering usermods ...
heap 28984

Before this Serial only showed other Loop delayed more than 3ms. and other generic debug info that was printed periodically.

The configuration used is the one I uploaded earlier:

This is the config of a ESP8266 board I used for testing, here the crashes are a bit less frequent but still happen. I noticed that if I open the /liveview page on a browser the number of reboots noticeably increases, which suggests that the boards crash due to overload.

I repeated that once again and collected other logs here: crash-log.txt, clock speed is set to 'Fastest', WLED crashes after a while if I set to effect DJ Light from the Android app (the NodeMCU is receiving sound data from an esp32 which is listening to music), or crashes immediately if I enable the Peek option from the app.

EDIT: I was flashing the 2MB flash version because that's the one used on the athom controllers, so I also tried the standard NodeMCU build and the issue is still present

May 11 '24 13:05 gaaat98

I pulled this PR in to my web server stability working branch and I haven't been able to reproduce the fault yet

@willmmiles I think the problem is is happening with analog (PWM) strips only - may also happening with bitbang ws2812 drivers. So I suspects it's a timing issue, or simply the 8266 is overloaded when doing high freq PWM (interrupt driven) + UDP polling + WiFi + other stuff.

I tried both bitbang ws2812 (actually my default ESP8266 test, due to a poor pin selection in a project I built before I discovered WLED!) and RGBW PWM in "normal" speed. No crashes for me after 12h of operation.

Besides UDP polling, audioreactive runs a linear filter (no loops) then simply copies received data to the variables used in effects. So audioreactive on 8266 does "almost nothing" in addition to UDP.

I suspect the polling latency might be the source of the visible lag. On 8266 calling the poll looks to be "basically free" in terms of CPU usage if there's nothing pending; we could probably poll every loop with no impact there (not so much on ESP32, though!!). Stability is more important than performance, though, so I'll investigate this later.

May 11 '24 13:05 willmmiles

clock speed is set to 'Fastest', WLED crashes after a while if I set to effect DJ Light from the Android app

Aha, there it is! Just got one with these settings.

May 11 '24 13:05 willmmiles

@willmmiles during your test with the PWM strip did you select a sound reactive effect on the receiving controller? Did you have another device sending the audio data? To have the sound reactive effects you need to make sure you pulled PR#3942 otherwise they are disabled.

Sorry to ask but it's stange since I have almost immediate crashes

May 11 '24 13:05 gaaat98

@willmmiles during your test with the PWM strip did you select a sound reactive effect? Did you have another device sending the audio data? To have the sound reactive effects you need to make sure you pulled https://github.com/Aircoookie/WLED/pull/3942 otherwise they are disabled.

Yes and yes. For the initial testing I had the same effect (GravCentric) running on three WS2812 strips - one on the ESP32 AR node directly, one on an ESP8266 over DDP, and one on an ESP8266 using AR sync. That's how I could tell the AR sync was laggy compared to DDP. ;)

For the overnight test I reconfigured that last unit to add PWM output on some unconnected pins. It ran OK for me using the GravCentric effect at 'Normal' speed. Dunno if it was the speed, the DJ light effect, or peek, but it faulted right away when I set it up with that config ... though it hasn't faulted since.

May 11 '24 13:05 willmmiles

---WLED 0.15.0-b3 2405030 INIT--- esp8266 @ 80MHz.

@gaaat98 try 160MHz builds.

May 11 '24 16:05 blazoncek

Same behaviour, I'm trying to establish reliable steps to reproduce the crashes but at the moment it seems that for me is enough to have the ESP8266 receiving data, the Dj Lights effect active, clock set to Fastest, and then just wait for maybe less than a minute while making some noises near the sending device...

Logs for the 160MHz build here

May 11 '24 17:05 gaaat98

I managed to get the crash to reliably reproduce, but I haven't yet pinpointed the root cause. Here's what I've learned so far - might be old hat for some of you, sorry I'm still new to this environment:

I was able to reproduce the crash using non-AR FX ('Breathe' in particular) on the PWM segment, so long as the brightness got above a certain point
It won't reproduce without the UDP traffic, though
-D DEBUG_ESP_HWDT gives a more helpful crash dump on WDT timer crashes such as these, but the user and system stacks are all munged together
-D DEBUG_ESP_HWDT_NOEXTRA4K gives clean stack traces, but costs 4K of RAM
Adding DEBUG_PRINT(F("Reset for: ")); DEBUG_PRINTLN(ESP.getResetInfo()); to WLED::setup() gives yet more clues to what was going on when the system faulted
None of these have left me with a smoking gun, or at least I haven't isolated it yet - the faulting instruction pointers seem to be all over the system and frequently "benign" (eg. register arithmetic, nothing that would obviously lead to a fault).
.. though there's a frequent occurrence of a ROM function at the top of the system stack, ram_rxiq_get_mis, which I am guessing is for retrieving packets from the wifi module??

I'm on vacation and away from home for the next couple of weeks, so I won't be able to do any more digging until I get back, but I wanted to leave some tracks in case anyone else was investigating.

Sorry I don't have more answers!

May 16 '24 03:05 willmmiles

please condisder the following implementation of the receiveAudioData function:

The buffer is static now and its size is 83 bytes (size of the v1 packet)
Fixed possible stack crash attack if somebody would send big UDP packets
Added a while loop to make sure that if packets are sent too often it wouldn't cause out of ram crash.

    bool receiveAudioData()   // check & process new data. return TRUE in case that new audio data was received. 
    {
      if (!udpSyncConnected) return false;
      bool haveFreshData = false;

      size_t packetSize = 0u;
      while ( (packetSize = fftUdp.parsePacket()) !=0 )
      {
        if ( (packetSize == sizeof(audioSyncPacket)) || (packetSize == sizeof(audioSyncPacket_v1)) ) {
          //DEBUGSR_PRINTLN("Received UDP Sync Packet");
          static uint8_t fftBuff[sizeof(audioSyncPacket_v1)];
          fftUdp.read(fftBuff, packetSize);

          // VERIFY THAT THIS IS A COMPATIBLE PACKET
          if ( isValidUdpSyncVersion((const char *)fftBuff)) {
            decodeAudioData(packetSize, fftBuff);
            //DEBUGSR_PRINTLN("Finished parsing UDP Sync Packet v2");
            haveFreshData = true;
            receivedFormat = 2;
          }
          else if (isValidUdpSyncVersion_v1((const char *)fftBuff)) {
            decodeAudioData_v1(packetSize, fftBuff);
            //DEBUGSR_PRINTLN("Finished parsing UDP Sync Packet v1");
            haveFreshData = true;
            receivedFormat = 1;
          }
          else receivedFormat = 0; // unknown format
        }
      }
      return haveFreshData;
    }

May 21 '24 06:05 flux242

I managed to get the crash to reliably reproduce, but I haven't yet pinpointed the root cause. Here's what I've learned so far - might be old hat for some of you

🤔 so the only "smoking guns" we have so far:

it's a watchdog reset -> so something took too long time or the wifi driver did not get it's regular time slice
it happens with PWM leds -> pwm on 8266 created a load of interrupts, because 8266 does not have the "ledc" unit like esp32 -> 8266 basicially uses a timer that fires interrupts whenever the output needs to toggle (up to ~50 times per millisecond per pin). PWM toggles most frequently when the "analog" value is around 50% of max.
there is no real "pattern" as to where it crashes -> does not look like a single loop taking too much time, maybe it's just that the watchdog finally bites when too much time has accumulated.

@willmmiles @gaaat98 can you try if adding a "yield()" before or after the wifiudp calls helps? I.e. somewhere here:

https://github.com/gaaat98/WLED/blob/cec72419863a0ce8f19a02aa05bf9073b1302dcd/usermods/audioreactive/audio_reactive.h#L1086-L1090

receiving happens in "arduino loop" context so yield() is allowed.

May 26 '24 17:05 softhack007

please condisder the following implementation of the receiveAudioData function:

Thanks for sharing your ideas. Maybe before we look into memory management, let me explain a bit how the audio sync feature works:

the sender creates a UDP multicast channel on port xyz. Multicast = one-to-many, but not the same as broadcast (one-to-all). UDP Multicast is a specific protocol that is usually not routed out of your local network.
the sender sends one packet of 44 bytes each 20 millis
only one sender should exist in your network for port xyz!
many receivers attach to the multicast channel with port xyz.
receivers poll for new packets each 10 millis (normally) so any intermediate "backlog" - if that exists - would be consumed quickly
timing is important. If the receiver would unpack too many packets at once, this would result in "lagging" or "jumping" of effects
we are not sharing raw audio data, but audio processing results that's why 44 bytes each 20ms is enough

The buffer is static now and its size is 83 bytes (size of the v1 packet)

Sounds reasonable, just that the buffer does not need to be static - fixed size would be enough - that will ensure the array is created in the stack, while variable sized arrays are allocated on the heap

Fixed possible stack crash attack if somebody would send big UDP packets

Honestly, there are lots of easy ways to attack and crash an 8266 running WLED. I think we can concentrate on reasonable cases of normal operation. This includes "only one sender" and "no rogue packet floods".

Added a while loop to make sure that if packets are sent too often it wouldn't cause out of ram crash.

I think we have a timing problem (see my previous post), not an issue with available heap or heap fragmentation. @willmmiles I'm not sure if wifiUdp will "preserve" unread multicast packages? Or do we always get the latest one from the wifi driver and old ones are discarded? Maybe on 8266 we need to manually "flush" the receive buffer after each parsepacket() > 0? Similar to what udp.flush() does on esp32... for example by repeating wifiudp::read() until it returns "zero bytes read".

May 26 '24 17:05 softhack007

@willmmiles @gaaat98 can you try if adding a "yield()" before or after the wifiudp calls helps? I.e. somewhere here:

https://github.com/gaaat98/WLED/blob/cec72419863a0ce8f19a02aa05bf9073b1302dcd/usermods/audioreactive/audio_reactive.h#L1086-L1090

I have added yield() after size_t packetSize = fftUdp.parsePacket(); and also after fftUdp.read(fftBuff, packetSize); unfortunately the controller still crashes shortly after setting a sound reactive effect.

May 28 '24 17:05 gaaat98

Short update: I am still working on this! I added some code to track interrupts and other events across HWDT crashes; unfortunately I'm still not seeing any consistency in the traces from one crash to the next.

I'm really wishing for that JTAG debugger, but I don't think I can get that working without rebuilding the whole toolchain. :(

Jun 15 '24 20:06 willmmiles

@willmmiles I'm not sure if wifiUdp will "preserve" unread multicast packages? Or do we always get the latest one from the wifi driver and old ones are discarded? Maybe on 8266 we need to manually "flush" the receive buffer after each parsepacket() > 0? Similar to what udp.flush() does on esp32... for example by repeating wifiudp::read() until it returns "zero bytes read".

I removed the time check limiter in the poll function to see what happens. There's definitely some evidence of malfeasance. The udp packets seem to be released to us in bursts of five-- so there's definitely queuing going on somewhere -- and unfortunately they are not delivered very frequently.

Jun 15 '24 21:06 willmmiles

The udp packets seem to be released to us in bursts of five-- so there's definitely queuing going on somewhere -- and unfortunately they are not delivered very frequently.

Ugh, it's worse than that. I turned on the arduino core debugging, they're being delivered by lwip to WifiUDP in batches. It will queue at max 5, but sometimes it's dropping two or three. Our loop() function runs to completion many times in between these bursts, so we aren't to blame for hogging the CPU. The interrupt traces show wifi interrupts also seem to be somewhat bursty. I'll see if I can figure out how to tickle the network stack to get it to deliver things more smoothly.

Jun 15 '24 21:06 willmmiles

The udp packets seem to be released to us in bursts of five-- so there's definitely queuing going on somewhere -- and unfortunately they are not delivered very frequently.

Ugh, it's worse than that. I turned on the arduino core debugging, they're being delivered by lwip to WifiUDP in batches. It will queue at max 5, but sometimes it's dropping two or three. Our loop() function runs to completion many times in between these bursts, so we aren't to blame for hogging the CPU. The interrupt traces show wifi interrupts also seem to be somewhat bursty. I'll see if I can figure out how to tickle the network stack to get it to deliver things more smoothly.

OK, this turned out to be a wrinkle in multicast handling on wifi networks: the buffering was being done by my AP, not the ESP. Apparently this is what happens if you allow the "modem sleep", so for any AR sync users, it probably needs to be turned off. I don't know if we should maybe find some way to indicate that in the UI somewhere, or have the AR usermod page also set that setting. Possibly we could also detect when this is happening (ie. when we receive two packets in one poll) and provide some indication in the info panel.

Unfortunately it gets worse, though. For multicast packets, if any node on the SSID enables wifi power saving (aka modem sleep), the wifi standard mandates that all multicast packets must be buffered and batched, so they don't get missed by the one sleeping node. The only workaround for this is to ask your AP to perform "multicast-to-unicast" conversion, if it supports it. (Fortunately mine does!)

Chasing that down has been a great learning experience - I've learned a lot about the internals of xtos, wifi drivers, and the wifi standard - but alas the PWM crashes don't seem to be directly related and still happen even with correctly timed packet reception. Onward!

Jun 18 '24 01:06 willmmiles

there's definitely queuing going on somewhere -- and unfortunately they are not delivered very frequently.

Unfortunately it gets worse, though. For multicast packets, if any node on the SSID enables wifi power saving (aka modem sleep), the wifi standard mandates that all multicast packets must be buffered and batched, so they don't get missed by the one sleeping node. The only workaround for this is to ask your AP to perform "multicast-to-unicast" conversion, if it supports it. (Fortunately mine does!)

@willmmiles wow that's unexpected - thanks for the good research 😃 It actually matches with another experience - we have this old note in our sound sync wiki about "make sure you have gone into the WiFi Preferences and clicked on 'Disable WiFi sleep' at the bottom of the page. It has caused us innumerable problems in the past". At least we finally know why that happens.

I'll try to find some time in the next week so I can join debugging 8266. My bets are still on timing issues, and maybe we need to carefully place a few yield() so that the wifi stack is not dying.

Jun 18 '24 12:06 softhack007

I am running 015 on shelly rgbw2, and and having reboots also, currently debugging. I am using nothing special either, just fresh config, defaults. I tried testing all interfaces off, still sporadic reboots.

Loop time[ms]: 0/9
UM time[ms]: 0/1
Strip time[ms]: 0/7
Segments: 1 -> 68/0B
  Seg: 1,1 [A=1, 2D=0, RGB=1, W=0, CCT=0]
Modes: 4*187=748B
Data: 4*187=748B
Map: 2*0=0B

 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x4010f000, len 3424, room 16 
tail 0
chksum 0x2e
load 0x3fff20b8, len 40, room 8 
tail 0
chksum 0x2b
csum 0x2b
v000cec40
~ld
âìNì{ûo|ìll bÛß|sloàoâ

Jun 19 '24 18:06 tablatronix

I am running 015 on shelly rgbw2, and and having reboots also

@tablatronix Are your shelly bulbs "PWM" (analogue), or ws2812b? If PWM - how many color channels?

Jun 19 '24 18:06 softhack007