1D->2D expansions can lead to crashes
What happened?
While debugging the PS code I found it crashes quite often when using 1D FX in a 2D segment. My investigation showed, that this also happens with non PS effects. The larger the segment, the easier it is to trigger the crash.
To Reproduce Bug
I was able to reproduce this on S3 and ESP32: set up a 2D matrix, 32x32 or larger, on 64x32 it happens more often. I used two outputs in my tests both WS281x with equal amounts of LEDs (512 for example). I set a bootup preset with Aurora effect, then switch to Android effect (just to see if it crashes as I tested with no LEDs connected). Crash dump below was with Aurora. Change the 1D->2D expansion type randomly, at 32x32 it usually takes 10-20 changes, on larger ones it happenes sometimes after just one or two changes.
Expected Behavior
Install Method
Self-Compiled
What version of WLED?
Latest main.
Which microcontroller/board are you seeing the problem on?
ESP32
Relevant log/trace output
Unfortunately, trace outputs are very erratic and inconclusive. What often pops up is something like this:
JSON buffer released. (12)
-- Stopping transition: S=0x3ffd87bc T(0x3fff15f0) O[0x3fff098c]
-- Destroying segment: 0x3fff098c [0,32:0,32] 352->(0x3fff16fc) T[0]
JSON buffer locked. (11)
Waited for strip to finish servicing.
Waited for strip to finish servicing.
-- Started transition: S=0x3ffd87bc T(0x3fff15f0) O[0x3fff098c] OP[0x3fff1a68]
JSON buffer released. (11)
JSON buffer locked. (11)
Waited for strip to finish servicing.
JSON buffer released. (11)
Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC : 0x40082d99 PS : 0x00050032 A0 : 0x40087b94 A1 : 0x3ffbf870
=> 0x40082d99: i2s_ll_get_intr_status at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32/include/hal/i2s_ll.h:398
(inlined by) i2s_intr_handler_default at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/driver/i2s.c:434
=> 0x40087b94: _xt_medint2 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1203
A2 : 0x3ffd9910 A3 : 0x00020202 A4 : 0x00020202 A5 : 0x4008f122
=> 0x4008f122: _frxt_int_enter at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/portasm.S:119
A6 : 0x00000001 A7 : 0x3fe925a6 A8 : 0x00000001 A9 : 0x3fe92485
A10 : 0x7ff00000 A11 : 0x00000123 A12 : 0x00000020 A13 : 0x00000001
A14 : 0x00000000 A15 : 0x80000002 SAR : 0x00000009 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00020212 LBEG : 0x400029ac LEND : 0x400029cb LCOUNT : 0x00000000
=> 0x400029ac: ?? ??:0
=> 0x400029cb: ?? ??:0
Backtrace: 0x40082d96:0x3ffbf870 0x40087b91:0x3ffbf8a0 0x40081628:0x3ffcc350 0x400e84df:0x3ffcc410 0x400ed1fc:0x3ffcc450 0x4011d855:0x3ffcc480 0x4011dd2e:0x3ffcc530 0x4013a7e5:0x3ffcc550
=> 0x40082d96: i2s_ll_get_intr_status at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32/include/hal/i2s_ll.h:398
(inlined by) i2s_intr_handler_default at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/driver/i2s.c:434
=> 0x40087b91: _xt_medint2 at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1203
=> 0x40081628: Segment::setPixelColor(int, unsigned int) const at wled00/FX_fcn.cpp:732 (discriminator 2)
=> 0x400e84df: Segment::setPixelColor(int, unsigned char, unsigned char, unsigned char, unsigned char) const at wled00/FX.h:691 (discriminator 2)
(inlined by) mode_aurora() at wled00/FX.cpp:4823 (discriminator 2)
=> 0x400ed1fc: WS2812FX::service() at wled00/FX_fcn.cpp:1235
=> 0x4011d855: WLED::loop() at wled00/wled.cpp:127 (discriminator 5)
=> 0x4011dd2e: loop() at wled00/wled_main.cpp:23
Anything else?
not every crashdump shows this but often xtensa_vectors.S is somewhere in there.
I suspect these things:
- buffer overrun somewhere in the layering code
- misaligned memory access: if DSP functions are used (not sure vectors is one) then memory boarder alignment applies. For example: the DSP FFT needs 16-byte aligned memory (see my PR with DSP FFT code )
let me know if this can be reproduced.
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Segment::setPixelColor(int, unsigned int) const at wled00/FX_fcn.cpp:732 (discriminator 2) indicates something unexpected happens during 1D expansion and raw pixel drawing. Prior lines say that the issue comes from I2S (interrupt?) handling.
You will need to provide exact steps.
I would love to have more detailed steps to provoke this, unfortunately it happens at random. If it is due to a race condition, thats what I would expect.
I may have found the issue: its related to _frametime.
the funcion waitForIt() waits for the targeted frametime, if that is shorter than what the actual frame rendering time is, the segment is not suspended when changing dimensions and the FX can still be rendering, causing the crash.
In the crashdump above "Waited for strip to finish servicing." shows up right before the crash. The wording is a bit misleading as this is printed if a timeout happened during wait.
to solve this, waitForIt() must not return before isServicing() returns false if afterwards buffers are changed.
Is there a call to waitForIt() where an early return does not cause problems if its still servicing?
If not, could the timeout be set to like 1000ms or will that cause other problems?
I don't think there are any cases for waitForIt() which won't risk crashes if they time out -- hence the suggestion to replace it with a formal mutex. Broadly, though, blocking of the web servicing task for a long time can be problematic. (Though we're stepping in to the "many problems have to happen at once" case there.)
Either way, we probably do need a longer timeout; and should it actually timeout, we probably want to abort most of the operations calling in instead of blindly continuing.
IIRC there are a couple of other places that also risk contention. The legacy web API handleSet() is one, it could technically be interacting with or modifying segments while a preset is being applied by the main task. This applies to both web and MQTT, I think.
Sorry, I'd stumbled on this issue a while back, never got around to doing a full analysis to make sure there weren't any other points of failure. In practice, most of the potential concurrency problems are actually solved by the JSON lock, which implicitly functions as a mutex around most system state. My original thought was to consider extending it to the render loop and then hunt down any stragglers. That said, the current approach of doing an early exit of the render loop when a write is necessary is a good idea and should be preserved.
One thing to keep in mind is that it's also not stricly speaking safe to even read segment properties without holding the lock; it's possible some other task was in the middle of updating the segments.
The premature exit from wait loop was added intentionally to avoid issues stalling web requests. Even though I knew the implications.
Mutex will not solve this. However, there are two possible workarounds:
- wait "indefinitely" until frame drawing is done (@dedehai you can increase wait by adding 2* or 3* to get frameTime())
- schedule segment parameter update out of drawing loop (
service())
In the past we already had "scheduled" updates for segment length (it only handled length). However such approach will need a lot of RAM if we want to hamdle all 32 segments in full. 8266 is at risk here.
As for first option, the problem can also be solved to drop requested FPS if ESP can't keep up (dynamically updating FPS).
I added a temporary fix in https://github.com/wled/WLED/pull/4783 but it does only mask the underlaying issue.
@willmmiles since "blocking of the web servicing task for a long time can be problematic": does the current code prevent me from setting target FPS to 2, making the timeout 500ms?
Mutex will not solve this. However, there are two possible workarounds:
- wait "indefinitely" until frame drawing is done (@DedeHai you can increase wait by adding 2* or 3* to get frameTime())
That's what a mutex would permit. The only difference is that it blocks "properly" in the RTOS instead of a using a delay() loop -- and would also extend this protection to any other tasks, should some future integration end up doing work on yet another task (such as an MQTT or a bluetooth integration).
- schedule segment parameter update out of drawing loop (service())
In the past we already had "scheduled" updates for segment length (it only handled length). However such approach will need a lot of RAM if we want to hamdle all 32 segments in full. 8266 is at risk here.
The RAM cost for just handling segment updates wouldn't be terrible; it'd be a buffer of SegmentCopy objects, just like BusConfigs before it, and so on; but I think it would be silly to add yet another special-case deferral of update handling. (Also technically we'd need to make sure that update queue is itself mutexed/task safe.) We've already had to do this for bus configuration, for wifi settings, the list just keeps going on. The specific issue here with Segments is also potentially a problem for pretty much every setting, every usermod option, you name it -- the assumption that the web server task can just read or write variables being used by the main task is extremely dangerous. (You can get away with a lot playing fast and loose with this kind of thing in a cooperative tasking model like the ESP8266 -- there, you only task switch at yield()s -- but it's not so simple with a preemptive OS like ESP32 uses.)
Instead I think we should take to one of two approaches:
- Commit to processing all config updates on the main/render task, so no locking is required. Functionally this could be implemented by having the web server (or other configuration calling tasks) queue the deserialization functions to be handled by the main thread using a thread-safe queue. Pros: implementation is straightforward, works for all configuration items everywhere; con: worst possible performance - main loop is blocked while updates are analyzed, even if they make no changes.
- Commit to using granular locking during configuration updates. Pro: config functions can be called freely from any task, locking scopes are narrow, minimal impact on render loop if no changes are made; Con: every feature, component, and usermod needs to think about thread safety independently for their configuration.
I usually lean toward the latter, but I could be talked in to the former if there's a consensus that "I don't want to have to think about task safety" is more valuable than "best possible performance". (If we do want to go that route, possibly in the future, the render loop could be broken out to a separate task from the rest of the main thread, and the bus/segment handling special cased with locking.)
@willmmiles since "blocking of the web servicing task for a long time can be problematic": does the current code prevent me from setting target FPS to 2, making the timeout 500ms?
It's not so much about the timeout value as how long the render actually takes. The FPS /target/ shouldn't matter, since the timeout only applies to how long it spends inside the render, not how long it waits between renders. The actual FPS might, though: if it's running so slow that it really takes 500ms to render a frame, you're gonna have a bad time, web server or no.
We might be able to do better for the web server by pipelining the output phase, eg. making sure the Bus::show() calls are outside of the segment mutex once the blending has been completed.
Hey! This issue has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. Thank you for using WLED! ✨
do we want to keep this open as a reminder that timing still needs proper fixing? The current workaround seems to hold up quite well, I never got any crashes anymore.
FYI: https://github.com/blazoncek/WLED/commit/961ee299ccbbbc0a523ac748b0d68dba73c8ef8f
Hi all,
I did not read through all your discussions, so excuse me in case I'm repeating stuff that was already said. Still I can add my 2€ of experience from WLED-MM.
In fact, I think that only a mutex / semaphore will really solve the problem. I'm talking about FreeRTOS semaphores, not any hand-crafted spinlocks that are basically implemented with a volatile bool, because these user-space spinlocks are not sufficient to really exclude concurrent access on a multiprocessor CPU with caches.
The general issue is that UI callsbacks are running in web server context (another task) with high priority, while the "looptask" is rendering effects, sending out LEDs, saving presets, reading buttons, serving DMX etc.
In MM we found a workaround (not perfect, but worked) by blocking all "serialize" and "deSerialize" functions until strip.service() is done. I think upstream WLED did it similarly, with an additional mechanism to queue "segment" updates so they will be added to the segments list later by the main looptask.
I agree with @DedeHai @willmmiles that a Mutex to protect access to "Segments" would be the proper solution. With this, we could be sure that whenever one task reads or modifies segments, any concurrent tasks will be locked out.
PS: 8266 does not have OS based Mutex support, so not sure what to do about this aging device ... maybe let 8266 rest in peace 😉 and solve it for esp32?
@softhack007 Alas there is more to the thread. To sum up: if only it were just "Segments"! The underlying issue of concurrent access to shared state is endemic throughout the codebase. Bus drivers are another big one, but anywhere there's a String being stored is potentially vulnerable. (MQTT? IR? The list goes on...) It's harder to crash the system with non-heap types, but there are still ways, too...
We can try to mutex every individual subsystem, but that's a big task. Short term, the best "fixes 'em all" answer we have is to move the web server endpoints that interact with system state to the loop task. Fortunately this is straightforward: #4808 showcases the basic technique, though I haven't yet had time to ensure it's applied to every call that needs it, and there's some interaction with the shared JSON buffer lock that will need to be cleaned up. (Of course this works just fine for ESP8266, too.)
Longer term, I do think it's worth considering moving the LED rendering loop to an isolated task anyways, if only to better take advantage of multicore systems. At that point we will need to update it with a formal API and concurrency management (possibly with mutexes, but message passing might be a better fit). Either way, I don't anticipate any particular trouble fitting it in for the EPS8266; the cooperative multitasking model means we can stub out any mutexes on that platform.