Adafruit_NeoPixel icon indicating copy to clipboard operation
Adafruit_NeoPixel copied to clipboard

pixel.show() crash with more than 73 pixel on ESP32s3

Open tresler opened this issue 2 years ago • 36 comments

I have weird problem. I test example code or example with new operator and everything works fine if I have 73 or less RGB leds. If I use 74 or more, everything crash. Can anyone explain it?

Debug code: adafruit_neopixel_crash.txt

  • Arduino board: ESP32s3-WROOM
  • Arduino-ESP32: 3.0.0 alpha 3
  • Arduino IDE: 2.2.1
  • Adafruit_Neopoxel: 1.12

tresler avatar Dec 17 '23 22:12 tresler

Same problem here, also Esp32 (Heltec Wifi Kit), and at around 70 pixels

ajkeeton avatar Jun 05 '24 02:06 ajkeeton

I'm experiencing the same issue as well (crash/boot loop when driving more than around 70 pixels). This occurs on both my XIAO ESP32C3 boards and my ESP-WROOM-32 Dev Modules, with the following hardware/software:

  • Boards: XIAO ESP32C3, Generic ESP-WROOM-32 38 Pin Dev Board
  • Board Package: esp32 by Espressif (both v3.0.0 & v3.0.1)
  • Adafruit NeoPixel v1.12.2
  • Arduino IDE: 2.3.2

Digging around in the Adafruit NeoPixel Library, using the v3.x board package means were also using IDF v5, which is pretty simple, and the NeoPixel Library doesn't appear to be to blame for this issue. I've isolated the crash to where it's calling: rmtWrite(pin, led_data, numBytes * 8, RMT_WAIT_FOR_EVER);. The crash happens somewhere in there. Digging a little deeper, it doesn't look like the problem is in the IDF library either, since it doesn't really do much with the requested buffer size except to pass it along to the rmt_transmit method of the SDK. I expect somewhere downstream there a buffer is overrun and it eventually overwrites something important causing the whole thing to crash and reboot. 70 pixels turns out to be a fair amount of data: 70 pixels * 3 bytes for RGB, then each byte requires 8 RMT Items which themselves take 4 bytes each - 6720 bytes! This is far more than the ESP32-S3's 2048 bytes of RMT block memory (or the ESP32-C3's meager 768 bytes), so the SDK must be handling the buffering itself.

I'm not sure at this point where to turn for a fix though, since it looks like this is a problem with Espressif's v3 SDK. I'd really like to get this fixed, as I have a few projects with 150 or more pixels that I currently can't complete with this setup. I'm sure there's plenty of other people as well that expect to be able to drive larger pixel strings, so it'd be nice to get this working for everyone on the ESP32 platform.

teknynja avatar Jun 07 '24 22:06 teknynja

So I posted about this issue on the Espressif forums (https://www.esp32.com/viewtopic.php?f=13&t=40270) and they quite helpfully pointed me in the right direction; it appears that the culprit in the NeoPixel library is the espShow() method in the esp.c file is allocating the RMT Items Buffer (led_data) on the stack, and at about 70ish pixels it exceeds the available stack space and causes the crash. I'm guessing that allocating that buffer from the heap (or statically) would likely greatly increase the number of pixels that can be driven.

While looking at this code more closely, I also noticed that it's calling rmtInit() every time the pixels are "shown" - I'm assuming that it probably should only be called once (maybe in the .begin() method?). There doesn't appear to be a close/cleanup function in the Adafruit_NeoPixel driver, so there's probably no place or need to call the matching rmtDeinit() to release the RMT channel.

teknynja avatar Jun 09 '24 02:06 teknynja

@teknynja , good thinking, but neither of the provided examples allocate the buffer on the stack. The first has it as a global in .data. The second has it allocated from the heap via operator new[]. So that's not it.

The single provided crash wasn't run through the symbolicator. See idf.py monitor and decoding at, e.g. https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-guides/tools/idf-monitor.html Many IDEs offer convenient ways to do this.

Note that the ESP32 parts have harsh rules about what registers can be poked at interrupt vs. base times and some other context rules. Is it possible that something is trying to refill a buffer and the hardware disappears from beneath it? That'll show that you'll be deep into a loop and the load/stores that have worked many, many times suddenly result in an invalid opcode (?) exception, thought the opcode hasn't changed.

Offhand, is this perhaps taking long enough that the watchdog is firing?

This caught my eye as I have a project that uses FastLED to actually feed the LEDs and it also blows up on s3, thought he breaking point is well into the many hundreds of LEDs. The breaking point doesn't seem to be absloute, but at X it'll run for hours and ad X+10 it'll crash within minutes. FastLED and NeoPixel are different code bases, so it's unlikely that they're bug-compatible.

The unfortunate memory footprints with either RMT or SPI are bad on long/many strands of WS2812's. It's annoying enough that I've started investigating slightly more expensive parts that have real DRAM - and lots of it - for larger pixel budgets. When 64 and 256MB parts are still under $10 - and predictably fast - I have to ask if the PSRAM pain is worth the pull.

robertlipe avatar Jun 09 '24 12:06 robertlipe

@robertlipe - I'm definitely not an expert C/C++ coder, but to my eye

void espShow(uint8_t pin, uint8_t *pixels, uint32_t numBytes, boolean is800KHz) {
  rmt_data_t led_data[numBytes * 8]; // <== stack allocation??

  if (!rmtInit(pin, RMT_TX_MODE, RMT_MEM_NUM_BLOCKS_1, 10000000)) {
    log_e("Failed to init RMT TX mode on pin %d", pin);
    return;
  }

  ...

the rather large led_data array is being allocated on the stack (plus in my demo code, once I change that to a static allocation outside the function, I'm able to send data to several hundred pixels without it crashing).

I haven't had a chance yet to try rigging up a test with allocating that buffer from heap in the actual driver, but I expect that should at least clear up the crashing issue.

teknynja avatar Jun 09 '24 12:06 teknynja

Ah. That code wasn't in the original post.

Yes, that code is being silly. That's very much going to blow up in a typical FreeRTOS (or most other multitasking) kind of environment.

It's absolutely worth the effort to have pixels[] correctly sized and allocated (or at least have led_data allocated and reallocated if it grows) so that this additional buffer isn't necessary. That's indeed the root of this bug. Good find.

Since you seem to have a test case in hand, you might be able to push this along with a (non-multithreaded) fix like changing

void espShow(uint8_t pin, uint8_t *pixels, uint32_t numBytes, boolean is800KHz) { rmt_data_t led_data[numBytes * 8]; // <== stack allocation??

to

void espShow(uint8_t pin, uint8_t *pixels, uint32_t numBytes, boolean is800KHz) {

static uint32_t led_buf_size{0}; static rmt_data_t;

if (numBytes * 8 > led_buf_size) { static led_data = malloc(numBytes * 8); assert(led_buf_size); led_buf_size numBytes * 8;

//rmt_data_t led_data[numBytes * 8]; // <== stack allocation??

This isn't really thread-safe, but that's probably OK for 99% of the callers since it only grows. It also saves the malloc on every draw(), keeping it but so that it only grows over time to reduce the possibility of fragementation, failure, or flicker if its slow. It'll quickly reach a stable state and hold onto that buffer.

Clean that up and propose it as a pull request if that basic approach works. I'm coding from bed (not feeling well) and not a debugger so there may be sloppiness in that.

A far, far better approach is to make the caller aware of led_buf_size and let IT allocate and manage that. That solves the threading problem. This is a case of trying to hide something that the caller actually DOES need to know about. Premature abstraction.

Good luck.

On Sun, Jun 9, 2024 at 7:37 AM teknynja @.***> wrote:

@robertlipe https://github.com/robertlipe - I'm definitely not an expert C/C++ coder, but to my eye

void espShow(uint8_t pin, uint8_t *pixels, uint32_t numBytes, boolean is800KHz) { rmt_data_t led_data[numBytes * 8]; // <== stack allocation??

if (!rmtInit(pin, RMT_TX_MODE, RMT_MEM_NUM_BLOCKS_1, 10000000)) { log_e("Failed to init RMT TX mode on pin %d", pin); return; }

...

the rather large led_data array is being allocated on the stack (plus in my demo code, once I change that to a static allocation outside the function, I'm able to send data to several hundred pixels without it crashing).

I haven't had a chance yet to try rigging up a test with allocating that buffer from heap in the actual driver, but I expect that should at least clear up the crashing issue.

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2156516954, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD34T7VLRZYW6KK6Q73LZGREBPAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJWGUYTMOJVGQ . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Jun 09 '24 12:06 robertlipe

I've submitted a pull-request with fixes for this issue. I'm able initialize the driver configured for 500 pixels and it seems to be working fine. Anyone who would like to try this code out can grab the changes at teknynja/Adafruit_NeoPixel/tree/esp32_rmt_memory_allocation_fix - let me know if it works for you or if you encounter any problems. Thanks again for the help from @robertlipe on getting me started on this patch.

teknynja avatar Jun 09 '24 20:06 teknynja

@teknynja Your code works in my case. Thank you and @robertlipe very much. Also the link seems to be dead. This link to teknynja's fixes should work: https://github.com/teknynja/Adafruit_NeoPixel/tree/esp32_rmt_memory_allocation_fix

Pr77Pr77 avatar Jun 12 '24 05:06 Pr77Pr77

For anyone using this library in multi-threaded scenarios, I've created a new branch with the same fix but it also includes a mutex to make it thread safe (works just fine for single-threaded apps as well): teknynja/Adafruit_NeoPixel/tree/esp32_rmt_memory_allocation_fix_safe. This is the preferred update for both single and multi-threaded applications.

teknynja avatar Jun 12 '24 16:06 teknynja

I like this a lot. There's a race. Condition at 51. Multiple threads could both enter and both find the muted null and bith create them and the operate with different lock instances in that first run. Pretty minor. Can it just be allocated in a larger scope where it'll only be allocated once?

But overall, I really like this. Thanks for doing this and sharing it.

On Wed, Jun 12, 2024, 12:00 PM teknynja @.***> wrote:

For anyone using this library in multi-threaded scenarios, I've created a new branch with the same fix but it also includes a mutex to make it thread safe (works just fine for single-threaded apps as well): teknynja/Adafruit_NeoPixel/tree/esp32_rmt_memory_allocation_fix_safe https://github.com/teknynja/Adafruit_NeoPixel/tree/esp32_rmt_memory_allocation_fix_safe. This is the preferred update for both single and multi-threaded applications.

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2163513141, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD35RKD67MCB43FQXBY3ZHB5CJAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRTGUYTGMJUGE . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Jun 12 '24 18:06 robertlipe

Yeah, i wasn't sure how to avoid that without having a way to initialize with the instance...

teknynja avatar Jun 12 '24 18:06 teknynja

Right, you'll have an allocated an unreachable object and never have a way to release it. That's not totally terrible. but we can do better. We need something that acts like a mutex, while we initialize our mutices. (mutexen? :-) mutexan? 🤠👢? ) This calls for what C++ nerds call a singleton. How can we enforce locking while bringing up our locks?

We use a std library function that does it behind our back!

static std::once_flag flag_; // Notice the flag isn't even really used; it just gives us an object we can put junk in.)

If you want a peek at how pros like Google do this, we can see the source they used before C++11 was common in their super-useful abseil library here (and while I dig abseil, it's 2024 so C++11 should be everywhere that matters by now, so we'll use the version in the std:: library instead of linking against Google's We're just using their source for education. (Yes, I admit I had to look it up myself...):

https://github.com/abseil/abseil-cpp/blob/master/absl/base/call_once.h

Better still, we can see how they use it in their own tests. (foo_test.cc tests foo in Google-land.)

https://github.com/abseil/abseil-cpp/blob/master/absl/base/call_once_test.cc

So I think line that code starts to look something more like:

    static std::once_flag flag;
    static std::mutex show_mutex;
    std::call_once(flag, []() { show_mutex = xSemaphoreCreateMutex();

});

Now xSCM will be calle dexactly once and show_mutex will be set exactly once.

Wait, what is this { stuff; } ?

That is a Lamba. It lets you create and use a function in place without actually giving it a name. We could make a set_mutex(void) { show_mutext = xCSM() }

... but now we have to expose show_mutex to it. We could do that by passing in an pointer, and then set_mutex(std::mutex* foop;) *foop = xCSM() which for such a tiny function isn't a big deal for a one-liner, but if there were some heft to that function, and it neede access to lots of internals, you can probably see this becoming a getter/setter nightmare. There are ways to pass arguments to lambda:s auto addr = [](int x) {return x+1;} and exotic ways to 'capture' more arguments and possibly defer the binding until the actual call. We don't NEED any of that, so I'll keep the syntax simple and give you a few keywords to satisfy your curiousity, if I've not already sent you into the woods.

Honestly, it's also pretty rarethat these would be statics in a Real Program, they'd be stored in some kind of a class holding espShow() and friends, but I'm trying to keep this light and not rearrange much, keeping with your original principles.

In case it helps, since I had to look up the syntax myself, here's the standalone test program I used to sketch it out:

#include #include #include

void t() { static std::once_flag flag; static std::mutex show_mutex; #if 1 || TEST static void *p { nullptr }; std::call_once(flag, { p = malloc(1024); }); std::cout << p << std::endl;; #else std::call_once(flag, { show_mutex = xSemaphoreCreateMutex(); }); #endif }

int main() { t(); t(); t(); return 0; }

Here, when building -Dtest, you can see that p is calling malloc each time we're called (leaking) and we should see P dropping 1K each time if we just called malloc each time, but you'll see we call malloc once and the returned (OK, I didn't actually return it; I just printed 'p') actually stays the same upon every call:

➜ /tmp c++ -std=c++20 x.cc -o x && ./x 0x121808800 0x121808800 0x121808800

So, we could have built a real function and passed the address of the function to call_once. That function would then do the work and return - once. But I chose instead to do the exact same thing in a lambda because here, it's just a bit more tidy and keeps our internals internal without.

Where else can you use lambda? Anywhere you'd use a function pointer. Perhaps obviously, they're handy for things like GUI callbacks, but qsort lends iteslf well to lambda use as they're "just" unnamed functions. The syntax is a little egg-headed, but knowing where to find good examples helps. YOu can capture by rference [&] (which aligns to pointer syntax) or bay value [=] which is fairly mnemonic. Varargs are possible, but let's leave those alone. IMO, they make code easier to read when implemented well, but the people that think that 1972 C was good enough for everyone will always scream about them.

So today we learned about a Singleton - a function that always returns one, correctly initialized object. It's like a global and some people fuss about that design pattern. It's difficult to define destruction order, for example and they share lots of problems with globals like global mutable state and difficulty to mock for tests. But here, you're always dealing with one THING and dealing with it in a small, controlled way, so it seems OK. One reasonable description is given at https://refactoring.guru/design-patterns/singleton

Lambda - Lambdas are just functions with no names (!) that can be used in line and have a capture list (the thing between []) and variables that are passed by reference & or by value = and a paramater list included in the (), just like a function does. Tutorials like https://smartbear.com/blog/c11-tutorial-lambda-expressions-the-nuts-and-bolts/ do a good job without just repeating the cppreference page, which aren't always for time before coffee.

It's a lotta typing for what I think was like a three line "fix" to a problem we've agreed wasn't a totally terrible problem.

My OS used to be used for the controls on submarines, for example, and my final employer used to have s (true) saying that around (t)here, things that happen "once in a million times" happen multiple times every day. So I like to tidy things up as tight as I can.

As always, feel free to use that code in your PR (please) and I hope you (and anyone else reading this) learned something along the way. Maybe it'll help the reviewer recognize what we did and why.

Enjoy

On Wed, Jun 12, 2024 at 1:23 PM teknynja @.***> wrote:

Yeah, i wasn't sure how to avoid that without having a way to initialize with the object...

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2163654024, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD3ZBGHRI7JH6EDH2XKDZHCGZDAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRTGY2TIMBSGQ . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Jun 13 '24 07:06 robertlipe

hmm, it doesn't appear we have access to the std library in this project (perhaps because this is just a plain ol .c file?). At least when I try to build in the Arduino environment, it complains that it can't find the header files. Also not sure how the maintainers would feel about adding references to std:: since it doesn't appear they are using it anywhere else.

Your example did get me looking into possible solutions from within the FreeRTOS framework itself, and I discovered the xSemaphoreCreateMutexStatic function which looked very promising, but digging through the FreeRTOS source code, it looks like that call is probably not be thread-safe either. :-/

I'm wondering if I could just do something quick-n-dirty, like disabling interrupts around where I'm checking/creating the mutex... it should be pretty quick through that section, and is in keeping with the design philosophy of this library. (Edit: Ugh, disabling interrupts would only solve it on single core apps!)

I nags at me too, having even a tiny crack like this race condition in the code - my day job is working on industrial process control systems and we'd never let something like that into production.

Interestingly, I just started looking into c++ lambdas a short time ago to try and solve a different problem (I'm a big ~abuser~ user of lambdas in C# though!)

teknynja avatar Jun 13 '24 17:06 teknynja

Good grief. It's c++ (I see an operator new) but it doesn't allow C++. The main neopixel.cpp has a cpp extension and is definitely c++...but is about half asm(). And Adafruit_NeoPixel::show( could be subclassed for the OS specific parts instead of #iddeffing it in. Freakin' arduino! :-|

Can you just move the creation up somewhere before all the tasks are created, ala https://www.digikey.com/en/maker/projects/introduction-to-rtos-solution-to-part-7-freertos-semaphore-example/51aa8660524c4daba38cba7c2f5baba7 and https://embetronicx.com/tutorials/rtos/freertos/freertos-mutex-tutorial-using-lpc2148/ and https://community.nxp.com/t5/MQX-Software-Solutions-Knowledge/How-to-use-mutex-and-semaphores-in-a-FreeRTOS-and-SDK2-0-Project/ta-p/1114196

All those examples just hoist it before all those icky threads are distracting it.

There's prior art for #ifdefs inside Adafruit_NeoPixel::Adafruit_NeoPixel(). Then again, rp2040Show() is doing a very similar sort of thing. Now you have a test inside what's in what's surely the most critical path testing false for all but the very first strand of pixels displayed.

Looking at bigger picture, I think I'd lean toward an #ifdef (sigh) inside Adafruit_NeoPixel::Adafruit_NeoPixel() so that you do this only once and don't have to take a branch at 120fps or whatever. No 'run_once' magic needed.

I'm quickly learning that your "I do very little C++" was a bit understated. :-)

On Thu, Jun 13, 2024 at 12:04 PM teknynja @.***> wrote:

hmm, it doesn't appear we have access to the std library in this project (perhaps because this is just a plain ol .c file?). At least when I try to build in the Arduino environment, it complains that it can't find the header files. Also not sure how the maintainers would feel about adding references to std:: since it doesn't appear they are using it anywhere else.

Your example did get me looking into possible solutions from within the FreeRTOS framework itself, and I discovered the xSemaphoreCreateMutexStatic https://www.freertos.org/xSemaphoreCreateMutexStatic.html function which looked very promising, but digging through the FreeRTOS source code, it looks like that call is probably not be thread-safe either. :-/

I'm wondering if I could just do something quick-n-dirty, like disabling interrupts around where I'm checking/creating the mutex... it should be pretty quick through that section, and is in keeping with the design philosophy of this library.

I nags at me too, having even a tiny crack like this race condition in the code - my day job is working on industrial process control systems and we'd never let something like that into production.

Interestingly, I just started looking into c++ lambdas a short time ago to try and solve a different problem (I'm a big abuser user of lambdas in C# though!)

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2166304399, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD377BPKFRAVJN7HEVCDZHHGLJAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGMYDIMZZHE . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Jun 13 '24 19:06 robertlipe

I was trying to avoid modifying the main .cpp file, but at this point it looks unavoidable. I could put the mutex in the initializer list (Edit: No, I can't!), but that feels pretty invasive (not to mention how ugly the #ifdefs around the ctor would be) and it still doesn't completely close the race condition (just moves it to the ctors racing). Even if we don't do a "test and set", there's still an opportunity for a thread to stomp on another thread's mutex.

In a perfect world, the mutex would be created in the startup thread before any child threads are spawned, but that's not a reasonable thing to expect a typical Arduino user to deal with. The best we could probably hope for is to require the Adafruit_NeoPixel instances to be created on the main thread, and then calls to .begin(), .show(), et al could be done on the worker threads. Without low (cpu) level synchronization primitives, it think it's gonna be tough to create a mutex on a thread that needs to be protected with that very same mutex.

teknynja avatar Jun 13 '24 19:06 teknynja

I think that's exactly the needed solution.

Global Constructors are called before (the equivalent of) main() - so inside a ctor, we know we're single threaded. Having the NeoPixel ctor handle this is perfect. Stylistically, I'd probably do it in the body instead of in the initializer list, but I could go either way.

This is why having it in a singleton (that's created as a global ctor) and used by Adafruit_NeoPixel seems to work. That removes the synchronization requirement from the caller., right?

On Thu, Jun 13, 2024 at 2:46 PM teknynja @.***> wrote:

I was trying to avoid modifying the main .cpp file, but at this point it looks unavoidable. I could put the mutex in the initializer list, but that feels pretty invasive (not to mention how ugly the #ifdefs around the ctor would be) and it still doesn't completely close the race condition (just moves it to the ctors racing). Even if we don't do a "test and set", there's still an opportunity for a thread to stomp on another thread's mutex.

In a perfect world, the mutex would be created in the startup thread before any child threads are spawned, but that's not a reasonable thing to expect a typical Arduino user to deal with. The best we could probably hope for is to require the Adafruit_NeoPixel instances to be created on the main thread, and then calls to .begin(), .show(), et al could be done on the worker threads. Without low (cpu) level synchronization primitives, it think it's gonna be tough to create a mutex on a thread that needs to be protected with that very same mutex.

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2166644439, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD32SAUQKMLO6JUN3FBLZHHZJZAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNRWGY2DINBTHE . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Jun 14 '24 04:06 robertlipe

Updated pull request #394 to move the initialization of the espShow() mutext to the Adafruit_NeoPixel constructor. As long as all instances are created on the main (or same) thread, the race condition is avoided. Otherwise, there is a slight chance of the threads clobbering each other's mutex (the same behavior we had before this change).

This is a slightly more "invasive" change then before, as there are minor changes now to the Adafruit_NeoPixel.cpp & .h files :shrug:.

teknynja avatar Jun 15 '24 19:06 teknynja

I am just a kibitzer myself (who ran into this issue), but here's my understanding of the situation, for reference.

Background:

  • The ESP32 Arduino core "arduino-esp32" is built on top of "ESP-IDF", an ESP32-specific support framework made by Espressif (makers of the ESP32). Any particular version of arduino-esp32 includes some version of ESP-IDF, and translates Arduino-type calls to ESP-IDF calls. (For example, when you call Arduino digitalWrite, arduino-esp32 calls ESP-IDF gpio_set_level.)
  • Arduino code (sketches and libraries) can, and do, use the available version of ESP-IDF directly, since arduino-esp32 doesn't translate everything ESP-IDF does into Arduino-ish. You can "mix and match" Arduino-ish calls and ESP-IDF calls. Of course, if you use ESP-IDF, your code is necessarily ESP32 specific.
  • On the ESP32, the Adafruit NeoPixel library, like other LED-strip-driving libraries, uses the ESP32's "RMT" hardware module. This module is designed for driving infrared remote controls (like, controlling a TV) but is kind of a general timed pulse train synthesis module and can be pressed into service to drive NeoPixel-type LED strips with hardware, with guaranteed timing and without tying up the CPU "bit banging" a GPIO. Cool beans.
  • The arduino-esp32 RMT driver doesn't offer quite enough control to efficiently drive LED strips (more on this later), so the Adafruit NeoPixel library uses the ESP-IDF RMT driver directly. RMT is ESP32-specific anyway so this is fine. (On other chips, the NeoPixel library uses entirely different techniques.)

So far this is all fine. But we have some wrinkles:

  • The previous arduino-esp32 v2.0.14 was based on ESP-iDF v4.4.7, but the new arduino-esp32 v3.0 uses ESP-IDF v5.1.4.
  • The ESP-IDF RMT driver interface was completely rewritten between v4.x to v5.x. That potentially completely breaks the Adafruit code that uses the old ESP-IDF RMT driver interface.
  • Good news: The "legacy" RMT driver is still available for compatibility, as of ESP-IDF 5.1! (It gets removed in later ESP-IDF versions, but arduino-esp32 isn't there yet.)
  • Bad news: ESP-IDF will deliberately crash on startup if both the legacy RMT library and newer RMT library are included in the firmware (even if both are not necessarily used).
  • Worse news: In many cases, arduino-esp32 itself uses the newer RMT library (see https://github.com/espressif/arduino-esp32/issues/9866 for a deep dive).

To deal with this RMT driver interface change in ESP-IDF v5.x, Adafruit (Lady Ada herself!) changed the NeoPixel library. On ESP-IDF v5.x, instead of using the ESP-IDF RMT driver, it uses the arduino-esp32 wrapper driver, which in arduino-esp32 v3.x uses the new ESP-IDF RMT driver.

BUT, there was a reason to bypass arduino-esp32 originally. (Remember I said I'd get to this.) The ESP32 RMT hardware can be programmed in various ways. One is just to give it a whole sequence of exact bit timing in a big buffer. (Like, "set the line HIGH for 350 microseconds, then LOW for 75 microseconds, then...".) This is the only mode the arduino-esp32 wrapper supports. But there's another mode which ESP-IDF supports, where you store the buffer in your own favorite format, and supply a little translation function that turns a chunk of buffer into a chunk of timed bits, and ESP-IDF calls that function from an interrupt as needed to translate a chunk at a time, which is much more memory efficient. This "streaming" mode is what the NeoPixel library uses on ESP-IDF v4.x. On v5.x, with the Arduino wrapper, it just allocates a (potentially) big buffer -- 32 bytes of buffer for every byte of pixel data! And it does so on the stack.

(Note the ironic-in-hindsight exchange in https://github.com/adafruit/Adafruit_NeoPixel/pull/367 where @ladyada wonders if it's OK to allocate the buffer on the stack, https://github.com/me-no-dev notes that it could be pretty big and should maybe be allocated on the heap, but @ladyada said "the stack memory is fine", probably thinking about post-return references and not about stack exhaustion...)

  • But, good news! The arduino-esp32 maintainers added a way to not use the newer RMT library, allowing the legacy RMT library to be used. This got included in arduino-esp32 v3.0.3.
  • Note, the way you do that disabling is kinda hinky, you have to create a "header file" with the magical name build_opt.h, which isn't a header file at all but a list of compiler flags, and add -DESP32_ARDUINO_NO_RGB_BUILTIN to that.
  • Then you need to hack the Adafruit NeoPixel library to use the old RMT driver even though it's on ESP-IDF v5.x.

Going forward, probably the NeoPixel library should have code that uses the new ESP-IDF RMT driver, which does still support streaming mode (but with a whole different interface). Then everything will be dunky hory, though that will be a bunch of work for someone to do.

Meanwhile, allocating that big buffer on the heap is probably a good move. It's not a cure-all, though -- depending on the length of your pixel chain, you could end up running out of heap memory! (1000 pixels * 3 bytes/pixel * 32x blowup = 96KB, which is a lot of memory for an ESP32.)

egnor avatar Aug 05 '24 05:08 egnor

Nice writeup. A couple of us are kibitzing on this issue. Unfortunately, we can't seem to attract the attention of anyone capable/interested in offering feedback and/or pressing the 'merge' button.

Your understanding of the current, very messy, state of affairs matches my own. I'm increasingly tempted to rip out the actual hardware-groping parts of several LED libraries and just rewiring them to use Espressif's own https://github.com/espressif/idf-extra-components/tree/master/led_strip at the bottom end and just using the existing stuff for HSV-RGB transitions, blending, fading, line-drawing, and all that. Maybe Espressif can keep their own code stable and working on a variety of their own chips and IDF version...and we have less Arduino-quality code in the world.

Since you have a great "state of the union, mid 2024" edition going, I'll add one other bit of "worse news" to the list that will affect a substantial percentage of hobbyist/users in this market. Let's make it easy for the LLM harvesters (or any humans that actually read this stuff) to ingest it all on a single page.

Platformio uses backrevved versions of, well, most everything for ESP32. That C++23 feature you're trying to use? Good luck with that six year old version of g++ they bundle. The reason for this is that Espressif and Platformio are having a very public love withdrawal.

Platformio wants Espressif to pay them. Espressif keeps sending patches and trying to keep them running, but Platformio won't even accept the PRs. Espressif doesn't really see a need to pay Platformio to smear a layer of goo atop of their own IDF. This means if you're trying to run a new Espressif part with Platformio on the Arduino platform, you're probably hosed. The web is full of much semi-public feudin' between them, but https://www.cnx-software.com/2024/06/01/espressif-releases-arduino-esp32-core-3-0-0-but-platformio-support-in-doubt/ is a good start if you want to pull at that thread. This spat has been going on for at least a year at this point and I haven't yet seen either side blinking.

The result is that a large percentage of the popular blinky code (because it's all dependent on RMT or the more stable SPI) just plain doesn't build right now if you check out the head versions of everything and are using Platformio. It's a mess. The fastled issue tracker (which was already struggling to keep up) is buried in duplicates of "it doesn't work" posts and sometiems "... but don't bother yourself actually fixing it, just downgrade instead" answers.

Sidebar: Maybe there's a message here that spreading your development over what you thought was an IDE and what you thought was a convenience wrapper for IDF that happens to look like it was written for an ATMega (ugh) can substantially increase the risk to your project's velocity because parties that you're not directly paying are mad because nobody else is paying them either. Right now, Jason2866 is almost single-handedly holding up the support load and the build/integration issues to publish a mostly viable Arduino3/IDF5 hybrid and that seems like totally the wrong resolution, but it's a nice testament that open source makes that even possible.

Sidebar to the sidebar: I'm not sure that all these "let's allow you to mix and match the incompatible drivers" flying around are great for the ecosystem either. If they didn't build versioned compatibility into the API (and sometimes, you just can't...) the effort would be better spent just fixing the callers, IMO. But Espressif doing goofy things like gratuitously renaming DMA registers on RMT between ESP32-S3 and ESP32-Nothing in the same build isn't a great confidence builder, either.

Exactly none of this impacts the probable correctness of this PR which deserves to be reviewed and potentially merged. Teknynja's fix is unrelated to this squabble, tackles the very issue that LadaAda and MeNoDev discussed, and is independently confirmed to work. It should be merged. Withholding stability fixes is not helping the state of the blinkyverse right now. 73px on what's currently the flagship Espressif part (P4 any day now...but without WiFi /shruggie) is a pretty low bar.

As for it not being a cure-all, that's certainly true; but we know one configuration where you're never going to get 96K and that's on the stack in FreeRTOS. :-)

Blink on!

robertlipe avatar Aug 05 '24 07:08 robertlipe

@egnor – I think that captures (in great detail) the situation surrounding this issue, thanks! And @robertlipe, I had no idea about the drama/issues going on with Platformio (ignorance is bliss!).

I agree that the amount of buffer space per NeoPixel is problematic, the primary motivation for my work on this issue was to be able to break the unexpectedly low limit of ~73 pixels when I was using the new ESP-IDF RMT driver for other purposes in my project as well. Initially, I attempted to use the low-level solution of translating the pixel buffer data "on-the-fly", but getting the timing right to where that translation code was called with consistently low enough latency to keep the RMT’s buffers fed proved challenging. If I remember, it also caused problems with the 5.x RMT library’s buffer allocation, which otherwise seems to work very well as long as you stick to the 5.x API.

I think until a better solution comes along, the 1000ish pixel limit will get most users of this library where they want to go, and if it doesn’t then they will need to select a different platform to achieve their goals (just like you would need to step up from the ATmega328 if you wanted to drive more than a few hundred pixels). I might be a bit biased about getting my changes merged into this library, but I think given the state of things at the moment, this change allows a large percentage of users to get on with their development and worry about other problems.

Thank you for prodding this issue, I was beginning to wonder if it was destined to end up as a footnote on archive.org 😉

teknynja avatar Aug 05 '24 13:08 teknynja

im somewhat sleep deprived - is there a fix anyone can PR that will address this or do y'all need me to research it? the IDF5 fix was a patch to keep things going, wasn't rigorously tested.

ladyada avatar Aug 05 '24 13:08 ladyada

@ladyada as a quick-ish fix take a look at the two PRs (pick one or the other, not both) by @teknynja

  • https://github.com/adafruit/Adafruit_NeoPixel/pull/392
  • https://github.com/adafruit/Adafruit_NeoPixel/pull/394

((My guesswork: Probably you'd like to keep the behavior of initializing and deinitializing RMT entirely within show() rather than the global initialization in that PR, but that's a quick change if desired?? In general the quickest of fixes would just be to malloc() the buffer on entry to the function and free() it on exit, that would be the most surgical fix though not the most performant?))

If you'd like someone to take a look at "properly" porting to the new RMT API, I/we could take a crack at that...? I think it's mostly mechanical, albeit detailed. (There's also @robertlipe's proposal to use Espressif's example code.)

p s. get some sleep @ladyada, I can't believe you run the company AND dive into the weeds of RMT APIs on the ESP32 lol! surely you have an army of Adaminions?

egnor avatar Aug 05 '24 14:08 egnor

@ladyada - RMT Buffer Allocation Fix (Thread-Safe) for Issue #375 #394 is the PR I recommend as it includes tweaks to improve the thread-safety of the fix. The other PR tries to keep all the changes within the show() method at the expense of some thread safety.

And yes @ladyada, get some rest!

teknynja avatar Aug 05 '24 14:08 teknynja

+1. I may have no credibility here yet, but I have some. I reviewed and coached this PR (the coaching was in the other, bit the result is here) and would approve it into my own project. It should at least be reviewed, please.

We may wish the RMT DMACs worked differently or that the espressif driver API would get their act together or that code were massively structured differently or such, but this is a solid fix with low risc that doesn't perturb other platforms and is independently confirmed to work.

Oh, and since psram is so common on S3 -and this doesn't actually use more ram, it just uses it more responsibly - I didn't much blink at the ram footprint. Twiddling buffers on the fly is the opposite of handing a block to a DMAC and then running radios, compressing things, reading other sensors and doing other Real Work. Reliable blinkage counts for a lot.

Please do considerer merging this PR or something very close to it. The author is capable and willing to help fix a pretty bad problem.

I'm just awakening myself at 1530. Sleep disorders stink. Take care of yourself, LadyAda.

On Mon, Aug 5, 2024, 9:20 AM teknynja @.***> wrote:

@ladyada https://github.com/ladyada - RMT Buffer Allocation Fix (Thread-Safe) for Issue #375 #394 https://github.com/adafruit/Adafruit_NeoPixel/pull/394 is the PR I recommend as includes tweaks to improve the thread-safety of the fix. The other PR tries to keep all the changes within the show() method at the expense of some thread safety.

And yes @ladyada https://github.com/ladyada, get some rest!

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2269197462, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD37PLHNMJHZQSUYUIZLZP6C2NAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRZGE4TONBWGI . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Aug 05 '24 20:08 robertlipe

For future generations reading this on archivedotorg... Please remember that 281xs are write-only and there's no ack. You don't NEED a thousand LEDs, power supplies and all that to TEST multiple channels of a thousand LEDs. Build code and test at the edges. Just let the electrons fall into the floor. No recycling necessary. 😉

On Mon, Aug 5, 2024, 3:50 PM Robert Lipe @.***> wrote:

+1. I may have no credibility here yet, but I have some. I reviewed and coached this PR (the coaching was in the other, bit the result is here) and would approve it into my own project. It should at least be reviewed, please.

We may wish the RMT DMACs worked differently or that the espressif driver API would get their act together or that code were massively structured differently or such, but this is a solid fix with low risc that doesn't perturb other platforms and is independently confirmed to work.

Oh, and since psram is so common on S3 -and this doesn't actually use more ram, it just uses it more responsibly - I didn't much blink at the ram footprint. Twiddling buffers on the fly is the opposite of handing a block to a DMAC and then running radios, compressing things, reading other sensors and doing other Real Work. Reliable blinkage counts for a lot.

Please do considerer merging this PR or something very close to it. The author is capable and willing to help fix a pretty bad problem.

I'm just awakening myself at 1530. Sleep disorders stink. Take care of yourself, LadyAda.

On Mon, Aug 5, 2024, 9:20 AM teknynja @.***> wrote:

@ladyada https://github.com/ladyada - RMT Buffer Allocation Fix (Thread-Safe) for Issue #375 #394 https://github.com/adafruit/Adafruit_NeoPixel/pull/394 is the PR I recommend as includes tweaks to improve the thread-safety of the fix. The other PR tries to keep all the changes within the show() method at the expense of some thread safety.

And yes @ladyada https://github.com/ladyada, get some rest!

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2269197462, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD37PLHNMJHZQSUYUIZLZP6C2NAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRZGE4TONBWGI . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Aug 05 '24 20:08 robertlipe

(Knowing what we now know, I believe the PR can be simplified a bit, and I left comments to that effect, which you should feel free to adopt or ignore. As far as I can tell it should work as-is, but with simplification would be easier for @ladyada or her Adaminions to review.)

Absolutely agreed with @robertlipe that (1) we should take this PR or something like it as a quick fix, (2) one can test without necessarily having million mile LED strips. (I do have very long LED strips around, for what that's worth. I don't have every ESP32 variant under the Sun, but I have several. Regardless, if anyone is set up to test NeoPixel output, it would be Mama Adafruit.)

egnor avatar Aug 05 '24 21:08 egnor

The "bitbucket" test is a valid way to make sure the code compiles and runs w/o crashing, but in order to make sure there aren't any visual glitches or other timing artifacts, we'd at least need to connect a reasonable number of actual devices and watch the result (or pour over a logic analyzer's output ;-) )

teknynja avatar Aug 05 '24 22:08 teknynja

Building and not crashing would put it in the top quartile of blinky projects. (I mean, we're having this discussion because it did...) The ESP32-ish blinky ecosystem right now is a mess.

I'd given some thought to trying to build a WS2812 emulator or receiver that did exactly that, just for testing. I won't compete with Ms. Ada for the number of strips at my disposal as I don't have a literal warehouse, but it's a lot ... and it's a pain. Then I realized that doing that in software is silly.

One of us (in the blink-osphere) should take the real chip (Adafruit product 1378 :-) and instead of attaching RGB pins to actual LEDs, route those as inputs to a reasonably powerful microcontroller. Maybe you also need DOUT. We could do clock recovery (/waves hands broadly) to know when to sample/latch the signals into a conveniently large bag of pixel bytes/. Feed those buffers back to a Real Computer over WiFi or USB after some kind of trivial compression. Do some kind of reset detection to know where a frame ends. I think one could detect the common, timing induced, visual glitches by detecting the equivalent of "runt frames" which have fewer than the expected number of transitions because you'd see > 50uS (a reset code) present on the bus.

As a blinkie developer, I'm one of dozens of people on the planet that would pay hobbyist money (not $10K logic analyzer money) for this just so I don't HAVE to do that inevitable "stare at the strip" exercise. Honestly, four of the 16x16 panels is a fast way to get to a thousand pixels and, at moderate intensity, they're not THAT hard to power, but I'd love to have something on the bench I could use for automated regression testing or even effect development. After analysis to know what percentage of frames are malformed (is the glitch level acceptable?) feed those buffers into ledcat or dozens of other projects that can render an RGB array into a browser, a GIF file, window, other display, or whatever.

I got hung up on trying to emulate the 2812 when the epiphany is that we should just capture the output of the pins of the 2811.

I'd be willing to partner in the development of such a project. Just sayin'

RJL

On Mon, Aug 5, 2024 at 5:58 PM teknynja @.***> wrote:

The "bitbucket" test is a valid way to make sure the code compiles and runs w/o crashing, but in order to make sure there aren't any visual glitches or other timing artifacts, we'd at least need to connect a reasonable number of actual devices and watch the result (or pour over a logic analyzer's output ;-) )

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2270058141, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD3YI3EVDGJ3PVIUCAOLZP77QPAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZQGA2TQMJUGE . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Aug 06 '24 00:08 robertlipe

QEMU based esp32 simulator:

https://github.com/mluis/qemu-esp32

@robertlipe

zackees avatar Aug 06 '24 00:08 zackees

Qemu is awesome, but not the right end of the wire for this, Zach. Emulating CPU is a road well traveled. I want something to "emulate" (capture) the LIGHTS. You make FastLED send a pattern and my fanasth board attaches to the data out pin and captures that frame and sends it to a real computer for analysis (did it match what we sent? How many fps? What percentage is mangled), capture (give me a gif for documentation), or display.

Cheap enough that you can leave one or more ports on it attached to all the CPUs you care about in your test farm.

Think LED receiver, like a video capture card that gulps DVI and writes PNG or mov or whatever.

But, yes, you and your peers are FastLED would be good candidates for my "dozens" of customers.

On Mon, Aug 5, 2024, 7:36 PM Zachary Vorhies @.***> wrote:

QEMU based esp32 simulator:

https://github.com/mluis/qemu-esp32

@robertlipe https://github.com/robertlipe

— Reply to this email directly, view it on GitHub https://github.com/adafruit/Adafruit_NeoPixel/issues/375#issuecomment-2270151758, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD34KZB7M2F4KRAPOSVDZQALCPAVCNFSM6AAAAABAYU77RSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZQGE2TCNZVHA . You are receiving this because you were mentioned.Message ID: @.***>

robertlipe avatar Aug 06 '24 00:08 robertlipe