Improvements & fixes for HUB75

Open DedeHai opened this issue 4 months ago • 34 comments

  • added proper config parameters to allow multiple panels
  • added config checks
  • fixed crashes on S3
  • changed constant variables to constexpr
  • added "wled.h" to bus_manager.cpp and removed local function prototypes (needed for buffer allocations)
  • speed optimisations: yields about 10% higher FPS
  • proper brightness handling
  • updated platformio_override.sample.ini
  • some code cleanup

I do not own any quarter-scan HUB75 panels nor a suitable ESP32, so please test if this is working, I tested this on 2x half-scan 64x32 panels using an S3 only.

Summary by CodeRabbit

  • Bug Fixes

    • Improved HUB75 matrix display handling and pixel color accuracy.
    • Enhanced memory management for large display configurations.
  • New Features

    • Added validation for HUB75 panel configuration to prevent invalid setups.
    • Expanded HUB75 configuration UI with improved row/column tracking.
  • Documentation

    • Added brightness guidance notes for HUB75 displays to prevent ghosting artifacts.

DedeHai avatar Oct 26 '25 14:10 DedeHai

Walkthrough

Refactors HUB75 handling: adds virtual display support and memory-allocation wrappers, optimizes bitwise helpers, changes Hub75 pin reporting from 3→5, updates setPixelColor/getPins signatures/behavior, tightens HUB75 UI validation, and updates platformio documentation comments. No public data-type removals.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| PlatformIO config (platformio_override.sample.ini) | Documentation/comment updates: added note about Hub75 full-brightness ghosting and suggested max-brightness workaround; expanded inline LOLIN_WIFI_FIX comments; no build-flag behavior changes. |
| Bus manager implementation (wled00/bus_manager.cpp) | Large refactor: removed unused headers, replaced division/modulo with bitwise ops, introduced d_malloc/d_free wrappers, added virtualDisp and panel geometry aliases, clamped chain lengths, added QS/virtual-path init, routed setPixelColor/getPixelColor through virtual vs physical paths, added IRAM_ATTR to setPixelColor, adjusted show()/brightness/dirty-bit handling, and enhanced cleanup. |
| Bus manager header / pin count (wled00/bus_manager.h) | Hub75 branch of getNumberOfPins(uint8_t) now returns 5 pins instead of 3 (behavior change in pin reporting). |
| Color util (wled00/colors.cpp) | Minor change in color_fade: use named BLACK constant instead of literal 0 for comparisons; behavior preserved. |
| Settings UI (HUB75) (wled00/data/settings_leds.htm) | Added client-side HUB75 validation (panel width×height, height limits), updated labels to “Panel (width x height):” / “No. of Panels:”, increased HUB75-specific pin contribution from 2→4, prevented multiple HUB75 buses in UI logic, defaulted missing values, and added S3 reboot notice on HUB75 config change. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Focus review on wled00/bus_manager.cpp (virtual display init, memory allocation/cleanup, IRAM_ATTR usage).
  • Verify callers of getNumberOfPins / getPins handle expanded pin count and updated return semantics.
  • Check allocation/deallocation pairings (d_malloc/d_free) and ESP target-specific guards.
  • Audit bitwise optimizations for correctness across ranges and sign/unsigned usage.
  • Validate UI validation logic in settings_leds.htm matches runtime constraints.
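
The sign/unsigned caveat in the review notes can be illustrated with a small sketch. This is not code from the PR: `byteIndex` and `bitIndex` are hypothetical helper names, and the only claim is the general one that shift/mask forms match division/modulo for unsigned operands with a power-of-two divisor.

```cpp
#include <cstdint>

// Hypothetical helpers, not code from the PR: split a pixel index into
// a byte offset and a bit position inside a 1-bit-per-pixel buffer.
constexpr uint32_t byteIndex(uint32_t pixel) { return pixel >> 3; }  // same as pixel / 8
constexpr uint32_t bitIndex(uint32_t pixel)  { return pixel & 7u; }  // same as pixel % 8

// For unsigned operands and a power-of-two divisor the two forms agree.
static_assert(byteIndex(4095) == 4095u / 8u, "shift matches division");
static_assert(bitIndex(4095)  == 4095u % 8u, "mask matches modulo");

// For negative signed operands they do not: % truncates toward zero,
// while the mask keeps the low bits of the two's-complement value.
static_assert((-9 % 8) == -1, "signed modulo keeps the sign");
static_assert((-9 & 7) == 7,  "mask yields a non-negative remainder");
```

So a replacement of `/` and `%` by `>>` and `&` is only a safe drop-in when the operand type is unsigned (or known to be non-negative) and the divisor is a power of two.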

Possibly related PRs

  • wled/WLED#3777 — Changes to the same HUB75 support paths and bus manager implementation.
  • wled/WLED#4950 — Hub75 pin-count and bus_manager display handling overlap with this PR.
  • wled/WLED#4895 — Modifies bus manager APIs/fields and Hub75-related logic similar to these edits.

Suggested reviewers

  • softhack007
  • blazoncek
  • netmindz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Improvements & fixes for HUB75' directly relates to the changeset, which includes HUB75 configuration parameters, crash fixes, performance optimizations, brightness handling improvements, and code cleanup across multiple HUB75-related files. |

coderabbitai[bot] avatar Oct 26 '25 14:10 coderabbitai[bot]

@DedeHai and @netmindz one thing you might want to try - currently chained hub75 panels are not compatible with the wled 2D layout settings. The reason is that HUB75 is scan line oriented, and adding a second panel means that the physical width (scanline) is increased, for example from width=64 (1panel) to width=128 (2 panels). This is not foreseen in the 2D setup of wled (at least it was not foreseen in the code from 6 months ago 😉 ).

The solution might be to tell the driver to use "vertical" panel layout, which means that chained panels are stacked in one column, instead of putting them into a row next to each other.

This "vertical layout" should make multi-panels (chained panels) compatible with what WLED expects: first pixel = top left, no serpentine, each panel having a separate (not interleaved) pixels range.
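
To make the interleaving argument concrete, here is a minimal sketch assuming 64x32 panels and plain row-major pixel numbering. The helper names are hypothetical and do not come from WLED or the HUB75 driver.

```cpp
#include <cstdint>

constexpr uint32_t kPanelW = 64, kPanelH = 32;  // assumed panel size

// Row-major index of (x, y) in a display that is `totalW` pixels wide.
constexpr uint32_t linearIndex(uint32_t x, uint32_t y, uint32_t totalW) {
  return y * totalW + x;
}

// Which panel owns a linear index when panels sit side by side in one row.
constexpr uint32_t panelInRow(uint32_t index, uint32_t panels) {
  return (index % (kPanelW * panels)) / kPanelW;
}

// Which panel owns a linear index when panels are stacked in one column.
constexpr uint32_t panelInColumn(uint32_t index) {
  return index / (kPanelW * kPanelH);
}

// Side by side (128 wide): ownership alternates every 64 pixels.
static_assert(panelInRow(63, 2) == 0, "end of row 0 on the left panel");
static_assert(panelInRow(64, 2) == 1, "row 0 continues on the right panel");
static_assert(panelInRow(128, 2) == 0, "row 1 is back on the left panel");

// Stacked (64 wide): panel 0 owns 0..2047, panel 1 owns 2048..4095.
static_assert(panelInColumn(linearIndex(63, 31, kPanelW)) == 0, "last pixel of panel 0");
static_assert(panelInColumn(linearIndex(0, 32, kPanelW)) == 1, "first pixel of panel 1");
```

With the stacked arrangement each panel gets one contiguous index range, which is what the 2D setup expects; with the side-by-side arrangement the ranges of the two panels alternate on every scanline.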

softhack007 avatar Nov 02 '25 13:11 softhack007

Had a quick test with a basic single-panel setup. UI improvements look good. Need to test multi-panel setups. Need to double check, but I think the brightness looked lower; I need to revert to compare.

netmindz avatar Nov 07 '25 18:11 netmindz

I think the brightness looked lower, I need to revert to compare

Brightness should not change, it's still set to 255 at driver level. The new gamma correction handling may play a role.

DedeHai avatar Nov 08 '25 08:11 DedeHai

@netmindz @DedeHai & @softhack007 I've backported @netmindz's Hub75 to my fork (0.15) where I made some similar changes as @DedeHai made (no coordination) here. 😄 However, I've also enhanced UI to be aware of GPIO used by Hub75 driver and allowed for dropdown use in LED GPIO. I do not have any hardware to test so my coding was pure guesswork and logic analysis of flow.

One thing I'd comment here is the removal of HW assisted brightness scaling in this PR. It is a waste of CPU cycles to do any scaling on bus level IMO. For NPB it makes sense as that was purely SW.

The other thing is getPixelColor(). Bus level gPC() is no longer needed since the introduction of my segment blending. Any and all getPixelColor() happen on segment buffers and is not affected by final or segment brightness/opacity. Realtime functions probe WS2812FX buffer instead of bus driver. This means that drivers no longer need to maintain pixel data and any "double" buffering can be removed as it is done in WS2812FX class.

blazoncek avatar Nov 12 '25 15:11 blazoncek

One thing I'd comment here is the removal of HW assisted brightness scaling in this PR. It is a waste of CPU cycles to do any scaling on bus level IMO. For NPB it makes sense as that was purely SW.

I did that to be consistent with NPB handling, as I just learned that HUB75 actually supports HW brightness. I need to check that, but if it is true hardware scaling, we should definitely NOT do software scaling. So thanks for pointing this out.

The other thing is getPixelColor(). Bus level gPC() is no longer needed since the introduction of my segment blending.

IIRC there may be some usermods that need bus-level gPC(), at least that's what I remember off the top of my head, because I was close to removing them when I updated the ABL & brightness scaling code. Not sure whether this is actually true or just comes from some outdated, lingering inline comment.

DedeHai avatar Nov 12 '25 16:11 DedeHai

IIRC there may be some usermods that need bus-level gPC(),

None should do that IMO. I would remove bus level gPC/sPC from public API.

There are benefits of having bus level ABL and strip (AKA WS2812FX) level ABL. I would keep both.

blazoncek avatar Nov 12 '25 19:11 blazoncek

. I need to check that but if it is true hardware scaling,

I think it is "kind of" hardware scaling. The hub75 driver will PWM-modulate the "enable" signal instead of scaling individual pixels. But this feature depends on having the correct "shift register" driver in the HUB75 mxconfig.

softhack007 avatar Nov 13 '25 16:11 softhack007

This means that drivers no longer need to maintain pixel data and any "double" buffering can be removed as it is done in WS2812FX class.

Hi @blazoncek, hope you are doing well :-)

About your comments:

  • It might be that in general, busses.getPixelColor() could be removed if it's no longer needed. I'm not going to interfere with any design decision on this, it has to be decided by the team. However I'd like to keep the scope of PRs clean, meaning that "remove gPC from bus level" should be a separate PR.
  • "drivers no longer need to maintain pixel data and any "double" buffering can be removed" -> this is partially true. One reason that HUB75 has its own double buffer was the need to support getPixelColor on bus level.
    • There is also a second purpose - buffering all pixel writes until "busses.show()" is necessary to avoid flickering. HUB75 works similar to a TV screen, and the content is continuously refreshed. So if we forward each "setPixelcolor" directly to the HUB75 hardware driver, it results in flickering and tearing. The HUB75 driver offers its own double-buffer solution for this, but it needs a lot more RAM for this buffer - so having a CRGB "FastLED" style buffer is actually removing flickering (all pixels pushed at once), and it reduces the RAM needed by a large amount (like 24Kb saved for one 64x64 panel).
    • An idea for upstream WLED - which I haven't tried yet - is to directly give the busHUB75 driver access to the main "frame buffer" at strip level. It means that hub75.show() can directly push all pixels from the upper level framebuffer (and yes, the "pixel pusher" must be at busHUB75 level !), without needing a buffer of its own. In this scenario, hub75.sPC() will only manage the "dirty bits" for optimization, but we could avoid the full double buffering by accessing the already existing one on upper level.
  • Finally - you should try a HUB75 panel, these things are really fun 😀 . I would suggest to start with a 64x64 "indoors" P2 or P3 module - make sure to look at brightness (nits) and contrast range (1:800 or more). These panels are quite affordable (~25€ on ali). Then get the "Moonhub" adapter and the LiliGo "T7-S3" with PSRAM, connect and have fun. I'm sure that @lost-hope would be happy to help you select a panel and get started :-)

Cheers, and take care, Frank.

Edit: just for the fun of it - run the below .gif in the image effect on HUB75, once you have the hardware.

[image: ghostbusters64]

softhack007 avatar Nov 13 '25 22:11 softhack007

  • There is also a second purpose - buffering all pixel writes until "busses.show()" is necessary to avoid flickering. HUB75 works similar to a TV screen, and the content is continuously refreshed. So if we forward each "setPixelcolor" directly to the HUB75 hardware driver, it results in flickering and tearing. The HUB75 driver offers its own double-buffer solution for this, but it needs a lot more RAM for this buffer - so having a CRGB "FastLED" style buffer is actually removing flickering (all pixels pushed at once), and it reduces the RAM needed by a large amount (like 24Kb saved for one 64x64 panel).

I did not check how the HUB75 driver uses its internal buffer(s?). The additional CRGB buffer is needed for determining the "dirty" pixels; if we could access the HUB75 driver's internal buffer for that, we would not need it. I ran quite a few experiments when writing this PR, like removing the buffer and the "dirty" bit buffer, making the CRGB buffer a uint32_t buffer (slight speed increase), and even using the highest byte of that to mark "dirty" pixels, but came back to the way it was, i.e. a separate dirty pixel buffer (1 bit per pixel) plus CRGB, as it yielded the highest throughput at the lowest RAM cost. Fastest would be: separate dirty pixel buffer with a uint32_t color buffer. Best for RAM use is to grant access to the busmanager's _pixels[] buffer and not use dirty.

just FYI: here are my raw notes about those experiments (translated by chatGPT):

PS Fire and Black Hole speed test:
With CRGB buffer and dirty: 48 FPS (fire) and 92 FPS (black hole)
Without CRGB buffer and without dirty: 47 FPS and 64 FPS
With uint32_t buffer (non-PSRAM) and without dirty: fire 45 FPS, black hole 63 FPS, palette 45 FPS (huh?) → Probably the loop makes access to the panel much (much) faster.

With uint32_t buffer + dirty (and show(), i.e. inside the loop):

fire: 52 FPS
black hole: 108 FPS
palette: 47 FPS (this is without color_scale!)
with color_scale it’s about 1 FPS lower
DNA: 66 FPS

With dirty stored in the high byte instead of a separate array:

black hole: 58 FPS
DNA: 46 FPS
fire: 44 FPS
→ massively worse.

With dirty in the high byte (not a separate array) and IRAM_ATTR on SpC: about +1–2 FPS.

With 32-bit buffer, with dirty, with color scale:

fire: 47 FPS
black hole: 102 FPS
palette: 40 FPS
DNA: 89 FPS

With else-if in color fade instead of if, if: 47 FPS / 102 FPS / 88 FPS → no change.

What if you drop all the dirty-tracking stuff and SpC entirely, and just write directly to the array inside the loop?
For that the array must be accessible → you can do it with getPixelRaw.

→ Hack with direct access to _pixels[], including scaling:

black hole: 63 FPS
DNA: 48 FPS
fire: 46 FPS
palette: 45 FPS

Conclusion:
Dirty tracking helps. The fastest is with dirty + draw loop.
So basically exactly like now, just without CRGB.
CRGB uses less RAM, and it’s not that much slower… only a few FPS.
But for sparse scenes it does make a difference (92 FPS → 108 FPS).

Maybe there’s an even better variant, but for now it’s fine as it is.
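
The RAM side of the trade-off in the notes above can be worked out with a bit of arithmetic. This is a sketch under stated assumptions: a 128x32 matrix (two 64x32 panels, as mentioned in the PR description; the notes themselves do not state the size used for the FPS runs), 3 bytes per CRGB pixel, 4 bytes per uint32_t pixel, and 1 dirty bit per pixel. All constant names are made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kPixels = 128 * 32;  // assumed: two 64x32 panels

// Size of a 1-bit-per-pixel dirty bitmap, rounded up to whole bytes.
constexpr std::size_t dirtyBytes(std::size_t pixels) { return (pixels + 7) / 8; }

constexpr std::size_t kCrgbPlusDirty = kPixels * 3 + dirtyBytes(kPixels);  // CRGB + bitmap
constexpr std::size_t kU32PlusDirty  = kPixels * 4 + dirtyBytes(kPixels);  // uint32_t + bitmap
constexpr std::size_t kU32HighByte   = kPixels * 4;                        // flag in top byte

static_assert(kCrgbPlusDirty == 12800, "12288 + 512 bytes");
static_assert(kU32PlusDirty  == 16896, "16384 + 512 bytes");
static_assert(kU32HighByte   == 16384, "no separate bitmap");
```

Under these assumptions the uint32_t variant with a separate bitmap costs 4096 bytes more than CRGB plus bitmap, which is the price of the extra FPS reported for sparse scenes.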

DedeHai avatar Nov 14 '25 05:11 DedeHai

@softhack007 you are missing my point. :wink:

  • With current version of 0.16 there are no calls to BusManager::getPixelColor(), hence no calls to any of bus' getPixelColor(). This doesn't mean it has to be stripped right away (about 1k of code). It's the team's decision to do that, as you say. I'm merely pointing out the fact.
  • The same goes for updating the actual driver's buffer. It is only updated in WS2812FX::show() and nowhere else. Flickering will only occur if you call show() mid-frame while the driver is still updating LEDs.
  • IMO giving driver access to frame buffer (WS2812FX::_pixels[]) may not work as desired as realtime functions write directly into that buffer and expect show() to do proper blending. And the buffer is not properly updated until blendSegment() is finished and all realtime calls are done.
  • As for the panel, I have started to clean my desk and I do not intend to spend additional money on something I would only use for development (but I would not refuse a gift). I'm purely enjoying logical challenges associated with tough coding choices. However my code is free for anyone to use as a basis for their own improvements. That's why I'm posting here.

The last statement includes improvements to LED settings page where Hub75's GPIO are taken into account (and all LED outputs get GPIO dropdowns) and ability to select shift register via dropdown. I will not make a PR for that due to lack of time, but you are free to cherry pick and adapt as needed.

blazoncek avatar Nov 14 '25 05:11 blazoncek

For anyone interested: UI code for enabling Hub75 panels has been improved and is now working as expected (limiting dimensions to multiples of 32, allowing a maximum of 6 panels, adding the ability to select the shift register via drop-down (currently a hack but it could be made driver-supplied), allocating GPIO in the UI and preventing them from being used for other outputs).

One thing that is not clear to me from the MM or Netmindz's code is if it is allowed to have Hub matrix with dimensions of 16, 48, or 96 (in fact any multiple of 16) and if you can arrange panels in a similar way as 2D setup allows (multiple rows and columns, no UI for that in LED settings page).

blazoncek avatar Nov 17 '25 08:11 blazoncek

One thing that is not clear to me from the MM or Netmindz's code is if it is allowed to have Hub matrix with dimensions of 16, 48, or 96 (in fact any multiple of 16) and if you can arrange panels in a similar way as 2D setup allows (multiple rows and columns, no UI for that in LED settings page).

@blazoncek Thanks for your effort :-)

About the "multiple of 16" dimensions question: yes there are still 16x16 and 16x32 panels available, especially the "outdoor" ones with high brightness and pixel spacing "P10" or greater.

  • the HUB75 driver does not support mixing panels with different dimensions. ==> total width or height 16/48/80/96/112 is possible, but only when the fixture size is a multiple of one single "base panel size", and the base size is divisible by 8.

  • in our original WLED-MM code, we made our (dev) life a bit simpler by just allowing some of these panel sizes (predefined types in a drop-down), and any fixture must be a multiple of these base sizes

  • I've also seen panels with "20x40", "80x40", "96x48", and "160x80" (2-scan, 4-scan, or 8-scan) in the wild, but we never tested them - so "may or may not work".

  • Another type of panels is 8-scan (in contrast to 2-scan (indoors) and 4-scan (outdoors)). Not sure if these work in the MrCodetastic driver.

  • For WLED 2D setup to work, you must arrange your panels vertically -- like 1 (wide) x 4 (high) . Other arrangements lead to interleaving within scanlines, and our 2D setup cannot work with panels when they have interleaved (overlapping) pixel ranges.

softhack007 avatar Nov 17 '25 14:11 softhack007

Conclusion: Dirty tracking helps. The fastest is with dirty + draw loop. So basically exactly like now, just without CRGB. CRGB uses less RAM, and it’s not that much slower… only a few FPS. But for sparse scenes it does make a difference (92 FPS → 108 FPS).

Maybe there’s an even better variant, but for now it’s fine as it is.

Thanks @DedeHai for this interesting comparison :-)

It matches my observations. The effect of dirty-bit tracking is even stronger in the 0.14.x-based MM codebase (no pixel buffer on bus level) - there it leads to a 2x speedup in "show". Maybe I'll do some experiments with uint32_t pixels, to check if this can speed things up even a bit more. However, it would only be an option for "power users" with S3 and octal PSRAM.


I see a few more optimizations, that were not "exploited" in MM yet:

  • if memory gets tight, the dirty buffer could also cover a group of pixels, e.g. two (or four) pixels next to each other share one "dirty" flag - example: one pixel changes, a set of 4 is marked dirty at once. That's how cache coherency protocols work, which was my original inspiration for dirty bits.

  • my "dirty bits" code could even be moved up to bus level, where the final framebuffer lives, too.

    • Dirty bits would optimize sPC calls for any kind of NeoPixelBus output, especially we could avoid pushing all pixels at each "busses.show()".
    • busHUB75 could remove sPC completely, and
    • only have a special "show()" function that gets all parameters from BUS level (pixel pushing is fastest on lowest level) a) pointer to Framebuffer (start = first HUB75 pixel) b) pointer to dirty Bits buffer (start = first HUB75 pixel) c) (maybe) logical "width" of a scanline in the Framebuffer.
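
The "group of pixels shares one dirty flag" idea from the list above could look roughly like this. It is a sketch only: the group size of 4, the bitmap layout, and the names `flagIndex`, `flagBytes`, `markDirty` and `isDirty` are assumptions for illustration, not existing WLED or WLED-MM code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kGroupShift = 2;  // 2^2 = 4 neighbouring pixels share one flag

// Index of the dirty bit that covers a given pixel.
constexpr std::size_t flagIndex(std::size_t pixel) { return pixel >> kGroupShift; }

// Bytes needed for the flag bitmap of `pixels` pixels (both steps round up).
constexpr std::size_t flagBytes(std::size_t pixels) {
  return (((pixels + (1u << kGroupShift) - 1) >> kGroupShift) + 7) / 8;
}

// Mark the whole group containing `pixel` as dirty.
inline void markDirty(uint8_t* flags, std::size_t pixel) {
  assert(flags != nullptr);
  const std::size_t f = flagIndex(pixel);
  flags[f >> 3] |= uint8_t(1u << (f & 7));
}

// True if any pixel in the group containing `pixel` was marked.
inline bool isDirty(const uint8_t* flags, std::size_t pixel) {
  assert(flags != nullptr);
  const std::size_t f = flagIndex(pixel);
  return (flags[f >> 3] >> (f & 7)) & 1u;
}
```

For 4096 pixels this needs 128 bytes of flags instead of 512, at the cost of pushing up to three unchanged neighbours whenever one pixel in a group changes.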

Edit 1: There are no "RGB+White" HUB75 panels, and most likely there never will be, because a 4th color component does not fit into the standard HUB75 16pin connector. So "True white channel" support on HUB75 is for me a purely theoretical exercise 😜

Edit 2:

if we could access the HUB75 driver's internal buffer for that, we would not need it.

that's the original problem - the MrCodetastic driver directly translates each pixel into an internal DMA buffer, which stores some kind of PWM waveform for output. This is not accessible from outside, and I doubt that we could restore the original pixel even if we had access to the DMA buffer memory.

BTW, this internal DMA buffer is the reason why I called the driver "memory greedy" - the internal buffer for 64x64, 24 bit color is a full 32KB (sic!), about 3x the size of individual pixels.
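
As a quick check of the "about 3x" figure, assuming 3 bytes per pixel for the plain pixel data and taking the quoted 32KB as 32 KiB (constant names are hypothetical):

```cpp
#include <cstddef>

constexpr std::size_t kPixelBytes = 64 * 64 * 3;  // raw 24-bit pixel data
constexpr std::size_t kDmaBytes   = 32 * 1024;    // DMA buffer size quoted above

static_assert(kPixelBytes == 12 * 1024, "12 KiB of plain pixel data");

// Ratio in tenths, to stay in integer arithmetic: 26 means roughly 2.7x.
constexpr std::size_t kRatioTenths = kDmaBytes * 10 / kPixelBytes;
static_assert(kRatioTenths == 26, "DMA buffer is about 2.7x the pixel data");
```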

softhack007 avatar Nov 17 '25 15:11 softhack007

about 3x the size of individual pixels

FYI it is similar with I2S in NPB. Called cadence there.

  • especially we could avoid pushing all pixels at each "busses.show()".

Not sure this would work with layered/overlapping segments. Do not forget that all segments are blended and until upstream has transparency (which should be supported in effects too) all pixels are updated during show().

blazoncek avatar Nov 17 '25 16:11 blazoncek

all pixels are updated during show().

@blazoncek I see your point. Actually, the dirty bits logic can possibly still help, at least when the framebuffer on lowest level does not get too many "intermediate" updates.

In a nutshell:

  • pixel update: if (newColor != framebuffer[pixel]) { dirty[pixel] = true; framebuffer[pixel]=newColor; }
  • show all (bus pixel pusher): if ( !dirty[pixel] ) continue; // skip if color unchanged
  • after show: all dirty[] = false;
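
The three steps above can be sketched as a small self-contained type. This is an illustration only: `DirtyFrame`, its members, and the `push` callback are hypothetical names, the dirty flags are packed 1 bit per pixel as described earlier in the thread, and the actual driver call is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a bus-level frame buffer with dirty tracking.
template <std::size_t N>
struct DirtyFrame {
  uint32_t color[N] = {};            // last colour written per pixel
  uint8_t  dirty[(N + 7) / 8] = {};  // 1 bit per pixel

  // Pixel update: store and flag only when the colour really changed.
  void setPixel(std::size_t i, uint32_t c) {
    assert(i < N);
    if (color[i] == c) return;
    color[i] = c;
    dirty[i >> 3] |= uint8_t(1u << (i & 7));
  }

  // Show: hand every changed pixel to `push`, then clear all flags.
  // Returns the number of pixels that were pushed.
  template <typename Push>
  std::size_t show(Push push) {
    std::size_t pushed = 0;
    for (std::size_t i = 0; i < N; ++i) {
      if (!((dirty[i >> 3] >> (i & 7)) & 1u)) continue;  // unchanged: skip
      push(i, color[i]);
      ++pushed;
    }
    for (uint8_t& b : dirty) b = 0;
    return pushed;
  }
};
```

A caller would pass something like a lambda that forwards `(index, colour)` to the panel driver. Note that in this sketch a pixel that is written several times before `show()` is still pushed only once, with its final colour.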

Edit: it could imply that the "blending" and "overlaying" logic needs to be modified, so that the final pixel color is calculated ONCE (loop over all bus pixels, blend/overlay every "upper" pixel; instead of going segment-by-segment). But the result may be worth it, because you'll always find a lot of pixels that actually did not change between two frames, so it's not necessary to push them down to the NPB driver again.

softhack007 avatar Nov 17 '25 19:11 softhack007

I'll leave optimizations for more competent people (or people with more time). :smile:

However, having transparent pixels would also help (to some extent).

You could have dirty flag at WS2812FX level (which would then benefit any driver) and it is rather simple to implement. Not much different than what you posted and I don't think blending would need changes (apart from comparing old vs. new). Dirty buffer can be temporary.

blazoncek avatar Nov 17 '25 19:11 blazoncek

Finally - you should try a HUB75 panel, these things are really fun 😀 .

I said I won't, but I did. 🤦‍♂️

blazoncek avatar Nov 21 '25 13:11 blazoncek

FYI I have entirely dynamic Hub75 support ready. No need for custom compiles for different boards or configurations. It even plays nicely with usermods.

blazoncek avatar Nov 25 '25 15:11 blazoncek

@softhack007 you seem most knowledgeable about Hub75 implementation in WLED. Looking at the code I can see that if _ledBuffer allocation fails the BusHub75Matrix class can work without it. However, show() serves no purpose in such case. How is then display synchronised? I.e when does it get information to display modified pixels?

blazoncek avatar Nov 26 '25 13:11 blazoncek

@softhack007 you seem most knowledgeable about Hub75 implementation in WLED. Looking at the code I can see that if _ledBuffer allocation fails the BusHub75Matrix class can work without it. However, show() serves no purpose in such case. How is then display synchronised? I.e when does it get information to display modified pixels?

Hi @blazoncek yes, _ledBuffer == nullptr is a special case, call it the "limp home on a stick" mode:

  • no synchronization possible, pixels are directly put into virtualDisp->drawPixelRGB888(int16_t(x), int16_t(y), r, g, b) as they arrive in busses.setPixelColor() --> can lead to heavy flickering (I've seen the flicker during my first coding attempts)
  • busHUB75 does not know about previous pixels, so getPixelColor() returns GRAY if the "dirty" bit is set
  • it's true that show() does nothing in this case, except for clearing the dirty pixel bits.

In fact I was thinking about removing this mode and simply set the bus driver _valid=false when _ledBuffer allocation fails. But then someone reminded me of the HD-WF2 (S3 without PSRAM) that might actually still need this fallback mode.

I don't know exactly how the HUB75 hardware driver achieves synchronisation - but when pushing all pixels in busHUB75::show(), I've never seen any sync problems. 🤔 Maybe it synchronizes implicitly on pixel (0,0), and then you just have to go fast enough to stay ahead of the "scan beam" that refreshes the display at 60-120 Hz.

Currently (with dirty bits) busHUB75::show() takes around 3-8 ms for my 192x64 pixel test setup, so we are always ahead of the "beam" = refreshing pixels in the driver fast enough (at 250 to 120 fps) so the beam is behind us.

softhack007 avatar Nov 26 '25 16:11 softhack007

Hmm. @softhack007 have you examined new way WLED (0.16) treats output? AKA segment blending. With this approach BusXXXX::show() is needed to:

  1. limit current on digital LEDs (and start NPB)
  2. schedule LEDC PWM channels
  3. transmit network buffer

If what you describe is true, then show() is not needed for Hub75 panels, as all pixels are drawn immediately prior to calling show() in the WS2812FX class. Consequently, _ledBuffer is also not needed.

blazoncek avatar Nov 26 '25 20:11 blazoncek

Consequently, _ledBuffer is also not needed.

on the contrary, it even needs a second buffer to mark "dirty" LEDs which significantly improves frame rate.

DedeHai avatar Nov 26 '25 21:11 DedeHai

on the contrary, it even needs a second buffer to mark "dirty"

Even if all pixels are redrawn in WS2812FX::show()? The only "drawback" I see in that chain is the conversion from 2D to 1D and then back to 2D (which is skipped with _ledBuffer) in Bus::setPixelColor().

However if Bus::setPixelColor() would be provided with 2D coordinates then no reverse translation would be needed. That is easy to accommodate.
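
The round trip being discussed is the usual row-major mapping. A minimal sketch, with a hypothetical `Point` type and helper names that are not part of the WLED API:

```cpp
#include <cstdint>

struct Point { uint16_t x, y; };

// 2D -> 1D: what a strip-level caller does before handing a pixel to the bus.
constexpr uint32_t toIndex(Point p, uint16_t width) {
  return uint32_t(p.y) * width + p.x;
}

// 1D -> 2D: what a matrix bus has to undo again (one modulo and one division).
constexpr Point toPoint(uint32_t index, uint16_t width) {
  return Point{ uint16_t(index % width), uint16_t(index / width) };
}

static_assert(toIndex(Point{5, 2}, 64) == 133, "2 * 64 + 5");
static_assert(toPoint(133, 64).x == 5 && toPoint(133, 64).y == 2, "round trip");
```

Passing `(x, y)` straight through would remove the modulo and division on the bus side for every pixel, which is the saving the comment above refers to.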

blazoncek avatar Nov 26 '25 21:11 blazoncek

I'll dump this here.

https://github.com/user-attachments/assets/3321661e-0a59-4aaa-acaa-5f9313ddca01

Screenshot 2025-11-27 at 16 22 34

blazoncek avatar Nov 27 '25 15:11 blazoncek

I'll dump this here.

awesome! PR when? :)

DedeHai avatar Nov 27 '25 15:11 DedeHai

PR when?

I need someone with actual Hub75 panels to test 1st. (Until I get mine)

blazoncek avatar Nov 27 '25 16:11 blazoncek

I'll dump this here.

@blazoncek settings page looks good :-) all the options I can imagine are there, only two panel sizes still missing: 16 (high or wide), 128 (wide only)

I can't test right now, because I'll be "away from HUB75" (on business trip) for the next two weeks.

softhack007 avatar Dec 01 '25 23:12 softhack007

Even if all pixels are redrawn in WS2812FX::show()? The only "drawback" I see in that chain is the conversion from 2D to 1D and then back to 2D (which is skipped with _ledBuffer) in Bus::setPixelColor().

Hi @blazoncek the main problem is speed. And as you already noticed, the last "stone" is the repeated 1D to 2D translation. But there is more:

If we go through BusManager::setPixelColor() for each pixel, we lose time in the orchestrator for loop (even with only one bus), in the range check, in the repeated array access and function call indirection, and finally in a few nested function calls. Also, we put stress on the small CPU instruction cache, which may not be able to cope with all these indirections efficiently.

https://github.com/wled/WLED/blob/ae37f4268cbea8e714281b32321a903c5aadbe65/wled00/bus_manager.cpp#L1331-L1333

It may look like tiny amounts of CPU cycles, but considering that you only have 5ms to push 128x64 pixels out to the driver, it might become a real problem. Also you did not show us yet where to place the dirty bits array - which created a major speedup - in your proposal.
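
To put a number on that budget: a back-of-the-envelope calculation, assuming a CPU clock of 240 MHz (an assumption, the thread does not state the clock) and the 5 ms / 128x64 figures from the comment above. Constant names are made up for illustration.

```cpp
#include <cstdint>

constexpr uint64_t kPixels   = 128 * 64;    // 8192 pixels
constexpr uint64_t kBudgetNs = 5000000;     // 5 ms expressed in nanoseconds
constexpr uint64_t kCpuHz    = 240000000;   // assumption: 240 MHz CPU clock

constexpr uint64_t kNsPerPixel     = kBudgetNs / kPixels;          // time per pixel
constexpr uint64_t kCyclesPerPixel = kCpuHz / 1000 * 5 / kPixels;  // cycles in 5 ms, per pixel

static_assert(kNsPerPixel == 610, "about 0.6 microseconds per pixel");
static_assert(kCyclesPerPixel == 146, "about 146 CPU cycles per pixel");
```

Under these assumptions each pixel has roughly 146 cycles for everything: the loop over buses, range checks, function calls, coordinate translation and the driver write itself.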

softhack007 avatar Dec 01 '25 23:12 softhack007

@softhack007 @blazoncek since HUB75 is currently pretty unusable in main without this PR: how about merging this and then taking it from there with the improvements?

DedeHai avatar Dec 02 '25 05:12 DedeHai