rpi-rgb-led-matrix

Getting high refresh rates on a 192x128 matrix

Open carlosalaniz opened this issue 4 years ago • 29 comments

I'm driving a 192x128 matrix; however, the RPi can only reach 60-100 Hz when playing an animation.

Chinese controllers can achieve high refresh rates (960 Hz and up), but these cards do not provide a good interface for programmatic control. I need to be able to control individual pixels on the matrix and stream video, among other things, which is why I'm looking into this library. Is there anything you would recommend to accomplish this at high refresh rates?

Thank you!

carlosalaniz avatar Oct 29 '19 19:10 carlosalaniz

You need to give a little more detail about your setup.

Which panels are you using, and how many parallel outputs and chains? How do you do the animation? What PWM depth? What parameters do you give to the library? What is the multiplexing? Are you using PWM dither?

hzeller avatar Oct 29 '19 20:10 hzeller

@carlosalaniz These Chinese controllers use FPGAs to achieve high refresh rates; they do the whole refresh in hardware. I do not know if one Pi is fast enough for such a big panel at these rates. If it isn't, you could look into using an FPGA HAT for the Pi.

Cellgalvano avatar Nov 24 '19 17:11 Cellgalvano

@carlosalaniz @Cellgalvano it just so happens that I also set up a 192x128 matrix, and it was OK enough in some setups.

~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=3 --led-show-refresh --led-slowdown-gpio=1 --led-parallel=1 --led-pwm-dither-bits=1 --led-pwm-lsb-nanoseconds=9 --led-pwm-bits=4 -D0

gives 390Hz, but --led-pwm-bits=4 isn't great.

~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=1 --led-show-refresh --led-slowdown-gpio=1 --led-parallel=3 --led-pwm-dither-bits=1 --led-pwm-lsb-nanoseconds=9 --led-pwm-bits=7 --led-panel-type=FM6126A -D10

also gives 350Hz or so, but with 7 bits of color (note that I had to patch the library to allow --led-pwm-lsb-nanoseconds=9, which gives a small boost; not sure if @hzeller would take a patch to remove the restriction that currently limits the speed to 50ns). Note that this 2nd invocation uses the parallel board with 3 outputs.

That was with ABCDE panels. @hzeller was telling me something about AB (or AC) panels being quicker than ABCDE panels. I was not sure how that is the case, since with 2 address lines, you have to feed 8 times more pixels per line, vs feeding 8 times fewer over 8 times more addresses. I've noticed that 64x64 with 2x ABCD 64x32 panels where you shift 64 pixels per line is slightly faster than a single ABCDE panel with one more address line but you don't feed as many pixels, but not by much. @hzeller can you explain how AB/AC panels are supposed to be much quicker than ABCDE panels? I do need this for an upcoming project that will use 128x192 and would of course like to get the best panels with the best refresh rate if possible.

marcmerlin avatar Dec 30 '19 10:12 marcmerlin

Yes, panels with ABCDE are super-slow as they have to do the relatively long PWM cycle 32 times. It is faster clocking in double the LEDs in a row and have half the multiplexing.

Look for 'outdoor' panels with 1:4 or 1:8 multiplexing. They are bright and fast.

What you did is good: always use as many parallel chains as possible, as it doesn't cost more to operate more chains, but the shorter they are, the faster they can go.

Don't go too low with the nanoseconds setting: the panels can't switch that fast, and the cabling would not properly transfer that pulse either. Also, it doesn't buy you much, as the on-time is used to clock in the data in parallel; ultimately, you'd just be blocked by the clocking. Lower than 50ns has created issues in the past, which is why it is limited; you're of course free to set this if it works for you.

Your invocations don't look like you run them as root; that might just be left out of the message, but if not, always run as root. This is required for hardware timing, proper realtime threads, etc.

You can try this by looking at what speeds you'd get if you pass --led-multiplexing=1 (the output on your panel will not be useful, but you'd see the speed).
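As a rough illustration of the points above (less multiplexing is faster, and very short on-times leave the clocking as the bottleneck), here is a simplified timing model in Python. It is a sketch, not the library's actual code: the function name, the `clock_ns_per_pixel` value, the default arguments, and the assumption that each bit plane costs the larger of its on-time and the time needed to clock in a row are all illustrative assumptions.

```python
def estimate_refresh_hz(pixels_per_row, multiplex, pwm_bits=11,
                        lsb_ns=130, clock_ns_per_pixel=30):
    """Very rough refresh estimate for one chain (hypothetical model).

    pixels_per_row: pixels shifted in per row address on one chain
    multiplex: number of row addresses per frame (32 for a 1:32 panel)
    clock_ns_per_pixel: assumed cost of clocking one pixel, not measured
    """
    clock_in_ns = pixels_per_row * clock_ns_per_pixel
    row_ns = 0
    for bit in range(pwm_bits):
        # Each bit plane is lit twice as long as the previous one.
        on_time_ns = lsb_ns * (2 ** bit)
        # Assumption: data is clocked in while the previous plane is lit,
        # so a plane costs whichever of the two takes longer.
        row_ns += max(on_time_ns, clock_in_ns)
    frame_ns = row_ns * multiplex
    return 1e9 / frame_ns


if __name__ == "__main__":
    for label, pixels, mux in (("1:32, 128 px per row", 128, 32),
                               ("1:16, 256 px per row", 256, 16)):
        normal = estimate_refresh_hz(pixels, mux)
        short = estimate_refresh_hz(pixels, mux, pwm_bits=7, lsb_ns=9)
        print(f"{label}: {normal:.0f} Hz, {short:.0f} Hz with lsb_ns=9")
```

In this toy model, halving the multiplexing while doubling the row length helps when the on-times are long, but the advantage disappears once every on-time is shorter than the clock-in time. Real panels and real GPIO timing will differ.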

hzeller avatar Dec 31 '19 20:12 hzeller

Hi @hzeller thanks for the tips. Yes, I noticed that --led-pwm-lsb-nanoseconds=9 makes brightness go down, I just took it as long as it would reliably work on my panels. It does not increase the speed by a lot, but still does give some extra Hz. Running as root: I definitely run as root, but thanks for making sure that I indeed do. --led-multiplexing=1 does not make things slower or faster. --led-row-addr-type=3 actually makes things slower, which goes against what you were saying (245Hz with vs 310Hz or so without)

root@rPi3:~/rpi-rgb-led-matrix# ~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=1 --led-show-refresh --led-slowdown-gpio=1 --led-parallel=3 --led-pwm-dither-bits=1 --led-pwm-lsb-nanoseconds=9 --led-pwm-bits=7 --led-panel-type=FM6126A -D10 --led-row-addr-type=3

I tried chain=3 parallel=1 for speed testing

root@rPi3:~/rpi-rgb-led-matrix# ~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=3 --led-show-refresh --led-slowdown-gpio=1  --led-pwm-dither-bits=1 --led-pwm-lsb-nanoseconds=9 --led-pwm-bits=7 --led-panel-type=FM6126A -D10 --led-row-addr-type=3

gives 100Hz with type=3 and 110Hz with the default type. Does this make any sense to you? For now, this tells me that AC panels will actually be slower than ABCDE panels :-/

marcmerlin avatar Jan 01 '20 21:01 marcmerlin

@hzeller any idea why AC seems slower than ABCDE in the tests I just posted above? Can you reproduce using the same command lines as the ones I posted?

marcmerlin avatar Jan 08 '20 05:01 marcmerlin

I suspect because you are cheating with the very short pwm nanoseconds, so the clock-times dominate the timings.

hzeller avatar Jan 08 '20 05:01 hzeller

@hzeller OK, let's take out --led-pwm-lsb-nanoseconds=9 and try --led-row-addr-type=3, in parallel or not. It's about 100Hz either way (a huge slowdown).

root@rPi3:~# ~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128  --led-chain=1 --led-show-refresh --led-slowdown-gpio=1 --led-parallel=3 --led-pwm-dither-bits=1  --led-pwm-bits=7 --led-panel-type=FM6126A -D10 --led-row-addr-type=3
81Hz
root@rPi3:~# ~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-chain=3 --led-show-refresh --led-slowdown-gpio=1  --led-pwm-dither-bits=1 --led-pwm-bits=7 --led-panel-type=FM6126A -D10 --led-row-addr-type=3
122hz max

Now, let's do a regular ABCDE panel, and it's still faster, 133hz

root@rPi3:~#  ~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=3 --led-show-refresh --led-slowdown-gpio=1  --led-pwm-dither-bits=1 --led-pwm-bits=7 --led-panel-type=FM6126A -D10
133hz

Are you able to reproduce this? If so, would you agree that ABCDE panels seem faster than AC panels at least the way that your driver is using them?

marcmerlin avatar Jan 19 '20 06:01 marcmerlin

It is the panels with different multiplexing that make things faster: ABCD instead of ABCDE (64 row panel), or AB instead of ABC (16 row panels). That is what outdoor panels do.

It is not the row address type you are testing here. That is just another way of doing the same 1:32 multiplexing so will not have much of a difference.
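To make the relationship between address lines and multiplexing concrete, here is a small Python sketch. It assumes that a panel with n address lines is scanned 1:2^n and that the two RGB data groups on a HUB75 connector shift in two rows at once; the function name and the return shape are made up for illustration.

```python
def scan_layout(panel_rows, panel_cols, address_lines):
    """Return (scan ratio, rows lit at once, pixels shifted per address).

    Hypothetical helper: assumes 1:2**address_lines scanning and two
    RGB data groups per connector (two rows clocked in parallel).
    """
    scan = 2 ** address_lines
    rows_lit = panel_rows // scan
    # Rows lit at once share the shift register; two of them are fed
    # in parallel by the two RGB data groups.
    shift_length = panel_cols * rows_lit // 2
    return scan, rows_lit, shift_length


if __name__ == "__main__":
    for name, rows, cols, lines in (("ABCDE 64x128", 64, 128, 5),
                                    ("ABCD  64x128", 64, 128, 4),
                                    ("ABC   16x32", 16, 32, 3),
                                    ("AB    16x32", 16, 32, 2)):
        scan, lit, shift = scan_layout(rows, cols, lines)
        print(f"{name}: 1:{scan} scan, {lit} rows lit, {shift} px per address")
```

Under these assumptions, dropping one address line halves the number of row addresses per frame and doubles the number of pixels clocked in per address.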

hzeller avatar Jan 19 '20 07:01 hzeller

@hzeller Ah, thanks for clearing that up. So basically ABC panels are no better than ABCDE panels, gotcha. Now I need to see if I can find AB panels that do 128x64, except it seems worse in the test I just did.

Just to be clear: --led-row-addr-type=3 is ABC and --led-row-addr-type=1 is AB, correct?

~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128  --led-chain=1 --led-show-refresh --led-slowdown-gpio=1 --led-parallel=3 --led-pwm-dither-bits=1  --led-pwm-bits=7 --led-panel-type=FM6126A -D10 --led-row-addr-type=1

is giving me only 94Hz for AB, which is even worse.

Am I doing something wrong? Given that this can be tested without the actual panels, are you able to verify this from your side?

marcmerlin avatar Jan 19 '20 18:01 marcmerlin

As I said, --led-multiplexing is the relevant factor, not --led-row-addr-type. The former decides what kind of multiplexing is done (and the layout within), which is in the ABCDE case 1:32. This needs to be less.

You need to look for so-called 'outdoor' panels, as they usually have a smaller multiplexing (and also thus are typically faster). However, I have seen 32x32 panels with outdoor, but not 64x64 yet. So it might be worthwhile looking if a longer chain with 32x32 outdoor panels can give any advantage.

The row-addr-type is only determining the way the multiplex address is given to the panel; it will not change much if you have a 1:32 panel.

Also: To compare times, I'd always start out with the defaults for pwm-bits and lsb-nanoseconds and then compare how that changes things when you tweak them (because you ideally would like to have a fast thing to start with and then tweak it later). Right now, you are setting them to extremes that are not really useful in real world but skew your comparisons as they drive things into 100% CPU.

hzeller avatar Jan 19 '20 20:01 hzeller

@Cellgalvano you mention FPGA solutions, which I know about, but I've never actually found the maximum number of pixels, or aggregate resolution, you can run from an FPGA. From what I understand, it's not just about pushing pixels faster: ultimately, if you refresh too quickly, lines will become dimmer as they are cycled too quickly. Do you happen to have numbers for ABCD(E) panels vs AB panels? @hzeller or maybe you also happen to know if the rPi is already as fast as it can be given the technology, and 256x256 is probably a reasonable limit if you want a 100Hz refresh rate, or whether FPGAs can go much higher somehow?

marcmerlin avatar Jan 24 '20 19:01 marcmerlin

rpi is actually too fast for the panels which is why we have to slowdown gpio. I have something brewing which allows to multiplex more chains, but it is a project whenever-there-is-time, so don't hold your breath.

With FPGAs, you will always get better results as you have more I/O to connect more parallel chains. Clocking speed will not be much different (matrix limit), but it is easier to tune to the optimum. They will also be brighter, as there is some dead time in the timing circuit in the Pi.

hzeller avatar Jan 24 '20 20:01 hzeller

Thanks @hzeller for the details.
I did some quick benchmarking with configuring AB panels I don't have, but as you said, I should be able to benchmark this before getting some, by running the output as if I were plugged in.

I think it could be useful to post some benchmark numbers to give people an expectation of maximum display sizes they can expect now that getting 256x256 is actually very achievable hardware-wise.

Did you do any blind benchmarking yourself of what's the maximum resolution that can be reasonably driven from an rPi3 while still keeping a refresh rate of 100Hz? (or whatever number you deem acceptable).

If not, I did some below. Here's my test for 128x192 on 3 channels without the performance increasing options:

demo --led-rows=64 --led-cols=128  --led-chain=1 --led-show-refresh --led-parallel=3 -D10

gives 100Hz

Adding --led-row-addr-type=1 (for AB) keeps the speed the same, as you hinted.

Using --led-multiplexing=1-4 increases refresh to 160-180Hz, so that's a sizeable increase, but that increase goes away if I add "tuning options" (which indeed do decrease the quality, but are necessary to get a refresh rate that's high enough).

Once I add "--led-pwm-dither-bits=1 --led-pwm-bits=7" then the speed is 200Hz vs 320Hz for AB vs ABCDE

Adding --led-pwm-lsb-nanoseconds=9 then gives me around 430Hz in both cases of multiplexing.

AB panels are still a bit faster, but not anymore if you push --led-pwm-lsb-nanoseconds to a value that is arguably somewhat undesirable, as the panels get a bit too dim.
Back to a more reasonable number (25) the difference is really just 10Hz between the 2:

demo --led-rows=64 --led-cols=128  --led-chain=1 --led-show-refresh --led-parallel=3 -D10  --led-row-addr-type=1 --led-pwm-dither-bits=1 --led-pwm-bits=7 --led-pwm-lsb-nanoseconds=25 [--led-multiplexing=4]

Hope this helps someone. Looking forward to the first person to build a 256x256 and say how bad it looks, although ultimately, as you say, a lot of parallel strings with an FPGA (or multiple), is the only way to go if you want to go higher res and keep quality (seems that 128x128 at high quality is already too big for a single string).

marcmerlin avatar Jan 24 '20 20:01 marcmerlin

There you go, made a README patch: https://github.com/hzeller/rpi-rgb-led-matrix/pull/971

marcmerlin avatar Jan 24 '20 20:01 marcmerlin

My first guess would be to use the tips and tricks from the previous posts. If you have reached the limits of the Pi, you could look into the BeagleBone project octoscroller or try to get a different firmware for your Chinese controller board. Most of these boards feature FPGAs. Some boards, like the ColorLight 5A-75B, are well documented.

Cellgalvano avatar Aug 07 '20 18:08 Cellgalvano

If that inspires anyone, I just finished the frame for my 384x256 array.

While you can go a bit farther (384x320, or if you really push 384x384), those are probably the actual limits for the current library code. I'm sure the panels will go faster with custom FPGAs, but as others have said, I haven't found any cheap FPGA that takes easy input you can generate from an arduino/Pi-like chip.

Build info: http://marc.merlins.org/perso/arduino/post_2020-03-13_RGB-Panels_-from-192x80_-to-384x192_-to-384x256-and-maybe-not-much-beyond.html

marcmerlin avatar Aug 08 '20 21:08 marcmerlin

One small board that is definitely capable enough to power one chain is the MAX1000 or CYC1000. At such a low cost per board, one per chain would probably be feasible. These feature 8MB of SDRAM. I tried to implement a simple driver for these; the only downside is the relatively low number of I/O pins.

Cellgalvano avatar Aug 09 '20 09:08 Cellgalvano

I tried using the external ram for some simple tests, but just as you described I also favor the internal BlockRAM, because I only need to wait for one additional clock cycle while reading and my panel setups are pretty small (32x128 and 64x128). The CYC1000 features 594 kilobits of BlockRAM.

I also tried to do some offloading using a cheap EPM240 CPLD in combination with an asynchronous SRAM, but I could just barely fit my logic and it did not work. Probably a EPM570 will do the trick.

For external RAM I also have some of these IPS6404 QSPI PSRAM ICs lying around, which could be interesting because they provide 8MB of RAM with only 6 I/O pins required.

Cellgalvano avatar Aug 09 '20 10:08 Cellgalvano

@daveythacher you were asking about CPU use.

rPi3 driving 30720 pixels over 3 chains:
  PID USER    PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
18443 daemon  20  0 25308 4648 3508 R 199.7  0.5   1055:05 Table_Mark_Este

rPi4 driving 98304 pixels over 3 chains:
  PID USER    PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 2236 daemon  20  0 27340 6664 3492 R 142.9  0.3 891:14.61 Table_Mark_Este

As for FPGAs, honestly, if they can't drive 30K pixels or more per chain at, let's say, 150Hz, then they're not really interesting, since an rPi4 can do it at 100Hz or so over 3 chains.

marcmerlin avatar Aug 09 '20 16:08 marcmerlin

@daveythacher 128x32 at 95fps is just not that interesting; right now I'm doing 128x256 (8 times more) at 7 bits per pixel with almost the same refresh rate (just a bit faster).

            defaults.rows = 64;
            defaults.cols = 128;
            defaults.chain_length = 4;
            defaults.parallel = 3;
            defaults.pwm_lsb_nanoseconds = 100;
            defaults.pwm_bits = 7;

To make FPGAs worth it, they ought to be at least 1.5 or 2x faster than a single channel from the rPi.

marcmerlin avatar Aug 10 '20 16:08 marcmerlin

@daveythacher , if I may, too many words, it makes it a bit harder to follow/keep up with everything you're writing :) To answer your question on Hz at 24bpp or 8 pwm/bcm bits:

root@rPi4b:~# ~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=4 --led-parallel=3 --led-show-refresh --led-slowdown-gpio=2 --led-pwm-bits=8 --led-panel-type=FM6126A --led-pwm-dither-bits=1 --led-pwm-lsb-nanoseconds=50 -D4
Size: 512x192. Hardware gpio mapping: regular
Press <CTRL-C> to exit and reset LEDs                        98.0Hz

So, yes, I get about 100Hz for 128x256 (led chain 4).

I can't say why it doesn't quite align with the math you posted, but it sounds that basically it's already going as fast or even a bit faster than expected, and therefore I should expect no faster from an FPGA.

Note that this is also on 3 different channels in parallel, so I'm not even sure if I can get an FPGA that does 3 in parallel, 32K pixels per string, for much less than $40, while taking framebuffer input over another fast channel so that I can push it from a microcontroller. If such a thing exists, with code that is already written, then I'm super interested. If it's just theoretical for now, then it's nice to know it's possible, but until I can buy one and upload the code, it's not super useful to me since I focus on the higher level code that runs on top :)

marcmerlin avatar Aug 11 '20 15:08 marcmerlin

@daveythacher , I'll skip over the questionable accusations

  • --led-show-refresh shows Hz output
  • chain is 4x 128x64 (128x256) on 3 channels, which is 384x256. Ignore the size output by the lib, it's because it has a virtual mapping and pixels get remapped before display.
  • FPGAs that can do it, great, but as I said, until I see one with working code for all my panels, it's just theoretical to me :)
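The size arithmetic in this comment can be checked with a short Python sketch. The function name is hypothetical; the only assumption taken from the thread is that the library reports its size as (cols x chain) by (rows x parallel), which matches the "Size: 512x192" output shown earlier for these settings.

```python
def matrix_geometry(panel_cols, panel_rows, chain, parallel):
    """Return (size as reported by the library, total pixels, pixels per chain).

    Hypothetical helper for illustration only.
    """
    reported = (panel_cols * chain, panel_rows * parallel)
    total_pixels = reported[0] * reported[1]
    pixels_per_chain = panel_cols * panel_rows * chain
    return reported, total_pixels, pixels_per_chain


if __name__ == "__main__":
    reported, total, per_chain = matrix_geometry(128, 64, chain=4, parallel=3)
    print("reported size:", reported)
    print("total pixels:", total, "(same as a 384x256 layout:", 384 * 256, ")")
    print("pixels per chain:", per_chain)
```

The reported 512x192 and the physical 384x256 layout contain the same number of pixels; only the mapping differs.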

marcmerlin avatar Aug 11 '20 16:08 marcmerlin

@daveythacher I think the best way to find out is for you to build it, test it, and report back your findings. Then you can also build the FPGA solution you've been writing about, compare both, and let us know your results.

marcmerlin avatar Aug 13 '20 17:08 marcmerlin

@daveythacher regarding the ColorLight Boards I'm only following the http://github.com/q3k/chubby75 project. I never thought about using the original firmware, so I can't provide additional information. Sorry.

Cellgalvano avatar Aug 17 '20 22:08 Cellgalvano

@daveythacher regarding the ColorLight Boards I'm only following the http://github.com/q3k/chubby75 project. I never thought about using the original firmware, so I can't provide additional information. Sorry.

It looks like @daveythacher figured out how to use the original firmware.

greatballoflazers avatar Oct 29 '21 03:10 greatballoflazers

To answer the question, getting refreshes above 200Hz will be hard at 192x128, and quality will also be an issue. The reason is simple: this library uses traditional PWM. Some panels support high refresh rates while others do not. FPGAs are used in receiver cards to increase the number of chains possible, which increases the serial bandwidth.

Receiver cards also implement another mode called S-PWM. It allows very high refresh rates depending on chain length and is very different from traditional PWM. It is more complicated but generally allows for higher quality. The Pi could support this; however, it is only useful for offline computation, and it is somewhat involved to configure properly.

Another option is SRAM panels, but these are also not supported. The idea that increasing the refresh lowers the brightness sounds like a configuration problem with this implementation. This can happen, but it may be due to other things. Note there is a limit to how high the refresh can go.

Note the quality is generally at most 11 bits for the LED, and increasing the multiplex lowers this. You need to make modifications to the code to properly align this. FPGAs are very fast and drive the panels more efficiently than the RPi can; however, the cost and complexity are not likely to be worth it. There are specific cases where this is helpful, the most helpful being determinism of the refresh.

In summary, you could be asking a bit much of traditional PWM. Lowering the PWM bits will increase the refresh. Adding logic for S-PWM will also increase the refresh without lowering the quality on standard panels. Other than that, get a bunch of ESP32s and SRAM panels, or use a receiver card in net card mode.

There is a lot of application information for these panels that is not mentioned here. However, you do not need an FPGA unless you want high quality and/or determinism.

bluelasers avatar Mar 08 '22 20:03 bluelasers

This code base is designed for low refresh. It can in theory do high refresh with quality and display size reductions. However there will need to be special provisions for ghosting which are not supported by this code base to the best of my knowledge. These provisions are only useful on specific panels. Most panels supported by this project are not designed for high refresh. These panels generally favor color depth over refresh and some favor size over color depth and refresh.

I did some testing and it looks like the size of the display and the quality settings impact stability. The interesting thing is how this code base handles color depth. It assumes 11 bit color depth regardless of multiplex and drops the least significant bits to make more time. This actually creates slight imperfections in accuracy and period times. The effect of this reduces the refresh rate and increases it at the same time. The most significant bits carry the most value but they also carry the most time.
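The remark that the most significant bits carry the most time can be illustrated with a small Python sketch. It assumes plain binary-coded modulation, where each bit plane is lit for twice as long as the one below it; whether the library weights or drops bits in exactly this way is not established here, and the function name is hypothetical.

```python
def bitplane_time_shares(pwm_bits):
    """Fraction of the total lit time spent on each bit plane, LSB first.

    Assumes idealized binary weighting (1, 2, 4, ...), ignoring the
    time needed to clock data in.
    """
    weights = [2 ** bit for bit in range(pwm_bits)]
    total = sum(weights)
    return [weight / total for weight in weights]


if __name__ == "__main__":
    shares = bitplane_time_shares(11)
    for bit, share in enumerate(shares):
        print(f"bit {bit:2d}: {share * 100:6.2f}% of lit time")
```

In this idealized model the top bit plane alone accounts for about half of the lit time, while the lowest few planes together account for a very small fraction of it.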

Randomly changing the LSB time will increase the refresh but this will also create errors. There are a number of different things which can affect brightness. I will not take the time to mention them. However, it is important to protect the PWM period. This library does not do this, and this can be overlooked to a degree. Correcting this in the code would not be very hard if you know what you're doing.

There is a discussion for using different LED drivers which by the look of things could be promising for this. However it does not look like any progress has been made. I would say for the time being this project is not rated for high refresh. That being said if you know how to modify certain parameters in the code base you should be able to get a high refresh rate by lowering the color depth. It is believed the stability may be passable for this 192x128, but that is just a hunch.

newcokesucks avatar Oct 17 '22 22:10 newcokesucks

bluelasers Joined GitHub on March 8, 2022. No other code written, no other project, no contributions. newcokesucks Joined GitHub on September 12, 2022 just in time to fork this project and send a CL. No other code written, no other project, no contributions.

If any of you isn't somehow another account for David Tacher, please write your fork with all your ideas and suggestions, and like Linus would say "show us the code".

marcmerlin avatar May 10 '23 15:05 marcmerlin