rpi-rgb-led-matrix
Use DPI interface as fast DMA
The DPI interface is like VGA output, just binary: all 24 bits (8 per color) have their own GPIO pins. The clock speed can be set above 100 MHz in steps of about 1 kHz.
I created a proof-of-concept project using just Python and DPI output. It reaches 100 FPS on an rPi 2 for a 64x32 LED panel with a single core loaded (calculations done in NumPy). At the same time, the DPI pushes data to the panel at a stable 400 Hz refresh rate. The theoretical maximum for 8b color is 4 kHz for a single panel and 270 Hz for the 32k-pixel data chain mentioned in the readme.
Hope my solution helps improve your project. https://github.com/B-C-Mike/PoC-rPi-LED-matrix
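As a rough cross-check of those theoretical figures, here is a minimal sketch. It assumes a 35 MHz panel clock, two rows shifted per clock (upper and lower half of the panel), one pass per bitplane, and no latch or blanking overhead. The function and parameter names are made up for illustration and are not taken from the PoC.

```python
def max_refresh_hz(pixels, panel_clock_hz=35e6, bitplanes=8, rows_per_clock=2):
    """Upper bound on full-frame refresh rate under the assumptions above."""
    # Each clock shifts in one bit for `rows_per_clock` pixels, and every
    # bitplane needs its own pass over the whole chain.
    clocks_per_frame = (pixels // rows_per_clock) * bitplanes
    return panel_clock_hz / clocks_per_frame


single_panel = max_refresh_hz(64 * 32)    # one 64x32 panel
long_chain = max_refresh_hz(32 * 1024)    # 32k-pixel chain
print(round(single_panel), round(long_chain))
```

Under these assumptions the results land near the quoted 4 kHz and 270 Hz, but the exact timing model behind the original numbers is not stated in the post.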
Very nice! This looks like a very promising approach, as it solves the problems of (a) not using CPU (b) stable output frequency.
From reading the description it looks like you generate the bitplanes by clocking in the same row multiple times, which means we don't need the output enable pulse generator, but we're also limited by the clock speed we can achieve on the serial line (or am I missing something?). We probably won't reach 11-bit BCM with this, but in particular for smaller displays with lower PWM bits, this is still a good advantage given the CPU savings and not having to deal with jittery memory bus contention.
Would you like to take a stab at a pull request demonstrating this with the current rpi-rgb-led-matrix code? Don't mind if it is not clean, just rip out the parts not needed to have a working proof of concept. I can then help fit it into the rest of the library.
Currently, the pixel setting is happening here; could /dev/fb1 even be memory-mapped to write to? Though to support multiple buffers (swap on vsync and stuff), we might just have a backing buffer that we then swap or copy when needed.
I use both "send the same row multiple times" and "send row with reduced brightness" to find a balance between brightness and refresh speed.
No, I do use output enable and control it to get PWM brightness control. It's not really PWM, rather "set this pin high for X pixels, then turn it low for the rest". That gives good enough resolution, and the brightness is reduced for some bitplanes. The first 3 examples are limited to fixed brightness and 4-bit color depth; the next ones have proper 8b color depth with PWM. My code is limited to 8 bits after gamma correction, but it should be easy to modify it to 16b and just trim to 11b.
Sadly I don't have the skills for C++ code, only C. I will try to make it work, but I'm not sure about the result. Let's set the deadline as the end of the year.
It is possible to mmap; I'm not sure about the synchronization events and double buffering. Just make sure to enable fb1 before playing (config.txt). By default the rPi maps all outputs to fb0 (mirroring) or doesn't map at all (black screen, just refresh pulses).
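A minimal sketch of the "high for X pixels, then low for the rest" idea, in plain Python. `OE_BIT` and `gate_output_enable` are hypothetical names, and the bit position is an assumption; the real position depends on how the panel is wired to the DPI outputs.

```python
OE_BIT = 1 << 23  # hypothetical: framebuffer bit wired to the panel's OE pin


def gate_output_enable(row_words, high_pixels):
    """Return a copy of row_words with the OE bit set for the first
    `high_pixels` entries and cleared for the remaining ones."""
    return [
        (word | OE_BIT) if index < high_pixels else (word & ~OE_BIT)
        for index, word in enumerate(row_words)
    ]


row = [0x000001] * 8              # eight framebuffer pixels of one row
gated = gate_output_enable(row, 3)
```

Because the OE level is part of every framebuffer pixel, the width of the pulse is counted in pixel clocks, so shortening it for the less significant bitplanes reduces their brightness without any extra timing hardware.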
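A minimal sketch of writing to the framebuffer through mmap, assuming 32 bits per pixel, little-endian words, and a line stride equal to width times 4 bytes (the real stride should be read from the framebuffer driver). `open_framebuffer` and `put_pixel` are hypothetical helper names.

```python
import mmap
import os
import struct


def open_framebuffer(path, width, height, bytes_per_pixel=4):
    """Map a framebuffer-like file read/write and return the mmap object.

    The size is passed explicitly because a device node such as /dev/fb1
    does not report its size through fstat.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, width * height * bytes_per_pixel)
    finally:
        os.close(fd)  # the mapping keeps its own duplicate of the descriptor


def put_pixel(fb, width, x, y, word):
    """Store one 32-bit word at pixel (x, y), assuming stride == width * 4."""
    struct.pack_into("<I", fb, (y * width + x) * 4, word)
```

On a Pi with fb1 enabled this might be used as `fb = open_framebuffer("/dev/fb1", 64, 32)` followed by `put_pixel(fb, 64, 0, 0, 0x00FFFFFF)`. It does nothing about vsync or double buffering, which remain open questions in this thread.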
Ah nice, yes, using the pixel clock for timing the Output Enable sounds good.
I will play with it and see to integrate it, but I have a bunch of other projects on my plate currently, so I can't make promises that it will happen in the next couple of weeks.
Thanks a lot for your research into getting a high-frequency steady clock output with the Pi, this is exciting!
OK, I have to give up. I have no idea what I am doing with that code. Here is a summary of what I learned so far:
- The dpi24 overlay sets everything, including switching all GPIO to alt2 mode. Do not touch used GPIO; switch unused GPIO back to the default function.
- The output framebuffer is a rectangle, 32b for each pixel. Its size has to match the screen; pixels not set will remain 0x000000. That matters for output enable and refresh frequency.
- Dimensions can be changed via fbset; I'm not sure about pixel frequency and horizontal/vertical blanking.
- This code writes GPIO 3 times for each word of data: send data / flip clock pin / flip clock back. The clock for the LED matrix is tied to sending data.
- DPI requires 2 writes (2 pixels of /dev/fb1) for each pack of data: send data / send the same data but with the clock bit flipped. This way the DPI clock has to be twice as high as the clock that the LED matrix sees on its CLK pin.
- For testing with any commonly used adapter, the DPI offers only 24 output bits. The bits are named differently (same as GPIO but with an offset of 4). GPIO 0, 1, 2, 3 can't be used.
- GPIO 0-3 can be used as clock / latch by tweaking horizontal/vertical blanking time. For that usage a new board is mandatory. Problem: the enable pin needs an inverter. All data pins go low during blanking, while the matrix expects high to disable output.
- Extra feature: there might be a possibility to drive two strings of panels using one output and an inverted clock. I already wrote about that on discourse.group
I think it is possible to split a single data chain into multiple channels. Imagine I send the data like AAaaBBbbCCccDDdd, where capitalization reflects the clock signal and ABCD reflects 4 pixels sent to the panels. That is 16 pixels of framebuffer, now clocked at 140 MHz to get a 35 MHz clock output. AabBCcdDEefFGghH would be the pattern to feed 2 panels at the same time. One panel will receive the ACEG pixels at the falling edge of the clock signal. The second panel will receive an inverted clock signal (just a NOT gate) and grab the BDFH pixels. It is still a 35 MHz base clock for the panels. More can be achieved with a small and cheap FPGA/CPLD as a middle layer, assuming panel timing is fixed and we just transfer image data, no control bits.
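The AabBCcdD pattern can be sketched in plain Python as follows. `CLK_BIT` and both function names are hypothetical, and the clock is assumed to be one bit of each framebuffer word; `split_by_edge` only simulates what the two panels would sample and is not part of any existing code.

```python
CLK_BIT = 1 << 0  # hypothetical: framebuffer bit wired to the panels' clock


def interleave_two_chains(chain1, chain2):
    """Build framebuffer words in the AabBCcdD pattern.

    chain1 words are stable across falling clock edges; chain2 words are
    stable across rising edges (falling edges after the NOT gate).
    """
    out = []
    for first, second in zip(chain1, chain2):
        first &= ~CLK_BIT
        second &= ~CLK_BIT
        out += [first | CLK_BIT, first, second, second | CLK_BIT]
    return out


def split_by_edge(words):
    """Simulate both panels by sampling the data at every clock edge."""
    falling, rising = [], []
    for prev, cur in zip(words, words[1:]):
        was_high, is_high = bool(prev & CLK_BIT), bool(cur & CLK_BIT)
        if was_high and not is_high:
            falling.append(cur & ~CLK_BIT)
        elif is_high and not was_high:
            rising.append(cur & ~CLK_BIT)
    return falling, rising


words = interleave_two_chains([0x10, 0x20], [0x30, 0x40])
panel1, panel2 = split_by_edge(words)
```

Each pair of pixels still takes four framebuffer words, so the panel clock stays at a quarter of the DPI pixel clock while the chain carries twice as much data.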
This library is already using DPI output and I am successfully running a large display matrix using it. https://github.com/rjrouquette/rgb_matrix_udp
After going through your description, it looks like you are setting the DPI clock at 40 MHz. But most RGB panels have driver ICs with a 25 MHz clock, so for larger displays it will not work.
@arahasya
Thanks for the link to that project. It is possible that other projects like my idea exist too, just complete ones. So far I never found any. By the way, the linked project requires an ATxmega microcontroller and a programmer for it. That's the main difference between these two ideas.
No. I don't use the DPI pixel clock. I generate the clock via one of the data lines. 40 MHz is the reference clock (pixel clock) for DPI, not for the panel. The panel is clocked at half of that speed, by switching one of the data pins as fast as possible. I had problems latching panel data while the clock was still running, so I stop the clock, then latch. The DPI clock can't be stopped; a software-generated one can.
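A minimal sketch of that scheme in plain Python: every data word becomes two framebuffer pixels that differ only in the clock bit, and the latch pulse is sent afterwards with the clock held still. `CLK_BIT`, `LAT_BIT`, the bit positions, and the clock polarity are all assumptions for illustration, not taken from the PoC.

```python
CLK_BIT = 1 << 0  # hypothetical: data line used as the panel clock
LAT_BIT = 1 << 1  # hypothetical: data line used as the panel latch


def encode_row(data_words):
    """Expand one row of data words into framebuffer pixels.

    The panel clock toggles once per two framebuffer pixels, so it runs at
    half the DPI pixel clock (for example 20 MHz from a 40 MHz pixel clock).
    """
    out = []
    for word in data_words:
        word &= ~(CLK_BIT | LAT_BIT)
        out.append(word | CLK_BIT)  # data with clock high
        out.append(word)            # same data, clock low
    out.append(LAT_BIT)             # clock stopped, latch high
    out.append(0)                   # latch released, everything low
    return out


pixels = encode_row([0x10, 0x20])
```

Since the clock is just another bit in the pixel data, stopping it only requires writing pixels where that bit does not change, which the free-running DPI pixel clock cannot do.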
@B-C-Mike I just mentioned the link in case it is helpful for your implementation.
Your project looks very promising since, as you said, it doesn't need an external controller. Even though the above project enables 4 parallel chains, your project will improve this current library with its 3 parallel chains and make it possible to run bigger displays smoothly. So I will be looking forward to your contribution.
We probably won't reach 11-bit BCM with this, but in particular for smaller displays with lower PWM bits, this is still a good advantage given the CPU savings and not having to deal with jittery memory bus contentions.
Why not? The LEDs are good for only 11-13 bits; however, this assumes a single scan panel. If you use a 32-scan panel you are only looking at 6-8 bits. You now use the GPU and L2 cache as a FIFO, instead of the CPU and L1 as a FIFO, if I am not mistaken. The L2 likely has priority over the L1s so this works out.
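One reading of these figures is that each doubling of the scan rate costs one bit of usable depth, because every row is lit for a correspondingly smaller share of the frame. That interpretation is an assumption, and `usable_bits` is a hypothetical helper written only to check the arithmetic.

```python
import math


def usable_bits(single_scan_bits, scan_rows):
    """Bits of depth left when a panel multiplexes `scan_rows` row groups,
    assuming one bit is lost per doubling of the scan rate."""
    return single_scan_bits - math.log2(scan_rows)


low = usable_bits(11, 32)
high = usable_bits(13, 32)
print(low, high)
```

For a 32-scan panel this gives 6 and 8 bits, matching the 6-8 bit range quoted above.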
There are software algorithms which would support getting the full amount as long as there is enough bandwidth, so what changed? Now the CPU required for this is a tad higher, so how much CPU you get back is not completely clear to me. However, there should be some, especially for non-real-time playback using multibuffering.
Meaning this library has issues with large, high-quality displays due to bit banging. However, it should support this in non-real time? Is there much that can be done about the real-time performance without additional hardware? DPI should provide better stability. Is this library okay with consuming the entire header?
There is some overhead, but you should be able to get most of the serial bandwidth. You could still use BCM, but you need a decent number of memory operations from the CPU.