pimoroni-pico
ST7789::write_blocking_parallel could use DMA?
Hello, I noticed that a display.update() on the Tufty 2040 takes about 30 ms to send a full-size RGB565 framebuffer to the display.
While investigating, I found that forcing the PIO's clock divider to 2 or even 1 had no effect on the update time, and that the CPU never had to wait for space in the FIFO.
I also noticed that the parallel version of the constructor already contains code to configure a DMA channel, but that channel is then not used for all data transfers.
Replacing the existing initial transfer loop at the start of write_blocking_parallel with the following sped up display.update() to about 9.8ms!
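void ST7789::write_blocking_parallel(const uint8_t *src, size_t len) {
    // Hand the whole buffer to the already-configured DMA channel...
    write_blocking_dma(src, len);
    // ...and block until the DMA transfer has completed.
    dma_channel_wait_for_finish_blocking(st_dma);
    // rest of function unchanged
}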
It does seem necessary to keep the subsequent loop that polls the stall mask: if I remove it, I see a handful of non-updating pixels at the bottom right of the display. (I guess removing it causes CS to be deasserted before the PIO has finished sending every byte, so the display doesn't get the last few bytes.)
Good spot. Just tested this and went from 29.55ms to 9.88ms for RGB565 pens. Pretty much exactly what you found.
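To put those timings in perspective, here is a back-of-the-envelope sketch of the implied throughput. It assumes a full frame is 320x240 pixels at 2 bytes per pixel (RGB565); the helper names `frame_bytes` and `throughput_mb_per_s` are made up for this illustration and are not part of the pimoroni-pico code.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one full frame. Hypothetical helper, for illustration only.
constexpr std::size_t frame_bytes(std::size_t width,
                                  std::size_t height,
                                  std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Average throughput in megabytes per second (1 MB = 1e6 bytes)
// for a transfer of `bytes` that took `milliseconds`.
inline double throughput_mb_per_s(std::size_t bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    return (static_cast<double>(bytes) / 1e6) / (milliseconds / 1e3);
}

// Assumed panel size: 320x240 at 2 bytes per pixel.
static_assert(frame_bytes(320, 240, 2) == 153600,
              "one RGB565 frame at 320x240 is 153600 bytes");
```

Plugging in the two measurements, `throughput_mb_per_s(frame_bytes(320, 240, 2), 29.55)` comes out at roughly 5.2 MB/s and the 9.88 ms figure at roughly 15.5 MB/s, so the DMA path moves the same frame about three times faster.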
I've raised a PR with a fix, though I've swapped the ugly stall mask check for a loop polling pio_sm_is_tx_fifo_empty. This might introduce a race on the very last pixel, but my tests didn't show anything anomalous.
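For anyone wondering where such a race could come from, here is a toy host-side model, not Pico SDK code and not the driver's actual logic. It assumes each word takes `cycles_per_word` clock cycles to put on the bus after it leaves the FIFO, and that the stall flag is only raised when the state machine comes back for another word and finds the FIFO empty. `ToyTxPath` and both helper functions are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Toy model of a transmit path: a FIFO feeding a unit that needs
// several cycles to output each word. Purely illustrative.
struct ToyTxPath {
    std::deque<std::uint8_t> fifo;  // words queued by the CPU or DMA
    int cycles_per_word = 2;        // assumed cost of outputting one word
    int cycles_left = 0;            // cycles remaining for the current word
    bool stalled = false;           // set when a pull finds the FIFO empty
    int words_on_bus = 0;           // words fully delivered so far

    // Advance the model by one clock cycle.
    void tick() {
        if (cycles_left == 0) {
            if (fifo.empty()) { stalled = true; return; }
            fifo.pop_front();               // pull the next word
            cycles_left = cycles_per_word;
        }
        if (--cycles_left == 0) ++words_on_bus;
    }
};

// Build a path with `total_words` already queued.
inline ToyTxPath make_loaded(int total_words, int cycles_per_word) {
    assert(cycles_per_word > 0);
    ToyTxPath tx;
    tx.cycles_per_word = cycles_per_word;
    for (int i = 0; i < total_words; ++i) tx.fifo.push_back(0);
    return tx;
}

// Words delivered at the moment the FIFO is first seen empty.
inline int words_done_when_fifo_empty(int total_words, int cycles_per_word) {
    ToyTxPath tx = make_loaded(total_words, cycles_per_word);
    while (!tx.fifo.empty()) tx.tick();
    return tx.words_on_bus;
}

// Words delivered at the moment the stall flag is first set.
inline int words_done_when_stalled(int total_words, int cycles_per_word) {
    ToyTxPath tx = make_loaded(total_words, cycles_per_word);
    while (!tx.stalled) tx.tick();
    return tx.words_on_bus;
}
```

In this model, with four words at two cycles each, the FIFO first reads empty when only three words have been delivered, while the stall flag is raised only after all four. Whether that window matters on real hardware depends on how soon CS is deasserted after the wait loop exits, which would be consistent with the tests not showing a problem.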
Since you're tinkering with Tufty, what do you make of https://github.com/pimoroni/pimoroni-pico/issues/567?