Adafruit-GFX-Library

Changing `drawRGBBitmap(..., uint16_t *,...)` to virtual enables significant performance gains.

Status: Open. MHotchin opened this issue 5 years ago • 3 comments

Issue type: enhancement

  • Board: Wemos D1 R32, Mega 2560

Making the method `drawRGBBitmap(int16_t x, int16_t y, uint16_t *bitmap, int16_t w, int16_t h)` virtual would allow significant performance gains for any display board that supports a 'bulk transfer' mechanism. In my tests, render time drops by 90% or more.

Source compatibility should be 100%. The implementation is simple: just move the method declaration up into the `virtual` section of the class. Drivers that don't override it keep using the existing implementation, with very little change in size or speed (see the 'Virtual' rows below).

I have included a sketch that demonstrates this. It creates a 16x16 tile, then uses it to cover a 256x256 area of the screen: 256 tiles, a total of 64K (65,536) pixels written.

I ran the test three times: first with the original code, then with only the method changed to virtual, and finally with an optimized override.


Using a Wemos D1 R32 + Waveshare 4" screen, I get the following times:
//
//  Using a Wemos D1 R32 + Waveshare 4" TFT screen.  ESP32 board, SPI ILI9486 display
//
//  16x16 tiles
//  Render Times:
//  Original  = 1261 mSec
//  Virtual   = 1261 mSec = +0%
//  Optimized = 124 mSec  = -90.1%
//
//  Sizes (Program / Min RAM)
//  Original  = 225,533 bytes / 16036 bytes 
//  Virtual   = 225,541 bytes / 16036 bytes
//  Optimized = 226,029 bytes / 16036 bytes

Using a Mega 2560 with the same display, I get the following:
//
//  Using Mega 2560 + Waveshare 4" TFT screen
//
//  16x16 tiles
//  Render Times:
//  Original  = 6229 mSec
//  Virtual   = 6315 mSec = +1.3%
//  Optimized = 298 mSec  = -95.2%
//
//  Sizes (Program / Min RAM)
//  Original  = 11,232 bytes / 306 bytes 
//  Virtual   = 11,392 bytes / 308 bytes
//  Optimized = 11,632 bytes / 308 bytes
//

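Results for 32x32 tiles are similar, with slightly larger time savings (-90.4% on the ESP32, -95.5% on the Mega; the full numbers are in the comments at the end of the sketch).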


The other `drawRGBBitmap()` overloads need not be virtual: each one does some processing on every pixel before writing it, so there's no opportunity for bulk transfer.

Complete text of sketch follows:
/*
 Name:		BlitTest.ino
 Created:	2020-07-11 7:51:36 PM
 Author:	Michael
*/


#include <Arduino.h>
#include <SPI.h>
#include <Adafruit_GFX.h>

#include <Waveshare_ILI9486.h>

namespace
{
	Waveshare_ILI9486 MyTFT;

	constexpr size_t BMP_WIDTH = 16;
	constexpr size_t BMP_HEIGHT = 16;

	constexpr size_t NUM_PIXELS = BMP_HEIGHT * BMP_WIDTH;

	constexpr size_t MAX_WIDTH = 256;
	constexpr size_t MAX_HEIGHT = 256;

	constexpr size_t TILES_WIDTH = MAX_WIDTH / BMP_WIDTH;
	constexpr size_t TILES_HEIGHT = MAX_HEIGHT / BMP_HEIGHT;
}


// the setup function runs once when you press reset or power the board
void setup() 
{
	Serial.begin(115200);

	SPI.begin();

	MyTFT.begin();
	MyTFT.fillScreen(0);
}

// the loop function runs over and over again until power down or reset
void loop() 
{
	uint16_t buffer[NUM_PIXELS];

	//  Fill buffer with random pixels (random(n) returns 0..n-1, so 0x10000 covers every 16-bit color)
	for (size_t i = 0; i < NUM_PIXELS; i++)
	{
		buffer[i] = random(0x10000L);
	}

	auto tStart = millis();

	//  Cover MAX_WIDTH x MAX_HEIGHT pixels with TILES_WIDTH x TILES_HEIGHT copies of the tile
	for (size_t i = 0; i < TILES_HEIGHT; i++)
	{
		for (size_t j = 0; j < TILES_WIDTH; j++)
		{
			MyTFT.drawRGBBitmap(j * BMP_WIDTH, i * BMP_HEIGHT, buffer, BMP_WIDTH, BMP_HEIGHT);
		}
	}

	auto tEnd = millis();

	Serial.print("Render time: ");
	Serial.print(tEnd - tStart);
	Serial.println(" mSec.");

	delay(3000);
}

//
//  Using a Wemos D1 R32 + Waveshare 4" TFT screen.  ESP32 board, SPI ILI9486 display
//
//  16x16 tiles
//  Render Times:
//  Original  = 1261 mSec
//  Virtual   = 1261 mSec = +0%
//  Optimized = 124 mSec  = -90.1%
//
//  Sizes (Program / Min RAM)
//  Original  = 225,533 bytes / 16036 bytes 
//  Virtual   = 225,541 bytes / 16036 bytes
//  Optimized = 226,029 bytes / 16036 bytes
//
//
//
//  32x32 tiles:
//  Render Times:
//  Original  = 1259 mSec
//  Optimized = 120 mSec = -90.4%


//
//  Using Mega 2560 + Waveshare 4" TFT screen
//
//  16x16 tiles
//  Render Times:
//  Original  = 6229 mSec
//  Virtual   = 6315 mSec = +1.3%
//  Optimized = 298 mSec  = -95.2%
//
//  Sizes (Program / Min RAM)
//  Original  = 11,232 bytes / 306 bytes 
//  Virtual   = 11,392 bytes / 308 bytes
//  Optimized = 11,632 bytes / 308 bytes
//
//
//
//  32x32 tiles
//  Render times:
//  Original  = 6225 mSec
//  Virtual   = 6305 mSec = +1.2%
//  Optimized = 279 mSec  = -95.5%

MHotchin, Jul 12 '20 05:07

The render times seem long even when optimised. What SPI clock rate are you using? Are you sure the ESP32 is not bit bashing the SPI lines?

I get 31ms for 16x16 with ESP32 at 40MHz SPI using an ST7796 320x480 display...

Bodmer, Aug 10 '20 21:08

> The render times seem long even when optimised. What SPI clock rate are you using? Are you sure the ESP32 is not bit bashing the SPI lines?
>
> I get 31ms for 16x16 with ESP32 at 40MHz SPI using an ST7796 320x480 display...

The display is limited to 20 MHz SPI (it's one of those serial-to-parallel shift-register boards), so that's half the problem right there. I'm not sure about the SPI implementation; I'm just using the default.

Regardless, even the 'long' optimized version is 10x faster....

MHotchin, Aug 10 '20 23:08

Using a different SPI transfer function, I'm now at 61 ms, a 95% improvement over the default implementation.

MHotchin, Aug 22 '20 03:08