Pillow Performance of ImageDraw::text() and potential use of FTC

What did you do?

In the project https://github.com/time4tea/gopro-dashboard-overlay I used Pillow to render frames that are output to ffmpeg. Pillow is great!

I'm trying to render frames as quickly as possible, as there are many frames to render in a 1 or 2 hour video (36,000 to 72,000 frames), even at 10 frames/second.

I'm using the text facilities of Pillow to render text into an Image. I cache text images where possible - so rendering fixed text strings is very quick - however, with a dynamic text string, such as a datetime or GPS location - caching isn't so effective.
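A minimal sketch of why whole-string caching helps for fixed labels but not for dynamic text. Here `render_text` is a hypothetical stand-in for the expensive Pillow render, not a Pillow function, and the label and timestamp strings are made up for illustration.

```python
from functools import lru_cache

render_calls = 0  # counts how often the (expensive) renderer actually runs


def render_text(text: str) -> bytes:
    """Hypothetical stand-in for an expensive text render into an image."""
    global render_calls
    render_calls += 1
    return text.encode("utf-8")  # placeholder for the rendered image


@lru_cache(maxsize=256)
def cached_text(text: str) -> bytes:
    """Return the rendered image for `text`, rendering only on a cache miss."""
    return render_text(text)


# A fixed label is rendered once, however many frames use it.
for _ in range(100):
    cached_text("Speed (km/h)")
static_renders = render_calls

# A timestamp changes on every frame, so every lookup is a cache miss.
for frame in range(100):
    cached_text(f"12:00:{frame // 10:02d}.{frame % 10}")
dynamic_renders = render_calls - static_renders

print(static_renders, dynamic_renders)
```

In this toy run the fixed label costs one render for 100 frames, while the timestamp costs one render per frame.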

Looking at the call stack for drawing some text, it seems to look something like:

ImageDraw::draw_text()
  ImageFont::getmask2()
    Font::getsize() - implemented in imagingft.c font_getsize
    Font::render() - implemented in imagingft.c font_render

When you call these functions a lot, as I do, it becomes clear that they probably do a lot of similar work. Here is a Python profile of a run of my software (there are multiple call routes here, so don't worry that the numbers don't all add up!):

draw_text -> 2077 calls 8259ms
  getmask2 -> 2076 calls 7567ms
    Font::getsize -> 2595 calls 4171ms
    Font::render -> 2076 calls 4195ms
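For anyone wanting to collect a similar profile, here is a rough sketch using the standard library's cProfile. The `draw_text`, `getsize`, `render` and `layout` functions below are toy stand-ins that only mimic the shape of the call stack above; they are not Pillow code.

```python
import cProfile
import io
import pstats


def layout(text):
    # Stand-in for the shared layout work done by both callers.
    return [ord(c) for c in text]


def getsize(text):
    return len(layout(text))


def render(text):
    return bytes(len(layout(text)))


def draw_text(text):
    getsize(text)        # first layout
    return render(text)  # second layout of the same string


profiler = cProfile.Profile()
profiler.enable()
for frame in range(1000):
    draw_text(f"frame {frame}")
profiler.disable()

# Sort by cumulative time so callers appear above the functions they call.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the report, `layout` shows twice as many calls as `draw_text`, which is the kind of pattern that hints at duplicated work.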

Looking at imagingft.c, they both seem to call (in my case) text_layout_raqm, which I'm guessing calls through to FT to get the glyphs for the given string, allowing for ligatures, kerning etc.

I was wondering... FT seems to allow for glyph caching using FTC_Manager - is there any appetite for adding support for this?

I think that, in the case of rendering lots of frames of text, it has the possibility of adding quite a bit of performance. (Which is probably not a major goal for Pillow, totally fair!)
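To illustrate the idea only, here is a toy per-glyph cache in Python. It is a conceptual model and not FreeType's FTC API: `_rasterise` is a hypothetical stand-in for loading and rendering a glyph, "Roboto.ttf" is just an example font name, and the sketch ignores shaping, kerning and ligatures, which a real implementation would have to handle.

```python
from typing import Dict, List, Tuple

GlyphKey = Tuple[str, int, str]  # (font path, pixel size, character)


class GlyphCache:
    """Toy per-glyph cache; a conceptual model, not FreeType's FTC API."""

    def __init__(self) -> None:
        self._glyphs: Dict[GlyphKey, bytes] = {}
        self.rasterised = 0  # how many glyphs had to be rendered from scratch

    def _rasterise(self, key: GlyphKey) -> bytes:
        # Hypothetical stand-in for the expensive FreeType load + render.
        self.rasterised += 1
        return key[2].encode("utf-8")

    def glyph(self, font: str, size: int, char: str) -> bytes:
        key = (font, size, char)
        if key not in self._glyphs:
            self._glyphs[key] = self._rasterise(key)
        return self._glyphs[key]

    def glyphs_for(self, font: str, size: int, text: str) -> List[bytes]:
        return [self.glyph(font, size, ch) for ch in text]


cache = GlyphCache()
# 100 different timestamp strings, one per frame.
for frame in range(100):
    cache.glyphs_for("Roboto.ttf", 32, f"12:00:{frame // 10:02d}.{frame % 10}")

print(cache.rasterised)
```

Although every one of the 100 strings is different, they share the same few characters (ten digits, ':' and '.'), so only those glyphs are ever rasterised. That is why a glyph-level cache can help where a whole-string cache cannot.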

For example, rendering a compass widget using Pillow, with a few open and filled circles, lots of compass lines, and a bilinear resize for AA, takes about 2.6ms, but adding in the 4 characters "N", "S", "E", "W" takes it to 13ms. (I could optimise this particular use case; it's just an example of how the text rendering compares to the rest of Pillow.)

Thanks for reading this far! Thanks for a super library!

Sep 24 '22 21:09 time4tea

I have another suggestion (in addition to glyph caching).

The function getmask2 performs the following steps:

1. calls getsize to get the size of the text
2. calls Image._decompression_bomb_check to compare size with MAX_IMAGE_PIXELS
3. calls the fill function passed as argument to create a blank image
4. calls render to draw text into the blank image

After Pillow 10 the deprecated fill parameter will be replaced by a direct call to the internal function. After this, the only Python function to be called between the two C functions is the decompression bomb check. If this was moved into C, the two functions could be combined to remove the duplicate call to text_layout.
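The duplicated layout can be sketched with a toy model of the steps above. None of this is Pillow's actual code: `text_layout`, `measure`, `draw` and the pixel limit are simplified stand-ins, and the fixed 10-pixel advance and 20-pixel height are arbitrary.

```python
from typing import List, Tuple

Glyphs = List[Tuple[str, int]]  # (character, horizontal advance in pixels)

MAX_PIXELS = 1_000_000  # illustrative limit, not Pillow's real default
layout_calls = 0  # how many times the text has been laid out


def text_layout(text: str) -> Glyphs:
    """Stand-in for the expensive layout step."""
    global layout_calls
    layout_calls += 1
    return [(ch, 10) for ch in text]


def measure(glyphs: Glyphs) -> Tuple[int, int]:
    return sum(advance for _, advance in glyphs), 20


def bomb_check(size: Tuple[int, int]) -> None:
    if size[0] * size[1] > MAX_PIXELS:
        raise ValueError("text image would exceed the pixel limit")


def draw(glyphs: Glyphs, image: bytearray) -> None:
    x = 0
    for _, advance in glyphs:
        image[x] = 255  # mark each glyph origin on the first row
        x += advance


def getsize(text: str) -> Tuple[int, int]:
    return measure(text_layout(text))


def render(text: str, image: bytearray) -> None:
    draw(text_layout(text), image)  # lays the same text out again


def getmask_two_pass(text: str) -> bytearray:
    width, height = getsize(text)      # step 1: layout no. 1
    bomb_check((width, height))        # step 2
    image = bytearray(width * height)  # step 3: blank image
    render(text, image)                # step 4: layout no. 2
    return image


def getmask_one_pass(text: str) -> bytearray:
    glyphs = text_layout(text)         # single layout, reused below
    width, height = measure(glyphs)
    bomb_check((width, height))
    image = bytearray(width * height)
    draw(glyphs, image)
    return image


layout_calls = 0
two = getmask_two_pass("12:00:00")
two_pass_layouts = layout_calls

layout_calls = 0
one = getmask_one_pass("12:00:00")
one_pass_layouts = layout_calls

print(two_pass_layouts, one_pass_layouts, one == two)
```

The combined version produces the same image while laying the text out once instead of twice; the size check has to sit between layout and drawing in both versions.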

Sep 25 '22 00:09 nulano

So - while it is a million miles from being ready for a library - I have some PoC code here using the FreeType cache from Python, if it is useful... It almost certainly leaks memory, and will SEGV occasionally at the moment. It renders the font into an ImageDraw in Python, so that bit is quite slow. Layout is basically non-existent. It will only build on Linux, and even then only with GCC.

https://github.com/time4tea/gopro-dashboard-overlay/blob/c_extension/gopro_overlay/freetype.py
https://github.com/time4tea/gopro-dashboard-overlay/blob/c_extension/c/freetype.c
https://github.com/time4tea/gopro-dashboard-overlay/blob/c_extension/setup.py

Sep 30 '22 22:09 time4tea

The code above, although still very(!) rough, is showing some interesting results so far. It is definitely not comparing apples to apples, but the performance so far makes me think it might be worth pursuing. For example, rendering some string into an RGBA image takes about 6ms with current Pillow, but takes 40us using the FreeType cache - it's about 140x faster. Rendering a stroked string into an RGBA image takes about 14ms with current Pillow, and 1.2ms using the cache - it's about 11x faster. Like I say, it's not a fair comparison, as Pillow is doing a lot more, but given the difference I think I might plod on a bit. Here is an example of the output - top is the new thing, bottom is Pillow.

Oct 08 '22 13:10 time4tea

It looks like the font rendering has got much faster in recent releases! - I was on 8.4.0 - upgrading to 9.2.0 speeds up my test case for pillow from 14ms to 4ms.

I think fixing #6649 would significantly improve the performance of text rendering.

I'm at a point where it's basically working now - have a look at the above files if you're interested.

This is the current timing for my experiment - time to render the string in the below image.

Cache - Stroked
  1.95 msec
Cache - Plain
  346 usec
Pillow - Stroked
  4.22 msec
Pillow - Plain
  4.13 msec

Here you can see some strings rendered by Pillow and by my new code using the FT cache - it is hard to tell them apart. Plain text is about 12x faster (346 usec vs 4.13 msec), and stroked text is about twice as fast (1.95 msec vs 4.22 msec). I think this could be improved by caching the stroked glyphs, which would probably not be too hard to do, but I'm not intending to do this in a PoC right now.

Hope that's interesting - if you decide you'd like to go further on using the FT cache - please give me a shout.

Oct 08 '22 20:10 time4tea

I think the difference between your and the Pillow output might be due to Raqm vs basic layout. You might want to take a look at #6631 / #6633.

Oct 09 '22 14:10 nulano

#6649 has now been fixed in main.

Oct 10 '22 04:10 radarhere

@nulano - yes - good observation. I'll try to make another performance test showing Pillow with Raqm, Pillow without Raqm, and my bodge code (no Raqm, so broadly similar to basic layout). Definitely one issue with the cache approach is that it completely changes how you get glyphs, so it requires considerable changes - so making cached rendering work with Raqm might not be easily achievable. I didn't look that hard at the Raqm code though, tbh.

Again, just to give some context - why is this important to me? I'm rendering many thousands of frames, each with varying text. Making the text function 12x (or 2x) quicker would make a big difference to me. I already cache images where the text doesn't vary, so looking at the text rendering makes sense. Perhaps, though, I jumped in at the deep end looking at the cache, when I could have tried disabling Raqm! :-)

Oct 12 '22 10:10 time4tea

"After Pillow 10 the deprecated fill parameter will be replaced by a direct call to the internal function."

This has now been done in #7059

Apr 10 '23 09:04 radarhere

"the only Python function to be called between the two C functions is the decompression bomb check. If this was moved into C, the two functions could be combined to remove the duplicate call to text_layout."

I attempted this change, but found a problem - the _imagingft extension is not connected to the C code for creating new images. I couldn't call ImagingNewDirty and ImagingFill.

I worked around this by passing Image.core.fill to font_render - so the deprecation of the fill parameter may not have been blocking this after all.

I've created PR #7206 for the change. From my tests, it makes getmask2 10% faster.

Jun 10 '23 07:06 radarhere
