trezor-firmware Improve image decompression speed and handling

comment thread from a different PR: https://github.com/trezor/trezor-firmware/pull/2339#discussion_r919906423

measurement results:

Some results from the testing:

the fastest is decompress as a whole, without window usage

slowest is to decompress byte by byte

This is where intuitive results end though. When decompressing by chunks, fastest seem to be using very small chunks of 2 or 4 bytes, after that, with increasing chunk size the speed actually goes down. I was testing this on two images, the best chunk size differ in each. I also tried comiling the uzlib with -O3 for one image, which sped up the decompressing quite nicely, but didn't change the dependence on the chunk size. Even when decompressing in one chunk when using window, its way slower than without window usage or small chunks.

Just to have some idea what increases we can get:

Decompressing by best chunk was 17% to 65% faster than per byte

Decompressing by 128 byte chunk was 7% to 40% faster than per byte.

Decompressing whole image without window was 70% to 110% faster than per byte.

This will differ a lot by image. Maybe testing on more images would bring better results.

So based on this, i would suggest using chunks of 4 bytes for decompression of large images (that means modify display_icon, display_image functions) and implementing new display_small_icon with some reasonable limits (loader using 64*64 so perhaps that) where we could do the whole decompression at once.

semantic issues: B&W models (T1, TR) use display_icon to render the homescreen. The homescreen is certainly not an icon, and the important feature of "icon" is that its format is 4bpp alpha and so can be blended into different colored foregrounds/backgrounds -- unlike a "homescreen" which is an image at the display's native color resolution. This is also incompatible with actual T1 homescreens which are stored as raw 1bpp data stream.

We probably need a separate function for rendering TR/T1 homescreens. When we have that, we can limit display_icon accepted size: largest icons currently used are 24x24, so 576 pixels, or 288 bytes. That allows us to use window-less decompression into a static buffer.

Loaders are using a larger 64x64 center glyph, which is 4096 pixels or 2048 bytes. This could still be acceptable for window-less decompression.

possibly worth folding in: the new popups are using relatively large full-color icons. These could be split into smaller shape+glyph combinations: https://github.com/trezor/trezor-firmware/pull/2287#issuecomment-1143575341

Jul 26 '22 12:07 matejcik

ad the 2287 issue: agree that splitting could be a way, though the shape is just circle and rounded triangle for which procedural implementation would make more sense. The glyph could then be greyscale, though it wouldn't fit the discussed 24*24 limit.

Maybe the display_image and display_icon functions are also named badly, wouldn't it be more fitting to call them display_16bpp_image and display_4bpp_image? The color is the real difference and neither does care about the size in current impl. Then it would actually make sense to also introduce display_1bpp_image function and small variants (either separate or hidden inside as discussed) with optimized decompression.

Jul 26 '22 13:07 TychoVrahe

display_16bpp_image and display_4bpp_image?

That is an option -- but the real difference IMO is that display_icon takes fg and bg color and uses a blending table instead of just dumping pixel values on screen. I don't think we ever want to display a 4bpp image as-is, but we do want to display the T1/TR homescreen as-is.

FWIW, display_icon does essentially the same thing as rendering a single glyph in display_text_render right?

the shape is just circle and rounded triangle for which procedural implementation would make more sense.

could be, but would the procedural implementation actually be smaller? :)

Jul 26 '22 14:07 matejcik

ever want to display a 4bpp image as-is

yeah that doesn't seem to be very useful. but this blending could be similar to 1bpp image usage on TT, in T1/TR homescreens the FG and BG color are just implicit, and theoretically you could want them flipped for some reason

display_icon does essentially the same thing as rendering a single glyph in display_text_render right?

essentially, but the text has more complexity due to the bearing and it also handles the different BPP values internally (in compile time)

could be, but would the procedural implementation actually be smaller? :)

who knows without trying, but procedural impl of basic geometric shapes can be easily reused in different context, sizes etc, which, ultimately, should lead to smaller flash footprint

Jul 26 '22 14:07 TychoVrahe

One more thing to consider when optimizing decompression in animations like loader with icon is to do the decompression only once instead of during every frame. The decompressed data would(should) be stored on heap/gc allocated if used from rust.

Jul 26 '22 14:07 TychoVrahe

what remains to be done here? could we close, or transfer knowledge to Notion and then close?

Jan 19 '23 13:01 matejcik

waiting for micropython update #2341 , after that i want to set optimization of the deflate function to -03.

Jan 19 '23 14:01 TychoVrahe

trezor-firmware trezor-firmware copied to clipboard

Improve image decompression speed and handling

trezor-firmware
trezor-firmware copied to clipboard