~37% performance hit on ESP32S3 after migrating from v8.3.9 to v9.0.0

Open KamranAghlami opened this issue 2 years ago • 14 comments

LVGL version

v9.0.0

What happened?

Hi guys,

First of all congratulations on the new release, you've done fantastic work.

I wanted to share an unexpected result that I got migrating from v8.3.9 to v9.0.0. This is my T-Display-S3 example code available on my GitHub page.

The exact source code reproducing these images is available in lvgl_v8.3.9_benchmark and lvgl_v9.0.0_benchmark branches.

Thought I'd ask if I'm missing anything obvious here or if it is expected behavior.

[Images: v8 and v9]

How to reproduce?

No response

KamranAghlami avatar Jan 24 '24 10:01 KamranAghlami

Hi,

Thank you for the report! It's definitely not the intended behavior. v9 should be faster 🙂 A few notes:

  • The way FPS is measured is different in v8 and v9. I suggest measuring the first execution time of lv_timer_handler
  • A lot has changed in image handling and caching. Do you see the same performance issue with simple rectangles too?
  • In v8 you set the image cache to 8 (cache 8 images) but in v9 these are not set
    • Max memory for image caching: here
    • Number of cached image header: here
  • Can you test without lv_draw_sw_rgb565_swap?
  • Have you enabled compiler optimization in both cases?

kisvegabor avatar Jan 24 '24 11:01 kisvegabor

I suggest measuring the first execution time of lv_timer_handler

I'm not sure what you mean by first execution time of lv_timer_handler, but I tried measuring performance in terms of pixel throughput over time: I accumulated a count of updated pixels for 50 seconds in display->flush_cb after the scene stabilized. The results are within the margin of error of the FPS-based measurement. (v8: 1310464 pixels per second, v9: 813619 pixels per second, loss: 37.91%)

Do you see the same performance issue with simple rectangles too?

I did change the images to simple rectangles. The performance gap narrows but is still significant. (v8: 24fps, v9: 16fps, loss: 33.33%)

In v8 you set the image cache to 8 (cache 8 images) but in v9 these are not set

Here are my settings regarding image caching:

// 8 * 32x32 images * 4 , anything below this generates error logs at runtime.
#define LV_CACHE_DEF_SIZE       (32U * 1024U)
// I'm loading exactly 8 PNGs, anything below this causes severe drop in performance.
#define LV_IMAGE_HEADER_CACHE_DEF_CNT 8 

Can you test without lv_draw_sw_rgb565_swap?

I did and, interestingly, the performance difference was negligible.

Have you enabled compiler optimization in both cases?

PlatformIO reports it is building in release mode and I swapped -Os for -O3, performance gap increased! (v8: 28fps, v9: 17fps, loss: 39.28%)

Another thing, which probably belongs in a separate issue: I tried using LV_OS_FREERTOS to test the new multi-core rendering feature, but compilation fails because on the ESP platform the taskENTER_CRITICAL and taskEXIT_CRITICAL macros require a spinlock_t argument that is absent on other platforms.

KamranAghlami avatar Jan 24 '24 15:01 KamranAghlami

In v9, #define LV_CACHE_DEF_SIZE (32U * 1024U) sets the amount of memory the cache can use. 32kB is too small for the decoded PNG images.

Also for v9, the image is decoded to ARGB8888, which takes more space and more time to render. In v8, it's converted to native color before adding to cache.

You can try to set the cache size to larger than PNGwidth * PNGheight * 4 * numberOfImages.

If the lvgl native bin image could be used the performance could be better.

XuNeo avatar Jan 24 '24 16:01 XuNeo

32kB is too small for the decoded PNG image.

32K isn't much, but my images are very small (32x32). Anyway, today I tried 4MB for the cache size and it made no difference.

Also for v9, the image is decoded to ARGB8888, which takes more space and more time to render. In v8, it's converted to native color before adding to the cache.

I did try using randomly colored squares instead of images, but there's still a 33.33% loss of fps.

If the lvgl native bin image could be used the performance could be better.

I don't have a performance target to hit per se, this is merely a stress test, so I'd rather compare case for case.

KamranAghlami avatar Jan 25 '24 08:01 KamranAghlami

Do you see the performance difference by eye too?

I'm not sure what you mean by first execution time of lv_timer_handler

I meant

uint32_t t = lv_tick_get();
lv_timer_handler();
printf("%lu\n", (unsigned long)lv_tick_elaps(t)); /* lv_tick_elaps() returns uint32_t, so %d is not a safe match */

Please try running the new lv_demo_benchmark of LVGL v9. The same performance measurement and benchmark have been backported to v8 here: https://github.com/lvgl/lvgl/tree/v8.3-with-v9-benchmark

It should clearly show in which scenarios v9 is slower than v8 and we will have a unified FPS measurement as well.

kisvegabor avatar Jan 25 '24 09:01 kisvegabor

v8

[3 images]

v9

[3 images]

IAMMX avatar Jan 26 '24 03:01 IAMMX

Thank you so much!

I've collected some cases:

V8

  • All: 76%, 58FPS, 27 ms (23 + 4)
  • Empty screen: 56%, 81FPS, 8 ms (4 + 4)
  • Moving wallpaper: 91%, 45FPS, 21 ms (14 + 7)
  • Multiple labels: 84%, 100FPS, 8 ms (7 + 1)

V9

  • All: 81%, 46FPS, 27 ms (34 + 4)
  • Empty screen: 67%, 62FPS, 12 ms (5 + 7)
  • Moving wallpaper: 90%, 42FPS, 20 ms (12 + 8)
  • Multiple labels: 85%, 74FPS, 10 ms (10 + 0)

Notes:

  • The render time of an empty screen is about the same (4 vs 5 ms), but the flush time is 4 vs 7 ms
  • In v9 the flush time (which should be independent of the LVGL version) is 7-8 ms on full-screen tests, but it varies between 4 and 7 ms on v8.
  • Label drawing is 50% slower on v9. I measured the opposite on STM32

This is my measurement data on STM32F769:

V8

  • All: 63%, 24FPS, 44 ms (44 + 0)
  • Empty screen: 36%, 30FPS, 12 ms (12)
  • Moving wallpaper: 45%, 30FPS, 14 ms (14)
  • Multiple labels: 51%, 30FPS, 11 ms (11)

V9

  • All: 60%, 24FPS, 30 ms (30 + 0)
  • Empty screen: 37%, 28FPS, 10 ms (10 + 0)
  • Moving wallpaper: 34%, 29FPS, 12 ms (12 + 0)
  • Multiple labels: 48%, 29FPS, 14 ms (14 + 0)

So 44 ms render time in v8 vs 33ms in v9.

kisvegabor avatar Jan 26 '24 10:01 kisvegabor

@kisvegabor Try defining "#define LV_DEF_REFR_PERIOD 10". The default value of 33 is too large, resulting in a small average value, which is difficult to compare. @KamranAghlami Can you also send the benchmark results for v8 and v9?

IAMMX avatar Jan 26 '24 10:01 IAMMX

Interesting results...

Sorry, I'm a little busy at the moment. I'll run these tests and share my results over the upcoming weekend.

KamranAghlami avatar Jan 26 '24 14:01 KamranAghlami

I ran lv_demo_benchmark on both v8 and v9. One thing to consider is my screen is 320x170 so I had to modify the v8 benchmark to print CSV data like the v9 version, and also some examples might have elements that got culled as a result of being out of the viewport.

For both cases, I set LV_DEF_REFR_PERIOD to 33, 16, and 10 resulting in 30.3, 62.5, and 100 fps targets.

V8 30.3 FPS: v8_30.3.csv [image]

V9 30.3 FPS: v9_30.3.csv [image]

V8 62.5 FPS: v8_benchmark.csv [image]

V9 62.5 FPS: v9_benchmark.csv [image]

V8 100 FPS: v8_benchmark.csv [image]

V9 100 FPS: v9_benchmark.csv [image]

KamranAghlami avatar Jan 27 '24 15:01 KamranAghlami

Do you see the performance difference by eye too?

Yes, in the extreme case of 200 balls in my scene the difference is 8 fps in v8 vs 4 fps in v9.

I meant

uint32_t t = lv_tick_get();
lv_timer_handler();
printf("%lu\n", (unsigned long)lv_tick_elaps(t)); /* lv_tick_elaps() returns uint32_t, so %d is not a safe match */

v8: 18ms v9: 21ms

KamranAghlami avatar Jan 27 '24 15:01 KamranAghlami

A few things to try/test just came to mind:

  • Enable the style cache in v9: LV_OBJ_STYLE_CACHE
  • Try with 20 balls. Do you see the same difference in performance?
  • I found that BiDi (LV_USE_BIDI) is slower in v9. It shouldn't affect images, but it matters if LV_USE_BIDI is enabled and labels are tested.
  • In the benchmark the card has ARGB8888 image, RGB565A8 could be faster, I'll check.

kisvegabor avatar Jan 29 '24 08:01 kisvegabor

We need some feedback on this issue.

Now we mark this as "stale" because there was no activity here for 14 days.

Remove the "stale" label or comment else this will be closed in 7 days.

lvgl-bot avatar Feb 13 '24 01:02 lvgl-bot

It seems that in v9 drawing an object has more overhead than in v8. This is especially apparent when drawing a large number of widgets. We are investigating this issue.

zjanosy avatar Feb 20 '24 22:02 zjanosy

We need some feedback on this issue.

Now we mark this as "stale" because there was no activity here for 14 days.

Remove the "stale" label or comment else this will be closed in 7 days.

lvgl-bot avatar Mar 07 '24 01:03 lvgl-bot

As there was no activity here for a while we close this issue. But don't worry, the conversation is still here and you can get back to it at any time.

So feel free to comment if you have remarks or ideas on this topic.

lvgl-bot avatar Mar 14 '24 01:03 lvgl-bot

@zjanosy Did you have any luck with the investigation?

JeremiahGillis avatar Jul 26 '24 15:07 JeremiahGillis

We are investigating it with @espzav. It's quite likely that the flash cache gets overloaded because the rendering pipeline is more complicated in LVGL v9. Can you confirm that the performance drop happens mainly when images stored in flash are being rendered? (They overload the cache even more.)

kisvegabor avatar Jul 29 '24 07:07 kisvegabor