When tiles are streamed, GPU resources are uploaded synchronously on the render thread. A quick change in zoom or location can block the render thread because of very slow uploads: many uploads are serialized, which delays actual rendering and degrades the user experience. The following Tracy screenshot shows a slow frame; most of the frame is spent uploading vertex buffers to the GPU: [screenshot: slow_frame]

A zoomed-in view of the same capture shows that actual rendering happens only at the very end of the frame: [screenshot: slow_frame_zoom]

Pipelining the renderer would alleviate the problem, but it would not be enough when many uploads are processed in the same frame, and it requires significant effort. An easier solution, which may also scale better when many CPU cores are idle, is to distribute uploads across as many threads as possible and to wait for an upload only when a draw call needs the resulting buffer.
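The idea of "upload on worker threads, wait only at draw time" can be sketched in backend-agnostic C++ with `std::async` and `std::shared_future`. This is a minimal illustration, not the PR's code: `GpuBuffer`, `uploadBuffer`, `scheduleUploads` and `draw` are hypothetical names, the "upload" is just a copy, and `std::launch::async` runs each task as if on a new thread rather than on the persistent pool the PR uses.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU vertex buffer; the "upload" below is only a copy.
struct GpuBuffer {
    std::vector<float> data;
};

// Hypothetical upload. In the Android backend this would be a GL buffer upload
// issued from a worker thread that has a shared EGL context current.
GpuBuffer uploadBuffer(std::vector<float> vertices) {
    return GpuBuffer{std::move(vertices)};
}

// Start every upload immediately; each one yields a future a later draw can wait on.
std::vector<std::shared_future<GpuBuffer>> scheduleUploads(
        std::vector<std::vector<float>> pending) {
    std::vector<std::shared_future<GpuBuffer>> uploads;
    uploads.reserve(pending.size());
    for (auto& vertices : pending) {
        uploads.push_back(
            std::async(std::launch::async, uploadBuffer, std::move(vertices)).share());
    }
    return uploads;
}

// A draw blocks only on the buffer it needs, not on every pending upload.
std::size_t draw(const std::shared_future<GpuBuffer>& upload) {
    assert(upload.valid());
    const GpuBuffer& buffer = upload.get();  // waits only if this upload is still running
    return buffer.data.size();               // stand-in for issuing the draw call
}
```

Because each draw waits on its own future, earlier draws can proceed while later buffers are still being uploaded.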

This PR adds shared EGL contexts to the Android backend so that buffer uploads can be issued from worker threads. The existing thread pool has been updated to support persistent EGL contexts. I noticed very poor performance on Qualcomm drivers when contexts were migrated between threads, hence the thread pool with persistent contexts: each worker keeps its own context current, so there are no per-frame eglMakeCurrent calls moving contexts between threads. This allows concurrent uploads to the GPU: [screenshot: slow_frame_with_async_upload]
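The "persistent context per worker" pattern can be sketched as a thread pool whose workers run a one-time setup hook before processing any task. This is an illustrative sketch only: `PersistentContextPool` and the `onThreadStart` hook are hypothetical. In the PR's setting the hook would presumably create a context sharing objects with the render thread's context (via the `share_context` argument of `eglCreateContext`) and call `eglMakeCurrent` once; here it is a plain callback so the code runs without EGL.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Thread pool whose workers run a setup hook exactly once, then process tasks.
class PersistentContextPool {
public:
    // onThreadStart: hypothetical hook standing in for "create shared context and
    // make it current on this thread".
    PersistentContextPool(std::size_t threadCount, std::function<void()> onThreadStart) {
        assert(threadCount > 0);
        for (std::size_t i = 0; i < threadCount; ++i) {
            workers_.emplace_back([this, onThreadStart] {
                onThreadStart();  // once per worker, never per task or per frame
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(mutex_);
                        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                        if (stopping_ && tasks_.empty()) {
                            return;  // queue drained and pool shutting down
                        }
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();  // e.g. a buffer upload on the already-current context
                }
            });
        }
    }

    // Finishes all queued tasks, then joins the workers.
    ~PersistentContextPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& worker : workers_) {
            worker.join();
        }
    }

    PersistentContextPool(const PersistentContextPool&) = delete;
    PersistentContextPool& operator=(const PersistentContextPool&) = delete;

    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
};
```

However many tasks are scheduled, the setup hook runs once per worker, which is the property that avoids repeatedly migrating contexts between threads.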

The uploads are synchronized with the draws: the render thread waits for a buffer only when a draw needs it. This allows drawing while data is being uploaded: [screenshot: parallel_draw_and_upload]. Uploads are currently scheduled in no particular order. The overlap between drawing and uploading could be improved by first scheduling uploads for resources that are used late in the frame, e.g. uploading resources used by translucent objects before those used by opaque objects, since opaques are rendered first.

This PR is specific to Android and is designed so that other backends are unaffected and require minimal code changes. Once Vulkan is the official backend on Android, similar and better resource handling will be possible.

Texture uploads are also slow but less frequent; they will be handled in a separate PR: [screenshot: upload_texture_future_work]

There are unused index buffers that we currently wait on before destroying them, which slows down the render thread (#2760): [screenshot: unused_index_buffers]. We could destroy them in a future frame, but we should first understand why they exist at all.
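Destroying such buffers "in a future frame" instead of blocking on them now could look like a deferred-deletion queue that releases each resource a fixed number of frames after it is retired. This is a sketch under assumptions: `DeferredDeleter`, `release` and `collect` are hypothetical names, the fixed frame delay is an illustrative choice (a GL backend might instead gate destruction on a fence such as `glFenceSync` polled with `glClientWaitSync` and a zero timeout), and it does not address why the buffers exist in the first place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>

// Queues resources and destroys them only after `delayFrames` more frames have
// started, so the render thread never blocks on them at release time.
template <typename Resource>
class DeferredDeleter {
public:
    explicit DeferredDeleter(std::uint64_t delayFrames) : delay_(delayFrames) {}

    // Retire a resource during `currentFrame`; it becomes due at currentFrame + delay.
    void release(Resource resource, std::uint64_t currentFrame) {
        // Frame numbers must be non-decreasing so the queue stays sorted by due frame.
        assert(pending_.empty() || pending_.back().first <= currentFrame + delay_);
        pending_.emplace_back(currentFrame + delay_, std::move(resource));
    }

    // Call at the start of each frame; destroys every resource that is due.
    std::size_t collect(std::uint64_t currentFrame) {
        std::size_t destroyed = 0;
        while (!pending_.empty() && pending_.front().first <= currentFrame) {
            pending_.pop_front();  // the resource's destructor runs here
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::uint64_t delay_;
    std::deque<std::pair<std::uint64_t, Resource>> pending_;
};
```

Here `Resource` would be an owning handle (for example a `std::unique_ptr` or a wrapper whose destructor deletes the GL buffer), so popping it from the queue is what frees it.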

There are also mailbox-related slowdowns in the render thread that will be investigated separately: [screenshot: slow_mail_box_push_lock]

Aug 24 '24 01:08 alasram

Bloaty Results (iOS) 🐋

Compared to main

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +3.17Ki  [ = ]       0    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results-ios/pr-2761-compared-to-main.txt

Aug 24 '24 02:08 github-actions[bot]

Bloaty Results 🐋

Compared to main

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.2%  +337Ki  +0.3%  +107Ki    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results/pr-2761-compared-to-main.txt

Compared to d38709084a9865fe0bb8300aec70ebf8243b3d43 (legacy)

    FILE SIZE        VM SIZE    
 --------------  -------------- 
   +28% +32.9Mi  +427% +25.5Mi    TOTAL

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/bloaty-results/pr-2761-compared-to-legacy.txt

Aug 24 '24 03:08 github-actions[bot]

Benchmark Results ⚡

Benchmark                                                     Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------------------------
OVERALL_GEOMEAN                                            -0.0155         -0.0154             0             0             0             0

Full report: https://maplibre-native.s3.eu-central-1.amazonaws.com/benchmark-results/pr-2761-compared-to-main.txt

Aug 24 '24 03:08 github-actions[bot]

Just tagging @mwilsnd and @alexcristici to take a look at this.

Sep 17 '24 16:09 sjg-wdw

I will make this opt-in because performance varies by vendor:

Emulator: shared EGL contexts are not well supported (known issue)
Mali drivers on Android (Google Pixel 6 and 7): uploads are about as fast as a memcpy, so shared contexts bring no benefit
Adreno drivers on Android: buffer updates are slow compared to a memcpy, so shared contexts are useful
Adreno drivers on QNX: buffer upload performance is much lower than with a regular Android driver, and shared contexts scale it (the screenshots above were captured on QNX)

Sep 17 '24 17:09 alasram

I changed this to be on by default, on the emulator and on any device, as long as GLES 3 is supported; otherwise, a GLES 2 context is requested and multithreading is disabled.
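The fallback rule in the comment above can be sketched as a small decision function. This is illustrative only: `RendererConfig`, `chooseConfig` and the `tryCreateContext` callback are hypothetical, the callback stands in for something like `eglCreateContext` with `EGL_CONTEXT_CLIENT_VERSION` set to the requested version, and the sketch does not model any emulator-specific handling.

```cpp
#include <cassert>
#include <functional>

// Hypothetical result of context setup.
struct RendererConfig {
    int glesVersion;       // client version of the created context (0 = none)
    bool threadedUploads;  // whether worker threads get shared contexts for uploads
};

// tryCreateContext(version) is a hypothetical stand-in for creating a GLES context
// of the given client version; it returns true on success.
RendererConfig chooseConfig(const std::function<bool(int)>& tryCreateContext) {
    assert(tryCreateContext && "a context factory is required");
    if (tryCreateContext(3)) {
        return {3, true};   // GLES 3 available: multithreaded uploads on by default
    }
    if (tryCreateContext(2)) {
        return {2, false};  // GLES 2 fallback: uploads stay on the render thread
    }
    return {0, false};      // no usable context
}
```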

Sep 25 '24 18:09 alasram

Free threaded resources on Android
