Performance optimization, how to reduce the time spent on "collectGarbage" in the Vulkan backend?
Discussed in https://github.com/google/filament/discussions/7042
Originally posted by zy4944796 August 4, 2023 Good day everyone, I am trying to optimize the Vulkan backend to make its performance similar to that of the OpenGL backend. I have added systrace to the engine code and compared the systrace files of the OpenGL backend and Vulkan backend. I found that the main reason why the Vulkan backend is slower than the OpenGL backend is because the Vulkan backend needs to call collectGarbage, which takes 1-2ms. Is there any way to optimize this?
This is a comparison of the execution time of FEngine::execute for testing. Please see the yellow and red lines, which represent the time taken for the Vulkan backend to draw 500 cubes and the OpenGL backend to draw 1000 cubes, respectively. The times are similar.
Through analysis of the Systrace file, it was found that the collectGarbage function called in the commit of the Vulkan backend takes up to 1-2ms of time, with VulkanDisposer::gc accounting for a significant portion of the time, as shown in the following figure:
Why does VulkanDisposer::gc take so much time? Are there any ways to reduce the time taken for this part? For example, by reusing its resources or moving it to another thread?
@poweifeng the code itself seems pretty inefficient:
removeReferencedoes a hash-map lookup while we're iterating the hash-map! That's bad.- there are 2 iterations of the hash-map, maybe we could have only one?
- a new hash-map is recreated each time gc is called, by insertion of each element! So even when gc has nothing to do, it will:
- iterate the hashmap twice
- recreate the hashmap by copy-inserting each element to a new one
- destroy the old hashmap
- and potentially doing lookups during the 1st iteration
So there are actually at least two issues:
-
The latency caused by doing gc on the command buffers
- This was done to address a crash where the images were being gc'd before the command buffers that contain references to them are even processed. This issue was filed internally at google.
- I tried to address it in PR #6855. But it's definitely too aggressive as we note that the fence waiting is on the corresponding command buffers that need to be gc'd.
gc()is then 22ms on a pixel4xl. (I didn't know that it'd be waiting on every frame.) See Image A - Maybe we don't want to actually gc the command buffers that often, but instead make sure that each resource depending on the submission of a command buffer doesn't get freed/gc'd before the buffer is submitted. Image B shows that even if we don't wait on the fence, free-ing the command buffer is still expensive.
-
VulkanDisposer issues raised by @pixelflinger and @zy4944796
- Inefficient iteration of the hashmap
- Inefficient deletion of items from the hashmap
- Do we even need to use a hashmap here?
- There is about 350 items in the disposer at any frame for the DamagedHelmut.
- In Image B
disposer::gctakes 1ms. This should be able to go down significantly.
Image A
Image B
Without waiting on the fence of the command buffer.
ah... but I think for (1) we shouldn't wait for the fence, we should just check if it signaled and proceed only if it did. If it didn't, we keep the entry in the list. Alternatively, you could have a gc thread that wait on fences (but can you wait on several fences?) that does the clean-up as needed. But I'm nut sure it'll buy us that much.
As of #7735, all the ref-counting mechanisms have been implemented and should now be relatively stable.