[Frame loop] Remove waitDeviceIdle in frame loop
A simple performance test shows that the loop time is the same with or without those waits. Using the bloom sample, I looped 10 frames 100 times, and repeated that test 10 times to get an average time. With or without the wait, looping 10 frames 100 times takes roughly 6.5 s.
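For reference, the measurement described above could be sketched roughly as follows. This is not the harness that was actually used: `render_frame` is a hypothetical stand-in for replaying one frame, and the function and parameter names are made up for illustration.

```python
import time
from statistics import mean


def time_frame_loop(render_frame, frames_per_loop=10, loops=100, trials=10):
    """Return the mean wall-clock seconds for one trial.

    One trial replays `frames_per_loop` frames `loops` times.
    `render_frame` is a hypothetical callable standing in for one frame
    of the replay; it is an assumption, not a real gapir entry point.
    """
    trial_times = []
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(loops):
            for _ in range(frames_per_loop):
                render_frame()
        trial_times.append(time.perf_counter() - start)
    # Average over the trials to smooth out run-to-run noise.
    return mean(trial_times)
```

Running it once with the wait in place and once without, on the same sample, gives the two averages being compared here.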
So this could mean a couple of things:
- We are introducing an equivalent delay elsewhere (perhaps waiting on a fence, for example).
- We already have significant overhead that dominates performance.
- The waitDeviceIdle doesn't actually matter at all.
But I would suggest that this is probably vsync-bound, so you would not notice the difference.
I think we already have performance overhead: I saw that gapir performs many repeated load operations, which copy resources around. That may mean gapir doesn't keep the GPU busy enough, so by the time we call waitDeviceIdle the device is already idle. nvidia-smi shows that GPU utilization is very low during the replay.
But this is still much faster than calling the replay 100 times for the 10 frames. With the frame loop, one frame takes 6.5 ms (6.5 s for 10 frames looped 100 times). I don't yet have a timing for simply repeating the replay 100 times.
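The per-frame number follows from the totals above, and it can be compared against a display refresh interval as a rough sanity check on the vsync question. A small sketch of that arithmetic; the 60 Hz refresh rate in the usage note is an assumption (the thread does not state one), and the helper names are hypothetical.

```python
def per_frame_ms(total_seconds, frames_per_loop, loops):
    """Average milliseconds per frame for one timed run."""
    return total_seconds * 1000.0 / (frames_per_loop * loops)


def refresh_interval_ms(refresh_hz):
    """Milliseconds between vertical blanks at the given refresh rate."""
    return 1000.0 / refresh_hz


def faster_than_refresh(frame_ms, refresh_hz):
    """True if frames complete faster than the display refreshes.

    If presentation were throttled to this refresh rate, the average
    frame time would be expected to sit at or above the refresh interval.
    """
    return frame_ms < refresh_interval_ms(refresh_hz)
```

For example, `per_frame_ms(6.5, 10, 100)` gives 6.5 ms per frame. With an assumed 60 Hz display the refresh interval is about 16.7 ms, so `faster_than_refresh(6.5, 60.0)` is true; whether that rules out vsync for this replay depends on the actual refresh rate and present mode, which were not measured here.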