obs-studio
obs-ffmpeg: Release encode texture early
Description
During high graphics thread pressure, acquiring the graphics lock can take a significant amount of time. This change releases the OpenGL texture right after rendering, which avoids taking the lock a second time after the frame has been sent to FFmpeg. In a near-encoder-overload scenario this improves the median, 99th-percentile, and maximum (100th-percentile) encode times, and it modestly raises the ceiling before encoder overload in my test scene.
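The difference in locking can be sketched with a small self-contained model. This is not the obs-ffmpeg code: `gfx_model`, `gfx_lock`, `gfx_unlock`, `import_and_copy_texture`, `release_texture`, and `send_frame_to_encoder` are hypothetical stand-ins, and the "lock" is only a flag and a counter so that acquisitions per frame can be compared. The model also says nothing about how long the encoder still needs a submitted frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the graphics context: the "lock" is just a flag
 * plus a counter, so acquisitions per frame can be compared. */
struct gfx_model {
	bool locked;
	bool texture_alive;
	int lock_acquisitions;
	int frames_sent;
};

void gfx_lock(struct gfx_model *g)
{
	assert(!g->locked);
	g->locked = true;
	g->lock_acquisitions++;
}

void gfx_unlock(struct gfx_model *g)
{
	assert(g->locked);
	g->locked = false;
}

/* Texture work is only valid while the graphics lock is held. */
void import_and_copy_texture(struct gfx_model *g)
{
	assert(g->locked);
	g->texture_alive = true;
}

void release_texture(struct gfx_model *g)
{
	assert(g->locked && g->texture_alive);
	g->texture_alive = false;
}

/* Sending happens outside the lock so rendering is not stalled. */
void send_frame_to_encoder(struct gfx_model *g)
{
	assert(!g->locked);
	g->frames_sent++;
}

/* Before: the texture is released after the send, under a second lock. */
void encode_frame_late_release(struct gfx_model *g)
{
	gfx_lock(g);
	import_and_copy_texture(g);
	gfx_unlock(g);

	send_frame_to_encoder(g);

	gfx_lock(g); /* second acquisition, may block under load */
	release_texture(g);
	gfx_unlock(g);
}

/* After: the texture is released inside the first lock, then the frame
 * is sent without touching the lock again. */
void encode_frame_early_release(struct gfx_model *g)
{
	gfx_lock(g);
	import_and_copy_texture(g);
	release_texture(g);
	gfx_unlock(g);

	send_frame_to_encoder(g);
}
```

In this model the late-release path takes the lock twice per frame and the early-release path takes it once, which is the saving the description refers to.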
Motivation and Context
Random dropped frames are no fun.
Master:
- min=0 ms, median=4.29 ms, max=33.072 ms, 99th percentile=8.877 ms
- min=0 ms, median=4.438 ms, max=77.157 ms, 99th percentile=9.853 ms
- min=0 ms, median=4.527 ms, max=57.292 ms, 99th percentile=9.282 ms

This commit:
- min=0.97 ms, median=3.009 ms, max=13.215 ms, 99th percentile=5.899 ms
- min=1.181 ms, median=2.91 ms, max=9.854 ms, 99th percentile=5.56 ms
- min=0.461 ms, median=3.013 ms, max=10.693 ms, 99th percentile=5.871 ms
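How these statistics were computed is not stated here. For readers who want to reproduce similar summaries from their own per-frame timings, one common convention is the nearest-rank percentile, sketched below. The function name `percentile_nearest_rank` is made up for this example, and the measurement tool used above may use a different definition (for instance, interpolating between samples).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a;
	double y = *(const double *)b;
	return (x > y) - (x < y);
}

/* Nearest-rank percentile: sorts `samples` in place and returns the
 * value at rank ceil(p / 100 * n), clamped to at least 1.
 * p is a whole-number percentile in [0, 100]; n must be non-zero. */
double percentile_nearest_rank(double *samples, size_t n, unsigned p)
{
	assert(n > 0 && p <= 100);
	qsort(samples, n, sizeof(*samples), cmp_double);

	size_t rank = ((size_t)p * n + 99) / 100; /* integer ceil */
	if (rank == 0)
		rank = 1;
	return samples[rank - 1];
}
```

With this definition, `p = 50` gives the (lower) median, `p = 100` gives the maximum, and `p = 0` gives the minimum.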
How Has This Been Tested?
Compared recordings before and after. This seems semantically correct, as it's what the current software encoder does, though a review from @nowrep or anyone who actually knows the semantics of FFmpeg would be appreciated.
Types of changes
- Performance enhancement (non-breaking change which improves efficiency)
Checklist:
- [x] My code has been run through clang-format.
- [x] I have read the contributing document.
- [x] My code is not on the master branch.
- [x] The code has been tested.
- [x] All commit messages are properly formatted and commits squashed where appropriate.
- [x] I have included updates to all appropriate documentation.
This breaks when the encoder isn't done with the previously submitted frame (e.g. when it needs to change encode order with B-frames).
If the issue is about taking the graphics thread lock twice, then it could be changed to do both the texture import and copy in one lock. Another option would be to cache the textures for VASurface (ffmpeg reuses VA surfaces from pool).
> If the issue is about taking the graphics thread lock twice, then it could be changed to do both the texture import and copy in one lock
~~These are in one lock~~ That step is so fast it's almost never really preempted; it's releasing the texture once we have submitted the frame to ffmpeg that requires the lock again. We also don't want to hold the graphics lock while sending the frame to ffmpeg, since that will stall rendering.
> Another option would be to cache the textures for VASurface (ffmpeg reuses VA surfaces from pool).
This is what I do for QSV. If you have an example of enumerating the ffmpeg frame pool, I can do that too.
> These are in one lock. So fast it's almost never really preempted; it's releasing the texture once we have submitted the frame to ffmpeg that requires the lock again. We also don't want to hold the graphics lock while sending the frame to ffmpeg, since that will stall rendering.
All this could be done in that one required lock that does the copy before encode.
> This is what I do for QSV. If you have an example of enumerating the ffmpeg frame pool, I can do that too.
It still needs to create a new AVFrame every frame, but it would only do vaExportSurfaceHandle + texture import once for every VASurfaceID (frame->data[3]).
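A rough sketch of that caching idea, under stated assumptions: the cache is a small fixed-size table keyed by surface ID, `surface_id_t` stands in for VASurfaceID, and `import_surface` is a hypothetical placeholder for the expensive export and texture import step. None of these names come from obs-ffmpeg, FFmpeg, or libva, and a real version would also have to destroy the cached textures under the graphics lock on teardown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_CACHE_CAPACITY 32 /* assumed upper bound on pooled surfaces */

typedef uint32_t surface_id_t; /* stand-in for VASurfaceID */

struct tex_cache_entry {
	surface_id_t id;
	int texture; /* stand-in for an imported texture handle */
};

struct tex_cache {
	struct tex_cache_entry entries[TEX_CACHE_CAPACITY];
	size_t count;
	int imports; /* how many times the expensive import ran */
};

/* Hypothetical placeholder for "export surface handle + import texture".
 * Here it only hands out sequential fake handles. */
int import_surface(struct tex_cache *c, surface_id_t id)
{
	(void)id;
	c->imports++;
	return c->imports;
}

/* Returns the texture for `id`, importing it only on the first request.
 * Returns -1 if the cache is full. */
int tex_cache_get(struct tex_cache *c, surface_id_t id)
{
	assert(c != NULL);

	for (size_t i = 0; i < c->count; i++) {
		if (c->entries[i].id == id)
			return c->entries[i].texture; /* hit: no import */
	}

	if (c->count == TEX_CACHE_CAPACITY)
		return -1;

	int texture = import_surface(c, id);
	c->entries[c->count].id = id;
	c->entries[c->count].texture = texture;
	c->count++;
	return texture;
}
```

Because the surfaces come from a reused pool, repeated lookups for the same ID hit the cache, so the import cost would be paid once per surface rather than once per frame.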
> All this could be done in that one required lock that does the copy before encode.
Sorry, now I see what you are saying. This actually performs even better.
Looks good now, thanks.
Thanks for the review