obs-studio
[WIP] Jim-nvenc on Linux
Description
Adds an OpenGL version of jim-nvenc to obs-ffmpeg.
Motivation and Context
Why is this change required?
The fallback NVENC encoder that is used on Linux copies rendered frames back to the system RAM, copies them again into an FFMPEG AVFrame, and makes FFMPEG upload them back to the GPU. This takes a lot of CPU time especially on older or lower-end CPUs, like my Intel i3-4370. Encoding the textures directly without copying them to system RAM solves this issue.
How Has This Been Tested?
[WIP] I'm testing on Arch Linux using my Intel i3-4370 + Nvidia GTX 1060 6GB. CPU usage is decreased from around 25-30% to around 11%. The resulting video file looks correct.
Types of changes
- Performance enhancement (non-breaking change which improves efficiency)
Checklist:
- [x] My code has been run through clang-format.
- [x] I have read the contributing document.
- [x] My code is not on the master branch.
- [ ] The code has been tested.
- [ ] All commit messages are properly formatted and commits squashed where appropriate.
- [ ] I have included updates to all appropriate documentation.
How's this going? Many Linux OBS users have been anticipating this kind of addition
Hi there, any updates on this PR? Thanks!
Hi, I didn't have interest in working on this more because solving this properly is much more involved than I anticipated. I have a branch in my fork with more WIP work that anyone else is free to pick up.
I have a bit of free time over Christmas when I might work on this, but no promises.
Wait I actually didn't test if it compiles lmao
Update: Works On My Machine
I am especially interested in Windows testers, because this PR also does some things to the Windows side. For example, it should have ARGB support now, but that is completely untested and might just be completely broken for all I know, so that would need to be tested and either fixed or disabled again before this PR can be considered for merging. Also testing Intel QSV and other encoders might be a good idea to make sure I didn't break any of them.
second obs-qsv11 commit adapted from #4931
I applied this PR to OBS 27.2.0-beta1 on Debian 11. I may well have done something wrong, since I'm only a user. With NV12 I got 15% CPU usage; with RGBA I got 45% CPU usage.
Edit: I forgot to select "New NVENC" for jim-nvenc, sorry, my fault. With NV12, I still get 15% CPU usage and 50% NVENC usage. With RGBA, I get 10% CPU usage and 40% NVENC usage. That's only 2% more CPU and 20% more NVENC usage than with no encoding. Amazing.
I did a little A/B testing this morning. Debian 11, Nvidia 2060 495.46 & xfce4. 1920X Threadripper.
Using h.264 (new) preset lossless @ 1080p NV12.
The video recorded fine but CPU usage remained around 2% according to OBS. Same as regular nvenc.
That said, it doesn't appear to break anything.
Hmm, on an RTX 3080 this triggers a fallback to FFmpeg NVENC, as it seems NV12 texture support is not available. I am unsure why, I'll do further testing. I am using the proprietary Nvidia drivers on Ubuntu 21.04.
I tested the latest commits on 27.2.0-beta1 on Debian 11 tonight. It's better than before: OBS quits normally, and I haven't found a crash so far, but I'll keep testing. It still uses 10% CPU with 4 HD video loops and very low PCIe usage.
Can we use this PR to stream with RGBA to major platforms (I was thinking no) ?
The log still says "GPU conversion not available for format: 6" and "NV12 texture support not available". Is that normal?
20:42:52.923: Initializing OpenGL...
20:42:52.961: Loading up OpenGL on adapter NVIDIA Corporation GeForce GTX 970/PCIe/SSE2
20:42:52.962: OpenGL loaded successfully, version 3.3.0 NVIDIA 460.91.03, shading language 3.30 NVIDIA via Cg compiler
20:42:52.976: ---------------------------------
20:42:52.976: video settings reset:
20:42:52.976: base resolution: 1920x1080
20:42:52.976: output resolution: 1920x1080
20:42:52.976: downscale filter: Bilinear
20:42:52.976: fps: 60/1
20:42:52.976: format: RGBA
20:42:52.976: YUV mode: None
20:42:52.976: GPU conversion not available for format: 6
20:42:52.976: NV12 texture support not available
@LGCW @ogmkp NV12 support is not implemented, I'd need to copy two textures of different formats together and I don't know how to do that. But RGB format should work.
Can we use this PR to stream with RGBA to major platforms (I was thinking no) ?
I don't know, I have no idea how the encoding itself works. This code just passes RGB textures into NVENC instead of NV12 textures.
No. For streaming outputs, OBS forces the color format to NV12 if the encoder supports NV12 and OBS' color format setting is neither NV12 nor I420. https://github.com/obsproject/obs-studio/blob/f295bd99685b05ee74bf28b75590f088a77a4771/UI/window-basic-main-outputs.cpp#L536-L538 https://github.com/obsproject/obs-studio/blob/f295bd99685b05ee74bf28b75590f088a77a4771/UI/window-basic-main-outputs.cpp#L1335-L1337
From the documentation for obs_encoder_set_preferred_video_format:
Sets the preferred video format for a video encoder. If the encoder can use the format specified, it will force a conversion to that format if the obs output format does not match the preferred format.
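The rule described above can be sketched in C as follows. This is only an illustration of the stated condition, not the code at the linked lines: the enum, `sketch_streaming_format`, and the `encoder_supports_nv12` parameter are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the OBS color formats named in this thread. */
enum sketch_format {
    SKETCH_I420,
    SKETCH_NV12,
    SKETCH_I444,
    SKETCH_RGBA
};

/*
 * Format a streaming encoder ends up with under the rule quoted above:
 * NV12 is forced when the encoder supports it and the configured format
 * is neither NV12 nor I420. Otherwise the configured format is kept.
 */
enum sketch_format sketch_streaming_format(enum sketch_format configured,
                                           bool encoder_supports_nv12)
{
    if (encoder_supports_nv12 && configured != SKETCH_NV12 &&
        configured != SKETCH_I420)
        return SKETCH_NV12;
    return configured;
}

/* Usage: RGBA in the settings still reaches a streaming encoder as NV12. */
void sketch_streaming_format_demo(void)
{
    assert(sketch_streaming_format(SKETCH_RGBA, true) == SKETCH_NV12);
    assert(sketch_streaming_format(SKETCH_I420, true) == SKETCH_I420);
}
```

Under this reading, choosing RGBA in the settings affects recording, while a streaming encoder that accepts NV12 would still be fed NV12.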
GPU conversion not available for format: 6
This just means that the GPU cannot, or does not need to, convert the color format to RGB/RGBA/VIDEO_FORMAT_RGBA/AV_PIX_FMT_RGBA. OBS can only do GPU conversion for I420, I444, and NV12.
GPU conversion not available for format: 6
NV12 texture support not available
At present, these two messages will always be displayed together.
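As a rough sketch of the explanation above, the check behind the first message might look like the following. The enum is a hypothetical subset with assumed numeric values, chosen so that RGBA is 6 as in the log; `sketch_gpu_conversion_available` is an invented name, not a libobs function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical subset of video formats. The numeric values are assumptions,
 * picked so that RGBA is 6, matching "format: 6" in the log above.
 */
enum sketch_video_format {
    SKETCH_FORMAT_I420 = 1,
    SKETCH_FORMAT_NV12 = 2,
    SKETCH_FORMAT_RGBA = 6,
    SKETCH_FORMAT_I444 = 10
};

/* GPU conversion is only described as available for I420, I444 and NV12. */
bool sketch_gpu_conversion_available(enum sketch_video_format format)
{
    switch (format) {
    case SKETCH_FORMAT_I420:
    case SKETCH_FORMAT_I444:
    case SKETCH_FORMAT_NV12:
        return true;
    default:
        return false;
    }
}

/* Usage: with RGBA selected, the "not available" message is expected. */
void sketch_gpu_conversion_demo(void)
{
    assert(!sketch_gpu_conversion_available(SKETCH_FORMAT_RGBA));
    assert(sketch_gpu_conversion_available(SKETCH_FORMAT_NV12));
}
```

With the output format set to RGBA, the message would therefore be informational rather than a sign that something is broken.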
Thank you for this clarification. I have tested RGBA streaming + recording at the same time with this PR and I have not yet detected a problem (because of keeping NV12 only for streaming). The performance improvement for CPU on Linux is great.
So in a little bit of testing this patch on a Dual Quadro 5k (with OBS and the encoding running on the same GPU) running on an AMD TR1950X (CentOS 7 - latest updates and patches, compiled using Devtoolset 10). My system performance numbers are posted below.
UHD@59p - This Patch/RGB (27.0.1-896-g6e59ed3)
obs_graphics_thread(16.6833 ms): min=0.315 ms, median=1.269 ms, max=272.318 ms, 99th percentile=4.428 ms, 99.7815% below 16.683 ms
┣tick_sources: min=0.001 ms, median=0.018 ms, max=0.396 ms, 99th percentile=0.035 ms
┣output_frame: min=0.101 ms, median=0.787 ms, max=37.66 ms, 99th percentile=3.431 ms
┃ ┗gs_context(video->graphics): min=0.101 ms, median=0.787 ms, max=37.659 ms, 99th percentile=3.431 ms
┃ ┣render_video: min=0.031 ms, median=0.681 ms, max=8.512 ms, 99th percentile=1.159 ms
┃ ┃ ┣render_main_texture: min=0.023 ms, median=0.624 ms, max=8.497 ms, 99th percentile=0.984 ms
┃ ┃ ┣render_output_texture: min=0.03 ms, median=0.052 ms, max=2.83 ms, 99th percentile=0.081 ms, 0.531404 calls per parent call
┃ ┃ ┗output_gpu_encoders: min=0 ms, median=0.005 ms, max=0.033 ms, 99th percentile=0.008 ms, 0.531404 calls per parent call
┃ ┗gs_flush: min=0.005 ms, median=0.009 ms, max=3.329 ms, 99th percentile=0.031 ms
┗render_displays: min=0.055 ms, median=0.313 ms, max=13.379 ms, 99th percentile=1.42 ms
UHD@59p - Main Repo/Master Branch/NV12 (27.2.0-35-gc639255)
video_thread(video): min=0 ms, median=59.341 ms, max=2275.56 ms, 99th percentile=2061.66 ms
receive_video: min=3.713 ms, median=16.112 ms, max=53.968 ms, 99th percentile=37.294 ms, 10.5879 calls per parent call
do_encode: min=3.712 ms, median=16.111 ms, max=53.967 ms, 99th percentile=37.293 ms
encode(streaming_h264): min=3.73 ms, median=16.108 ms, max=42.519 ms, 99th percentile=37.507 ms, 0.492923 calls per parent call
HD@59p - This Patch/RGB (27.0.1-896-g6e59ed3)
obs_graphics_thread(16.6833 ms): min=0.426 ms, median=3.105 ms, max=220.502 ms, 99th percentile=7.637 ms, 99.886% below 16.683 ms
┣tick_sources: min=0.002 ms, median=0.014 ms, max=33.145 ms, 99th percentile=0.026 ms
┣output_frame: min=0.159 ms, median=2.493 ms, max=55.844 ms, 99th percentile=5.165 ms
┃ ┗gs_context(video->graphics): min=0.158 ms, median=2.492 ms, max=55.842 ms, 99th percentile=5.165 ms
┃ ┣render_video: min=0.056 ms, median=2.345 ms, max=14.45 ms, 99th percentile=4.51 ms
┃ ┃ ┣render_main_texture: min=0.045 ms, median=2.308 ms, max=14.436 ms, 99th percentile=4.408 ms
┃ ┃ ┣render_output_texture: min=0.034 ms, median=0.056 ms, max=1.102 ms, 99th percentile=0.092 ms, 0.24772 calls per parent call
┃ ┃ ┗output_gpu_encoders: min=0 ms, median=0.007 ms, max=0.05 ms, 99th percentile=0.013 ms, 0.24772 calls per parent call
┃ ┗gs_flush: min=0.005 ms, median=0.011 ms, max=7.974 ms, 99th percentile=0.068 ms
┗render_displays: min=0.066 ms, median=0.397 ms, max=8.115 ms, 99th percentile=4.461 ms
HD@59p - Main Repo/Master Branch/NV12 (27.2.0-35-gc649255)
video_thread(video): min=0.904 ms, median=1.468 ms, max=3.485 ms, 99th percentile=2.718 ms
┗receive_video: min=0.902 ms, median=1.467 ms, max=3.482 ms, 99th percentile=2.715 ms
┗do_encode: min=0.902 ms, median=1.465 ms, max=3.462 ms, 99th percentile=2.713 ms
┣encode(streaming_h264): min=0.88 ms, median=1.442 ms, max=3.433 ms, 99th percentile=2.688 ms
I can now encode 4k streams on Linux+NVENC, whereas with the main/master I get 83%+ overload. This is a MASSIVE improvement and puts it on a similar level to Windows performance.
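One rough way to read the numbers above is to compare each stage time against the frame interval. The sketch below assumes that the 16.6833 ms in the profiler header is the frame interval for a 60000/1001 fps output. The function names are hypothetical, and this is a simplification, not how OBS itself computes encoder overload.

```c
#include <assert.h>
#include <stdbool.h>

/* Frame interval in milliseconds for a rational frame rate fps_num/fps_den. */
double sketch_frame_budget_ms(unsigned fps_num, unsigned fps_den)
{
    return 1000.0 * (double)fps_den / (double)fps_num;
}

/* True when a per-frame stage time no longer fits inside one frame interval. */
bool sketch_exceeds_budget(double stage_ms, double budget_ms)
{
    return stage_ms > budget_ms;
}

/* Usage with 99th-percentile receive_video times from the master-branch runs. */
void sketch_frame_budget_demo(void)
{
    double budget = sketch_frame_budget_ms(60000, 1001); /* about 16.683 ms */

    assert(sketch_exceeds_budget(37.294, budget)); /* UHD, master/NV12 */
    assert(!sketch_exceeds_budget(2.715, budget)); /* HD, master/NV12 */
}
```

On this reading, the 99th-percentile `receive_video` time of 37.294 ms in the master-branch UHD run is more than twice the frame interval, which is consistent with the overload reported there.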
Update: I'm not sure if this is now broken, or if it ever worked, but I can't encode on my 2nd GPU using this method. Any time I select GPU #1 in the settings, it still encodes on GPU #0 (the same one OBS is running on).
@c3r1c3
Hello, jim-nvenc should fall back to ffmpeg for encoding on other GPUs. Can you check if your log contains [jim-nvenc] different GPU selected by user, falling back to ffmpeg, and whether encoding on GPU #1 really works on master?
Master/Linux/FFMPEG NVENC: When I select a different GPU, the same GPU that OBS runs on is used for the FFMPEG-NVENC encoder, so this is a bug in OBS/FFMPEG.
jim-nvenc/Linux/new NVENC: Checking out the very latest changes, I now get the message about changing GPUs, and the log notes the different encoder being used... but it still runs on the same GPU as OBS (most likely due to the above noted bug).
Master/Windows: I don't have a Windows computer with dual GPUs, so I can't test this scenario.
You wouldn't happen to have a list of dependencies, would you?
I'm trying to compile on Arch at the moment and it's failing on the audio encoder with
unknown type name 'AVCodecContext'; did you mean 'AVIODirContext'?
I know that's defined in libavcodec, but it doesn't seem to be pulling it from my system ffmpeg install.
It's also isolated to this specifically, as I can compile OBS fine normally.
@openglfreak A request. Could you update your master to match the latest in OBS' master and then do a rebase of your branch from that? I found a couple of other issues that I'm not sure if they're related to this branch, or the version based on this branch. (Yes, I know I can do that locally, but it makes it a lot easier when testing, switching back-and-forth and keeping everything in sync to have it this way.)
Thanks again for all your hard work!
@c3r1c3 it was already on yesterday's master. I rebased it again to current master but there aren't many changes between then and now.
Sorry about that. I was misunderstanding what github was telling me when looking at the branch info.
Back to the multi-GPU issue, it used to work in (jim-master-branch) OBS, but I don't know if it is an FFMPEG or OBS update that broke it, so I need to resolve that before being able to put the NVENC testing to bed.
As to what I've been looking into, I came across a few issues (render lag, encoding lag when the encoder isn't running), hence why I've been a bit quiet. On the surface almost all of them appear to be related to having OBS run in RGB mode, so they don't really have anything to do with this patch per se, but I'm still running tests and working my way through it. Hopefully I'll have some proper and comprehensive results in a couple more weeks.
Sorry, been real busy, but I did have some time to narrow down some of my issues. Short version: Any issues I've been looking into are a result of my system, Linux, RGB mode, and/or NDI being enabled. This patch is the real deal, provides a massive to noticeable performance improvement and should be applied to HEAD, with consideration for the testing needed for Windows/other encoders, as noted by openglfreak in post https://github.com/obsproject/obs-studio/pull/4974#issuecomment-1000071507
Thanks for making this @openglfreak
I rebased the commits for encode_texture2 to play with another encoder. At least for VAAPI, passing textures around is pretty much what we want as well. For Linux we might also have considered exporting the dmabuf and passing that to the encoder, but in my tests exporting can be quite slow. And on platforms like Intel we need a graphics context to blit, so we would end up just re-importing in the encoder anyway. It's nicer to send the textures and let us add a gs_* function to export on the encoder side if needed.
One small change I'd recommend: struct encoder_texture should probably just have an array of 4 textures, as that's the most planes the kernel supports, so we should never go beyond that (maybe +1 for a NULL entry just to be safe, but the planes must be known from the format anyway due to how we create NV12/P010 textures).
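The suggested layout could look roughly like this. It is a sketch under stated assumptions, not the struct from the PR: every `sketch_`-prefixed name is hypothetical, the texture type is an opaque placeholder, and the RGBA plane count is my own addition to the NV12/P010 case mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on planes, following the "array of 4 textures" suggestion. */
#define SKETCH_MAX_PLANES 4

struct sketch_texture; /* opaque placeholder for a graphics texture type */

enum sketch_tex_format { SKETCH_TEX_RGBA, SKETCH_TEX_NV12, SKETCH_TEX_P010 };

struct sketch_encoder_texture {
    /* One slot per plane, plus one spare slot that always stays NULL. */
    struct sketch_texture *tex[SKETCH_MAX_PLANES + 1];
};

/* The plane count follows from the format alone. */
size_t sketch_plane_count(enum sketch_tex_format format)
{
    switch (format) {
    case SKETCH_TEX_NV12: /* luma plane plus interleaved chroma plane */
    case SKETCH_TEX_P010: /* same two-plane layout with 10-bit samples */
        return 2;
    case SKETCH_TEX_RGBA: /* single packed plane */
        return 1;
    }
    return 0;
}

/* Copies the planes for `format` into `out`, NULLs the rest, returns the count. */
size_t sketch_encoder_texture_set(struct sketch_encoder_texture *out,
                                  enum sketch_tex_format format,
                                  struct sketch_texture **planes)
{
    size_t count = sketch_plane_count(format);

    assert(count <= SKETCH_MAX_PLANES);
    for (size_t i = 0; i <= SKETCH_MAX_PLANES; i++)
        out->tex[i] = (i < count) ? planes[i] : NULL;
    return count;
}
```

A fixed-size array keeps the struct free of allocation, and an encoder can either derive the plane count from the format or walk the array until it reaches the first NULL entry.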
Thanks a bunch for this change... despite it not landing yet.
Out of curiosity, is this still being worked on?
It is actually. I've finally started working on this just recently in fact
I applied the commits best I could on top of master. It compiles, but doesn't launch, errors out with ./obs: symbol lookup error: ../../obs-plugins/64bit/obs-ffmpeg.so: undefined symbol: load_nvenc_lib (built portable mode)
Commits have been applied on top of 34e3d641582dca3e86e199343905756a1cbc0b64
Patch can be found here, because github won't let me attach the patch file directly.
Made some more progress, fixed the linkage errors. Compiles, launches, non-ffmpeg nvenc is selectable, but obs segfaults when trying to start a recording. Updated patchset is here, and the backtrace from the segfault is here.
I'm struggling to move forward, my lack of in-depth knowledge about all this is finally showing up. Hoping the patchset is a good base for getting things working sooner than later
Hi @jp9000 / @Lain-B , hope you're doing fine :) Don't want to stress, just curious how it is going. In case you have no time for this anymore, can you publish your code so far? Thanks.
yea, I was working on it, but ended up delaying it again :sweat_smile: ...I'm sorry about that. It requires merging the texture sharing PR so I had to go through those changes first. I just need to get through that PR and then I'll be able to apply it to these changes.
I'd love to test this on linux and windows but can't compile openglfreak:linux-jim-nvenc. I can compile origin:master though, is there a more recent rebase I could try?
@Lain-B could you mention which PR it is so we can track progress and help with testing wherever applicable?