
[WIP] Jim-nvenc on Linux

Open • openglfreak opened this issue 4 years ago • 22 comments

Description

Adds an OpenGL version of jim-nvenc to obs-ffmpeg.

Motivation and Context

Why is this change required?

The fallback NVENC encoder that is used on Linux copies rendered frames back to system RAM, copies them again into an FFmpeg AVFrame, and makes FFmpeg upload them back to the GPU. This takes a lot of CPU time, especially on older or lower-end CPUs like my Intel i3-4370. Encoding the textures directly, without copying them to system RAM, solves this issue.
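
To get a rough sense of how much copying the fallback path does, here is a back-of-the-envelope sketch. It assumes a 1920x1080 NV12 frame at 60 fps and counts three full-frame transfers per frame (GPU to system RAM, RAM into the AVFrame, AVFrame back to the GPU); the real fallback path may differ in detail, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Bytes in one NV12 frame: a full-resolution 8-bit Y plane plus an
 * interleaved UV plane at half resolution in both dimensions
 * (width/2 U,V pairs per row, height/2 rows). Assumes even dimensions. */
static uint64_t nv12_frame_bytes(uint32_t width, uint32_t height)
{
	uint64_t y_plane = (uint64_t)width * height;
	uint64_t uv_plane = (uint64_t)width * (height / 2);
	return y_plane + uv_plane;
}

/* Bytes moved per second when every frame is copied copies_per_frame
 * times. For the fallback path described above we assume 3 copies. */
static uint64_t copy_bytes_per_second(uint32_t width, uint32_t height,
				      uint32_t fps, uint32_t copies_per_frame)
{
	return nv12_frame_bytes(width, height) * fps * copies_per_frame;
}
```

Under these assumptions a 1080p NV12 frame is 3,110,400 bytes, and at 60 fps the three copies add up to roughly 560 MB/s of memory traffic, which passing textures straight to NVENC avoids.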

How Has This Been Tested?

[WIP] I'm testing on Arch Linux using my Intel i3-4370 + Nvidia GTX 1060 6GB. CPU usage is decreased from around 25-30% to around 11%. The resulting video file looks correct.

Types of changes

  • Performance enhancement (non-breaking change which improves efficiency)

Checklist:

  • [x] My code has been run through clang-format.
  • [x] I have read the contributing document.
  • [x] My code is not on the master branch.
  • [ ] The code has been tested.
  • [ ] All commit messages are properly formatted and commits squashed where appropriate.
  • [ ] I have included updates to all appropriate documentation.

openglfreak, Jul 04 '21

How's this going? Many Linux OBS users have been anticipating this kind of addition.

tt2468, Oct 13 '21

Hi there, any updates on this PR? Thanks!

WizardCM, Nov 28 '21

> Hi there, any updates on this PR? Thanks!

Hi, I haven't been interested in working on this further because solving it properly is much more involved than I anticipated. I have a branch in my fork with more WIP work that anyone else is free to pick up.

I have a bit of free time over Christmas where I might work on this, but no promises.

openglfreak, Nov 28 '21

Wait I actually didn't test if it compiles lmao

Update: Works On My Machine

openglfreak, Dec 23 '21

I am especially interested in Windows testers, because this PR also changes some things on the Windows side. For example, it should have ARGB support now, but that is completely untested and might be broken for all I know, so it would need to be tested and either fixed or disabled again before this PR can be considered for merging. Testing Intel QSV and other encoders would also be a good idea, to make sure I didn't break any of them.

openglfreak, Dec 23 '21

The second obs-qsv11 commit is adapted from #4931.

openglfreak, Dec 24 '21

I applied this PR to OBS 27.2.0-beta1 on Debian 11 (I'm just a user, so it's possible I did something wrong). With NV12 I got 15% CPU usage; with RGBA I got 45% CPU usage.

Edit: I forgot to select "New NVENC" for jim-nvenc, sorry, my fault. With NV12, I still get 15% CPU usage and 50% NVENC usage. With RGBA, I get 10% CPU usage and 40% NVENC usage. That's only 2% more CPU and 20% more NVENC usage than not encoding at all. Amazing.

ogmkp, Dec 30 '21

I did a little A/B testing this morning: Debian 11, Nvidia 2060 (driver 495.46), Xfce 4, Threadripper 1920X.

Using H.264 (new) with the lossless preset at 1080p, NV12.

The video recorded fine, but CPU usage remained around 2% according to OBS, the same as regular NVENC.

That said, it doesn't appear to break anything.

VennStone, Dec 31 '21

Hmm, on an RTX 3080 this triggers a fallback to FFmpeg NVENC, as it seems NV12 texture support is not available. I am unsure why; I'll do further testing. I am using the proprietary Nvidia drivers on Ubuntu 21.04.

WizardCM, Jan 02 '22

I tested the latest commits on 27.2.0-beta1 on Debian 11 tonight. It's better than before: OBS quits normally and I haven't found a crash so far, but I'll keep testing. It still uses 10% CPU with 4 looping HD videos, with very low PCIe usage.

Can we use this PR to stream with RGBA to major platforms (I was thinking no)?

The log still says "GPU conversion not available for format: 6" and "NV12 texture support not available". Is that normal?

20:42:52.923: Initializing OpenGL...
20:42:52.961: Loading up OpenGL on adapter NVIDIA Corporation GeForce GTX 970/PCIe/SSE2
20:42:52.962: OpenGL loaded successfully, version 3.3.0 NVIDIA 460.91.03, shading language 3.30 NVIDIA via Cg compiler
20:42:52.976: ---------------------------------
20:42:52.976: video settings reset:
20:42:52.976: 	base resolution:   1920x1080
20:42:52.976: 	output resolution: 1920x1080
20:42:52.976: 	downscale filter:  Bilinear
20:42:52.976: 	fps:               60/1
20:42:52.976: 	format:            RGBA
20:42:52.976: 	YUV mode:          None
20:42:52.976: GPU conversion not available for format: 6
20:42:52.976: NV12 texture support not available

ogmkp, Jan 03 '22

@LGCW @ogmkp NV12 support is not implemented: I'd need to copy two textures of different formats together into one, and I don't know how to do that. RGB formats should work, though.
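
For readers unfamiliar with NV12: it stores a full-resolution luma (Y) plane followed by a half-resolution plane of interleaved chroma (U, V) samples, so the two planes naturally map to two textures of different formats (one channel for Y, two channels for UV). The CPU-side sketch below only illustrates the memory layout that combining the two planes produces; it is not how the PR or NVENC would do it on the GPU, and `pack_nv12` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs a Y plane (width x height bytes) and an
 * interleaved UV plane (width x height/2 bytes, i.e. width/2 U,V pairs per
 * row) into one contiguous NV12 buffer. Rows are assumed tightly packed
 * (no stride padding) and width/height are assumed even. out must hold
 * width * height * 3 / 2 bytes. Returns the number of bytes written. */
static size_t pack_nv12(const uint8_t *y, const uint8_t *uv, size_t width,
			size_t height, uint8_t *out)
{
	size_t y_size = width * height;
	size_t uv_size = width * (height / 2);

	memcpy(out, y, y_size);            /* plane 0: luma */
	memcpy(out + y_size, uv, uv_size); /* plane 1: U0 V0 U1 V1 ... */
	return y_size + uv_size;
}
```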

> Can we use this PR to stream with RGBA to major platforms (I was thinking no)?

I don't know, I have no idea how the encoding itself works. This code just passes RGB textures into NVENC instead of NV12 textures.

openglfreak, Jan 06 '22

> > Can we use this PR to stream with RGBA to major platforms (I was thinking no)?
>
> I don't know, I have no idea how the encoding itself works. This code just passes RGB textures into NVENC instead of NV12 textures.

No. For streaming outputs, OBS forces the color format to NV12 if the encoder supports NV12 and OBS' color format setting is neither NV12 nor I420. https://github.com/obsproject/obs-studio/blob/f295bd99685b05ee74bf28b75590f088a77a4771/UI/window-basic-main-outputs.cpp#L536-L538 https://github.com/obsproject/obs-studio/blob/f295bd99685b05ee74bf28b75590f088a77a4771/UI/window-basic-main-outputs.cpp#L1335-L1337

From the documentation for obs_encoder_set_preferred_video_format:

> Sets the preferred video format for a video encoder. If the encoder can use the format specified, it will force a conversion to that format if the obs output format does not match the preferred format.
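
The rule described above can be restated as a small decision function. This is a simplified, hypothetical sketch for illustration only: the real logic lives in the linked `window-basic-main-outputs.cpp` lines and goes through `obs_encoder_set_preferred_video_format`, and the enum below is a local stand-in rather than libobs's `enum video_format`.

```c
#include <stdbool.h>

/* Local stand-in for a few color formats (not libobs's enum values). */
enum sketch_format { SKETCH_NV12, SKETCH_I420, SKETCH_I444, SKETCH_RGBA };

/* Format a streaming encoder ends up receiving: if the encoder supports
 * NV12 and OBS outputs something other than NV12 or I420, NV12 becomes the
 * preferred format and OBS converts to it; otherwise the OBS output format
 * is passed through unchanged. */
static enum sketch_format streaming_encoder_format(bool encoder_supports_nv12,
						   enum sketch_format obs_format)
{
	if (encoder_supports_nv12 && obs_format != SKETCH_NV12 &&
	    obs_format != SKETCH_I420)
		return SKETCH_NV12;
	return obs_format;
}
```

So with an RGBA canvas and an NV12-capable encoder, the stream is still fed NV12.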


> GPU conversion not available for format: 6

This just means that the GPU cannot, or does not need to, convert the color format to RGB/RGBA/VIDEO_FORMAT_RGBA/AV_PIX_FMT_RGBA. OBS can only do GPU conversion for I420, I444, and NV12.

> GPU conversion not available for format: 6
> NV12 texture support not available

At present, these two messages will always be displayed together.
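
A simplified sketch of why the two log lines appear together, assuming (as described above) that GPU conversion exists only for I420, I444 and NV12, and that NV12 texture support additionally requires NV12 output plus driver support. The numeric values are assumed to mirror libobs's `enum video_format` ordering at the time (6 is RGBA, which matches the "format: RGBA" line in ogmkp's log); the function names are hypothetical and this is not libobs's actual code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Values assumed to mirror libobs's enum video_format at the time. */
enum sketch_video_format {
	SKETCH_FORMAT_I420 = 1,
	SKETCH_FORMAT_NV12 = 2,
	SKETCH_FORMAT_RGBA = 6,
	SKETCH_FORMAT_I444 = 10,
};

/* GPU color conversion exists only for these three output formats. */
static bool gpu_conversion_available(int format)
{
	return format == SKETCH_FORMAT_I420 || format == SKETCH_FORMAT_I444 ||
	       format == SKETCH_FORMAT_NV12;
}

/* Hypothetical startup check: NV12 textures need GPU conversion to NV12
 * plus driver support, so when conversion is unavailable both messages
 * are printed together. Returns whether NV12 textures are usable. */
static bool nv12_textures_enabled(int format, bool driver_supports_nv12_tex)
{
	if (!gpu_conversion_available(format)) {
		printf("GPU conversion not available for format: %d\n", format);
		printf("NV12 texture support not available\n");
		return false;
	}
	if (format != SKETCH_FORMAT_NV12 || !driver_supports_nv12_tex) {
		printf("NV12 texture support not available\n");
		return false;
	}
	return true;
}
```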

RytoEX, Jan 06 '22

> > > Can we use this PR to stream with RGBA to major platforms (I was thinking no)?
> >
> > I don't know, I have no idea how the encoding itself works. This code just passes RGB textures into NVENC instead of NV12 textures.
>
> No. OBS forces the color format to NV12 if the encoder supports NV12 and the color format setting is neither NV12 nor I420.
>
> https://github.com/obsproject/obs-studio/blob/f295bd99685b05ee74bf28b75590f088a77a4771/UI/window-basic-main-outputs.cpp#L536-L538

Thank you for this clarification. I have tested RGBA streaming + recording at the same time with this PR and have not detected a problem yet (since streaming is kept at NV12). The CPU performance improvement on Linux is great.

ogmkp, Jan 06 '22

I did a little testing of this patch on a dual Quadro 5k system (with OBS and the encoding running on the same GPU) with an AMD TR1950X (CentOS 7 with the latest updates and patches, compiled using Devtoolset 10). My system performance numbers are posted below.

UHD@59p - This Patch/RGB (27.0.1-896-g6e59ed3)
obs_graphics_thread(16.6833 ms): min=0.315 ms, median=1.269 ms, max=272.318 ms, 99th percentile=4.428 ms, 99.7815% below 16.683 ms
 ┣tick_sources: min=0.001 ms, median=0.018 ms, max=0.396 ms, 99th percentile=0.035 ms
 ┣output_frame: min=0.101 ms, median=0.787 ms, max=37.66 ms, 99th percentile=3.431 ms
 ┃ ┗gs_context(video->graphics): min=0.101 ms, median=0.787 ms, max=37.659 ms, 99th percentile=3.431 ms
 ┃   ┣render_video: min=0.031 ms, median=0.681 ms, max=8.512 ms, 99th percentile=1.159 ms
 ┃   ┃ ┣render_main_texture: min=0.023 ms, median=0.624 ms, max=8.497 ms, 99th percentile=0.984 ms
 ┃   ┃ ┣render_output_texture: min=0.03 ms, median=0.052 ms, max=2.83 ms, 99th percentile=0.081 ms, 0.531404 calls per parent call
 ┃   ┃ ┗output_gpu_encoders: min=0 ms, median=0.005 ms, max=0.033 ms, 99th percentile=0.008 ms, 0.531404 calls per parent call
 ┃   ┗gs_flush: min=0.005 ms, median=0.009 ms, max=3.329 ms, 99th percentile=0.031 ms
 ┗render_displays: min=0.055 ms, median=0.313 ms, max=13.379 ms, 99th percentile=1.42 ms

UHD@59p - Main Repo/Master Branch/NV12 (27.2.0-35-gc639255)
video_thread(video): min=0 ms, median=59.341 ms, max=2275.56 ms, 99th percentile=2061.66 ms
  receive_video: min=3.713 ms, median=16.112 ms, max=53.968 ms, 99th percentile=37.294 ms, 10.5879 calls per parent call
    do_encode: min=3.712 ms, median=16.111 ms, max=53.967 ms, 99th percentile=37.293 ms
      encode(streaming_h264): min=3.73 ms, median=16.108 ms, max=42.519 ms, 99th percentile=37.507 ms, 0.492923 calls per parent call

HD@59p - This Patch/RGB (27.0.1-896-g6e59ed3)
obs_graphics_thread(16.6833 ms): min=0.426 ms, median=3.105 ms, max=220.502 ms, 99th percentile=7.637 ms, 99.886% below 16.683 ms
 ┣tick_sources: min=0.002 ms, median=0.014 ms, max=33.145 ms, 99th percentile=0.026 ms
 ┣output_frame: min=0.159 ms, median=2.493 ms, max=55.844 ms, 99th percentile=5.165 ms
 ┃ ┗gs_context(video->graphics): min=0.158 ms, median=2.492 ms, max=55.842 ms, 99th percentile=5.165 ms
 ┃   ┣render_video: min=0.056 ms, median=2.345 ms, max=14.45 ms, 99th percentile=4.51 ms
 ┃   ┃ ┣render_main_texture: min=0.045 ms, median=2.308 ms, max=14.436 ms, 99th percentile=4.408 ms
 ┃   ┃ ┣render_output_texture: min=0.034 ms, median=0.056 ms, max=1.102 ms, 99th percentile=0.092 ms, 0.24772 calls per parent call
 ┃   ┃ ┗output_gpu_encoders: min=0 ms, median=0.007 ms, max=0.05 ms, 99th percentile=0.013 ms, 0.24772 calls per parent call
 ┃   ┗gs_flush: min=0.005 ms, median=0.011 ms, max=7.974 ms, 99th percentile=0.068 ms
 ┗render_displays: min=0.066 ms, median=0.397 ms, max=8.115 ms, 99th percentile=4.461 ms

HD@59p - Main Repo/Master Branch/NV12 (27.2.0-35-gc649255)
video_thread(video): min=0.904 ms, median=1.468 ms, max=3.485 ms, 99th percentile=2.718 ms
 ┗receive_video: min=0.902 ms, median=1.467 ms, max=3.482 ms, 99th percentile=2.715 ms
   ┗do_encode: min=0.902 ms, median=1.465 ms, max=3.462 ms, 99th percentile=2.713 ms
     ┣encode(streaming_h264): min=0.88 ms, median=1.442 ms, max=3.433 ms, 99th percentile=2.688 ms

I can now encode 4K streams on Linux+NVENC, whereas with the main/master branch I get 83%+ overload. This is a MASSIVE improvement and puts it on a similar level to Windows performance.
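
To put the profiler numbers in context: at 59.94 fps (assumed here to be what "59p" means, which matches the 16.6833 ms interval in the logs) each frame gets about 16.683 ms. The small sketch below compares master's UHD `encode(streaming_h264)` timings against that budget; it only restates the reported median and 99th percentile and is not a new measurement.

```c
/* Frame interval in milliseconds for a frame rate of fps_num/fps_den,
 * e.g. 60000/1001 (59.94 fps). */
static double frame_budget_ms(double fps_num, double fps_den)
{
	return 1000.0 * fps_den / fps_num;
}

/* Share of one frame interval consumed by a single encode call
 * (1.0 means the encode takes exactly one frame's worth of time). */
static double budget_fraction(double encode_ms, double budget_ms)
{
	return encode_ms / budget_ms;
}
```

With those inputs, master's median UHD encode (16.108 ms) uses roughly 96.5% of the frame budget and its 99th percentile (37.507 ms) is about 2.25 times the budget, which fits the overload reported on master.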

Update: I'm not sure if this is now broken, or if it ever worked, but I can't encode on my 2nd GPU using this method. Any time I select GPU #1 in the settings, it still encodes on GPU #0 (the same one OBS is running on).

c3r1c3, Feb 26 '22

@c3r1c3

> Update: I'm not sure if this is now broken, or if it ever worked, but I can't encode on my 2nd GPU using this method. Any time I select GPU #1 in the settings, it still encodes on GPU #0 (the same one OBS is running on).

Hello, jim-nvenc should fall back to ffmpeg for encoding on other GPUs. Can you check whether your log contains "[jim-nvenc] different GPU selected by user, falling back to ffmpeg", and whether encoding on GPU #1 really works on master?
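
The fallback described here can be sketched as a small decision function. The enum, function and parameter names are hypothetical and this is not the PR's actual code; the only assumption about the real behavior is what the comment states (fall back to FFmpeg, logging the quoted message, when a GPU other than OBS's rendering GPU is selected), plus a generic flag for cases where texture input cannot be used.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names; not the PR's actual code. */
enum nvenc_path {
	NVENC_PATH_TEXTURE,         /* encode OBS's GPU textures directly */
	NVENC_PATH_FFMPEG_FALLBACK, /* old path via system RAM and FFmpeg */
};

/* Texture input presumably only works when NVENC runs on the same GPU that
 * OBS renders on, because the textures live in that GPU's memory. */
static enum nvenc_path choose_nvenc_path(int obs_render_gpu, int selected_gpu,
					 bool texture_input_supported)
{
	if (selected_gpu != obs_render_gpu) {
		printf("[jim-nvenc] different GPU selected by user, falling back to ffmpeg\n");
		return NVENC_PATH_FFMPEG_FALLBACK;
	}
	if (!texture_input_supported)
		return NVENC_PATH_FFMPEG_FALLBACK;
	return NVENC_PATH_TEXTURE;
}
```

Note that this only moves the work to the FFmpeg path; whether that path then actually encodes on the selected GPU is a separate question.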

openglfreak, Feb 26 '22

> Hello, jim-nvenc should fall back to ffmpeg for encoding on other GPUs. Can you check whether your log contains "[jim-nvenc] different GPU selected by user, falling back to ffmpeg", and whether encoding on GPU #1 really works on master?

  • Master/Linux/FFmpeg NVENC: When I select a different GPU, the same GPU that OBS runs on is used for the FFmpeg NVENC encoder, so this is a bug in OBS/FFmpeg.
  • jim-nvenc/Linux/new NVENC: Checking out the very latest changes, I now get the message about changing GPUs, and the log notes the different encoder being used... but it still runs on the same GPU as OBS (most likely due to the bug noted above).
  • Master/Windows: I don't have a Windows computer with dual GPUs, so I can't test this scenario.

c3r1c3, Feb 27 '22

You wouldn't happen to have a list of dependencies, would ya?

I'm trying to compile on Arch at the moment, and it's failing on the audio encoder with: unknown type name 'AVCodecContext'; did you mean 'AVIODirContext'?

I know that's defined in libavcodec, but it doesn't seem to be pulling it from my system ffmpeg install.

It's also isolated to this branch specifically, as I can compile OBS fine normally.

mrteathyme, Feb 27 '22

@openglfreak A request. Could you update your master to match the latest in OBS' master and then do a rebase of your branch from that? I found a couple of other issues that I'm not sure if they're related to this branch, or the version based on this branch. (Yes, I know I can do that locally, but it makes it a lot easier when testing, switching back-and-forth and keeping everything in sync to have it this way.)

Thanks again for all your hard work!

c3r1c3, Feb 27 '22

> @openglfreak A request. Could you update your master to match the latest in OBS' master and then do a rebase of your branch from that? I found a couple of other issues that I'm not sure if they're related to this branch, or the version based on this branch. (Yes, I know I can do that locally, but it makes it a lot easier when testing, switching back-and-forth and keeping everything in sync to have it this way.)
>
> Thanks again for all your hard work!

@c3r1c3 it was already on yesterday's master. I rebased it again to current master but there aren't many changes between then and now.

openglfreak, Feb 28 '22

Sorry about that. I was misunderstanding what github was telling me when looking at the branch info.

Back to the multi-GPU issue: it used to work in (jim-master-branch) OBS, but I don't know whether an FFmpeg or an OBS update broke it, so I need to resolve that before I can put the NVENC testing to bed.

As to what I've been looking into, I came across a few issues (render lag, encoding lag when the encoder isn't running), hence why I've been a bit quiet. On the surface almost all of them appear to be related to having OBS run in RGB mode, so they don't really have anything to do with this patch per se, but I'm still running tests and working my way through it. Hopefully I'll have some proper and comprehensive results in a couple more weeks.

c3r1c3, Mar 07 '22

Sorry, been real busy, but I did have some time to narrow down some of my issues. Short version: Any issues I've been looking into are a result of my system, Linux, RGB mode, and/or NDI being enabled. This patch is the real deal, provides a massive to noticeable performance improvement and should be applied to HEAD, with consideration for the testing needed for Windows/other encoders, as noted by openglfreak in post https://github.com/obsproject/obs-studio/pull/4974#issuecomment-1000071507

Thanks for making this @openglfreak

c3r1c3, Mar 19 '22

I rebased the commits for encode_texture2 to play with another encoder. At least for VA-API, it's pretty much what we want for passing textures around as well. On Linux we could also have considered exporting the dmabuf and passing that to the encoder, but in my tests exporting can be quite slow. And on platforms like Intel we need a graphics context to blit, so we would end up just re-importing in the encoder anyway; it's nicer to send the textures and let us add a gs_* call to export on the encoder side if needed.

One small change I'd recommend for struct encoder_texture: probably just have an array of 4 textures, as that's the most planes the kernel supports, so we should never go beyond that (maybe +1 for a NULL to be safe, but the number of planes must be known from the format anyway due to how we create NV12/P010 textures).
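
A minimal sketch of that suggestion, assuming a fixed array of four plane textures plus one NULL slot. `gs_texture_t` is declared here as an opaque stand-in for libobs's graphics texture type, and the struct, enum and helper names are hypothetical rather than the PR's actual definitions.

```c
#include <stddef.h>

/* Opaque stand-in for libobs's gs_texture_t. */
typedef struct gs_texture gs_texture_t;

#define ENCODER_TEXTURE_MAX_PLANES 4

/* Hypothetical: one texture per plane, plus a trailing NULL slot. */
struct encoder_texture_sketch {
	gs_texture_t *tex[ENCODER_TEXTURE_MAX_PLANES + 1];
};

enum sketch_tex_format {
	SKETCH_TEX_RGBA,
	SKETCH_TEX_NV12,
	SKETCH_TEX_P010,
	SKETCH_TEX_I420,
};

/* The plane count is implied by the format, so the encoder never has to
 * scan for the NULL terminator. */
static size_t plane_count(enum sketch_tex_format format)
{
	switch (format) {
	case SKETCH_TEX_RGBA:
		return 1; /* single packed plane */
	case SKETCH_TEX_NV12: /* Y plane + interleaved UV plane */
	case SKETCH_TEX_P010: /* same two-plane layout, 16-bit samples */
		return 2;
	case SKETCH_TEX_I420:
		return 3; /* separate Y, U and V planes */
	}
	return 0;
}
```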

Thanks a bunch for this change... despite it not landing yet.

kkartaltepe, Feb 05 '23

Out of curiosity, is this still being worked on?

Sid127, Apr 26 '23

It is, actually. I've finally started working on this just recently, in fact.

jp9000, Apr 26 '23

I applied the commits as best I could on top of master. It compiles but doesn't launch; it errors out with ./obs: symbol lookup error: ../../obs-plugins/64bit/obs-ffmpeg.so: undefined symbol: load_nvenc_lib (built in portable mode).

Commits have been applied on top of 34e3d641582dca3e86e199343905756a1cbc0b64

The patch can be found here, because GitHub won't let me attach the patch file directly.

Sid127, May 02 '23

Made some more progress, fixed the linkage errors. Compiles, launches, non-ffmpeg nvenc is selectable, but obs segfaults when trying to start a recording. Updated patchset is here, and the backtrace from the segfault is here.

I'm struggling to move forward; my lack of in-depth knowledge about all this is finally showing. I'm hoping the patchset is a good base for getting things working sooner rather than later.

Sid127, May 03 '23

Hi @jp9000 / @Lain-B , hope you're doing fine :) Don't want to stress, just curious how it is going. In case you have no time for this anymore, can you publish your code so far? Thanks.

Bleuzen, Aug 19 '23

Yeah, I was working on it, but ended up delaying it again 😅... I'm sorry about that. It requires merging the texture sharing PR, so I had to go through those changes first. I just need to get through that PR and then I'll be able to apply it to these changes.

Lain-B, Aug 21 '23

I'd love to test this on Linux and Windows but can't compile openglfreak:linux-jim-nvenc. I can compile origin:master, though. Is there a more recent rebase I could try?

tari3x, Sep 03 '23

@Lain-B could you mention which PR it is so we can track progress and help with testing wherever applicable?

Sid127, Sep 27 '23