NVEnc NVEncC (x64) 7.40 (r2648) does not handle both encode engines

Hi, thank you for the great work, very much appreciated! I have a GTX 970 and, according to NVIDIA, it has two encode engines. But when I run an encode task with StaxRip, which uses your NVEncC (x64) 7.40 (r2648), I see the GPU used at only about 50%. According to NVIDIA, the driver should automatically handle the encode engines: https://docs.nvidia.com/video-technologies/video-codec-sdk/12.1/nvenc-application-note/index.html#nvenc-performance

[Image: 1 encoder used only, GTX 970]

Jan 17 '24 01:01 pm0code

Thus, if the GPU has 2 NVENCs (e.g. GP104, AD104), multiply the corresponding number in Table 3 by the number of NVENCs per chip to get aggregate maximum performance (applicable only when running multiple simultaneous encode sessions). Note that unless Split Frame Encoding is enabled, performance with single encoding session cannot exceed performance per NVENC, regardless of the number of NVENCs present on the GPU. Multi NVENC Split Frame Encoding is a feature introduced in SDK12.0 on Ada GPUs for HEVC and AV1.
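
As a rough illustration of the rule quoted above, here is a minimal sketch that computes the upper bound on aggregate encode throughput. The function name and the 200 fps figure are hypothetical and are not taken from Table 3; real throughput also depends on decode, filters, and I/O.

```python
def max_encode_fps(per_nvenc_fps: float, nvenc_count: int,
                   sessions: int, split_frame: bool = False) -> float:
    """Upper bound on aggregate encode fps under the rule quoted above."""
    if nvenc_count < 1 or sessions < 1:
        raise ValueError("nvenc_count and sessions must be at least 1")
    # Without split frame encoding, each session is confined to one NVENC,
    # so the number of busy engines is capped by the number of sessions.
    busy_engines = nvenc_count if split_frame else min(sessions, nvenc_count)
    return per_nvenc_fps * busy_engines


# Hypothetical per-NVENC figure of 200 fps on a GPU with 2 NVENCs.
print(max_encode_fps(200, 2, sessions=1))                    # 200
print(max_encode_fps(200, 2, sessions=2))                    # 400
print(max_encode_fps(200, 2, sessions=1, split_frame=True))  # 400
```

So on a two-NVENC GPU without split frame encoding, a single session seeing about half the encoder capacity in use is the expected result.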

Jan 17 '24 04:01 claudiomarassi

Well, that is all fine and dandy, but my reference is StaxRip, which runs NVEncC (x64) 7.40 (r2648) in two instances on two different GPUs side by side, encoding the same video file while I observe the FPS. The results are more or less the same, meaning the second encode engine on the GTX 970 is doing nothing, which is obvious from Task Manager showing it at 50%. The question then becomes: how can split frame encoding be enabled in StaxRip? Or are you saying that if I allocate another encoding session to the GTX 970, the driver will allocate it to the second encoder on that chip?

Thanks.

[Image: GTX 960 vs GTX 970]

Jan 17 '24 18:01 pm0code

Yes, if you encode two files at the same time, it will allocate one to each NVENC engine, totaling 100% usage.

On the GeForce RTX 40 series, both NVENC engines can work on the same file, for up to double the speed.
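
For the two-files-at-once case, a minimal sketch of launching two independent sessions could look like this. The helper names and file names are hypothetical, encoder settings are left out, and whether the driver puts each session on its own NVENC is up to the driver, not this script.

```python
import subprocess


def run_in_parallel(commands: list[list[str]]) -> list[int]:
    """Start every command at once, wait for all, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def nvenc_command(src: str, dst: str, exe: str = "NVEncC64.exe") -> list[str]:
    """Build a minimal NVEncC command line (input and output only)."""
    return [exe, "-i", src, "-o", dst]


# Hypothetical file names: two independent encode sessions.
jobs = [nvenc_command("a.mkv", "a_out.mkv"),
        nvenc_command("b.mkv", "b_out.mkv")]
for job in jobs:
    print(" ".join(job))

# To actually encode (requires NVEncC64.exe on PATH):
# exit_codes = run_in_parallel(jobs)
```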

Jan 17 '24 19:01 claudiomarassi

So, to summarize: for cards prior to the 4000 series, either split frame encoding has to be enabled or two encodes have to run on the same chip. For the 4000 series, this happens automatically for a single encode on the same chip. Correct?

Jan 17 '24 19:01 pm0code

Cards prior to the 4000 series are not capable of split encoding; this feature does not exist on the chip.

Jan 17 '24 21:01 claudiomarassi

Based on your information, I got a 4070 Ti, hoping that it would encode twice as fast as my other GPUs. As you can see, it is still loaded to only about 50% when encoding one file. I have tested it, and if I encode two files at the same time, it is loaded to 100%. There is no difference in behavior between my GTX 970 and RTX 4070 Ti.

Jan 27 '24 17:01 pm0code

You will need to add the parameter --split-enc auto_forced or --split-enc forced_2: https://github.com/rigaya/NVEnc/blob/master/NVEncC_Options.en.md#--split-enc-string

Keep in mind that it may not double the speed if the video decode engine reaches 100% first. For example, here at 1080p I get 240 fps and can't double it to 480 fps; at 4K I get 120 fps with one engine and 240 fps with two.
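
A minimal sketch of a full command line with that option, built as an argument list. The helper name and file names are hypothetical; the mode list only contains values that appear in this thread (the linked option reference may list more), and the codec check follows the NVIDIA note that split frame encoding applies to HEVC and AV1.

```python
# Modes that appear in this thread; the option reference may list more.
SPLIT_ENC_MODES = ("auto", "auto_forced", "forced_2")
# Split frame encoding is documented for HEVC and AV1 only.
SPLIT_ENC_CODECS = ("hevc", "av1")


def split_enc_command(src: str, dst: str, mode: str = "auto_forced",
                      codec: str = "hevc",
                      exe: str = "NVEncC64.exe") -> list[str]:
    """Build an NVEncC command line that requests split frame encoding."""
    if mode not in SPLIT_ENC_MODES:
        raise ValueError(f"unexpected --split-enc mode: {mode}")
    if codec not in SPLIT_ENC_CODECS:
        raise ValueError(f"split frame encoding is not expected for {codec}")
    return [exe, "-i", src, "-c", codec, "--split-enc", mode, "-o", dst]


# Hypothetical file names.
print(" ".join(split_enc_command("input.mkv", "output.mkv")))
# NVEncC64.exe -i input.mkv -c hevc --split-enc auto_forced -o output.mkv
```

In StaxRip the same switch would presumably go wherever custom NVEncC arguments are entered; I have not checked that part.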

Jan 27 '24 17:01 claudiomarassi

Thanks for the feedback and info. Regarding the video decode engine, I wonder whether another decoder could handle the load, for example on another board such as a 1660 or 3060?

Jan 27 '24 23:01 pm0code

Unfortunately, I don't know how to answer that.

Jan 28 '24 00:01 claudiomarassi

@claudiomarassi, thank you very much for your feedback. I added the "--split-enc auto_forced" switch to test with a file, and it did increase the encoding speed. However, it looks like when I encode with that switch, the resulting file loses the HDR base layer. Have you seen this yourself?

Feb 03 '24 03:02 pm0code

@rigaya ^ ?

Feb 04 '24 00:02 pm0code

> @claudiomarassi, thank you very much for your feedback. I added the "--split-enc auto_forced" switch to test with a file, and it did increase the encoding speed. However, it looks like when I encode with that switch, the resulting file loses the HDR base layer. Have you seen this yourself?

What settings are you using? Based on the first picture you provided, there are no settings to preserve the HDR information.
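
For reference, a minimal sketch of the kind of settings meant here, assuming an HDR10 (PQ, BT.2020) source read directly by NVEncC. The helper name and file names are hypothetical. As far as I understand, the "copy" values only work when NVEncC reads the source itself (for example with --avhw), and whether this addresses the loss reported above has not been verified.

```python
# Assumed HDR10 signalling plus static metadata passthrough.
HDR10_FLAGS = [
    "--output-depth", "10",
    "--colormatrix", "bt2020nc",
    "--colorprim", "bt2020",
    "--transfer", "smpte2084",
    "--master-display", "copy",  # copy mastering display metadata from input
    "--max-cll", "copy",         # copy MaxCLL/MaxFALL from input
]


def hdr10_command(src: str, dst: str, exe: str = "NVEncC64.exe") -> list[str]:
    """Build an HEVC command line that keeps HDR10 signalling and metadata."""
    return [exe, "--avhw", "-i", src, "-c", "hevc", *HDR10_FLAGS, "-o", dst]


# Hypothetical file names.
print(" ".join(hdr10_command("hdr_input.mkv", "hdr_output.mkv")))
```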

> Well, that is all fine and dandy, but my reference is StaxRip, which runs NVEncC (x64) 7.40 (r2648) in two instances on two different GPUs side by side, encoding the same video file while I observe the FPS. The results are more or less the same, meaning the second encode engine on the GTX 970 is doing nothing, which is obvious from Task Manager showing it at 50%. The question then becomes: how can split frame encoding be enabled in StaxRip? Or are you saying that if I allocate another encoding session to the GTX 970, the driver will allocate it to the second encoder on that chip?
>
> Thanks.

Feb 27 '24 01:02 quamt

> Thanks for the feedback and info. Regarding the video decode engine, I wonder whether another decoder could handle the load, for example on another board such as a 1660 or 3060?

It is possible to decode a video using a different card and then encode it with the 4070 Ti. However, this requires piping the video from one NVEncC instance to another, each running on a different GPU, which may slow down the process.

You can take a look here: https://github.com/rigaya/NVEnc/issues/326

Example: "NVEncC64.exe" -i "YOUR_VIDEO" "YOUR SETTINGS" -o - --output-format nut | "NVEncC64.exe" "YOUR SETTINGS" -i - -o "YOUR_OUTPUT_VIDEO"
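
If you would rather script the pipe than type it into a shell, here is a minimal sketch. The helper name and file names are hypothetical, and the --device indices are assumptions that need to match the GPU numbering on your system.

```python
import subprocess


def run_pipeline(producer: list[str], consumer: list[str]) -> tuple[int, int]:
    """Connect producer stdout to consumer stdin, like `producer | consumer`."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout)
    # Close our copy of the pipe so the producer gets a broken pipe
    # if the consumer exits early, instead of blocking forever.
    first.stdout.close()
    consumer_rc = second.wait()
    producer_rc = first.wait()
    return producer_rc, consumer_rc


# Assumed GPU indices: 1 = the older card, 0 = the 4070 Ti.
first_stage = ["NVEncC64.exe", "--device", "1", "--avhw", "-i", "input.mkv",
               "-o", "-", "--output-format", "nut"]
second_stage = ["NVEncC64.exe", "--device", "0", "--avhw", "-i", "-",
                "-o", "output.mkv"]
print(" ".join(first_stage), "|", " ".join(second_stage))

# To actually run it (requires NVEncC64.exe on PATH):
# codes = run_pipeline(first_stage, second_stage)
```

Note that the first stage still encodes to an intermediate stream before handing it over, so the pipe adds work of its own.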

Feb 27 '24 01:02 quamt

I did some further testing, and you can also pipe from NVIDIA to Intel if you want to.

Example: "NVEncC64.exe" --avhw -i "YOUR_VIDEO" --vpp-nvvfx-denoise --audio-copy -o - --output-format nut | "QSVEncC64.exe" --avhw -i - --option-file "RECODE.txt" --audio-copy -o "YOUR_VIDEO_OUT"

Output on Display:

NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            13th Gen Intel Core i5-13600 [4.71GHz] (6P+8E,14C/20T)
GPU            #0: NVIDIA GeForce RTX 2060 (1920 cores, 1680 MHz)[PCIe3x16][551.61]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: auto
Input Buffers  CUDA, 20 frames
Input Info     avcuvid: H.264/AVC, 1920x1080, 24000/1001 fps
AVSync         vfr
Vpp Filters    cspconv(nv12 -> yv12)
               nvvfx-denoise: cspconv(yv12 -> bgr(fp32))
                              nvvfx-denoise: strength 0
                              cspconv(bgr(fp32) -> yv12)
               cspconv(yv12 -> nv12)
Output Info    H.264/AVC high @ Level auto
               1920x1080p 1:1 23.976fps (24000/1001fps)
               avwriter: h264, eac3 => nut
Encoder Preset default
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 24000 kbps)
Target Quality 25.00
Initial QP     I:20  P:23  B:25
QP range       I:0-51  P:0-51  B:0-51
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     240 frames
B frames       3 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
Others         mv:auto cabac deblock adapt-transform:auto bdirect:auto
PG is not supported on this platform, switched to FF mode.
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.59 (r3244) by rigaya, Feb 11 2024 12:30:40 (VC 1937/Win)
OS             Windows 11 x64 (22631) [UTF-8]
CPU Info       13th Gen Intel Core i5-13600 [4.70GHz] (6P+8E,14C/20T) <DG2>
GPU Info       Intel Arc A770 Graphics (512EU) 300-2400MHz (31.0.101.5333)
Media SDK      QuickSyncVideo (hardware encoder) FF, 1st GPU, API v2.10
Async Depth    1024 frames
Hyper Mode     off
Buffer Memory  d3d11, 4256 work buffer
Input Info     avqsv: H.264/AVC, 1920x1080, 24000/1001 fps
VPP            ColorFmtConvertion: nv12 -> p010
               Denoise post, strength 10
               cspconv(p010 -> yv12(16bit))
               deband: mode 1, range 15, threY 15, threCb 15, threCr 15
                       ditherY 15, ditherC 15, blurFirst no, randEachFrame no
               cspconv(yv12(16bit) -> p010)
AVSync         cfr
Output         H.265/HEVC(yuv420 10bit) main10 @ Level 5 (high tier)
               1920x1080p 1:1 23.976fps (24000/1001fps)
               avwriter: hevc, ac3 => matroska
Target usage   1 - best
Encode Mode    Constant QP (CQP)
CQP Value      I:32  P:32  B:34
Scenario Info  archive
QP Limit       min: 22, max: 63
Ref frames     6 frames
Bframes        16 frames, B-pyramid: on
Max GOP Length 240 frames
VUI            chromaloc:left
atcsei         auto

Feb 28 '24 00:02 quamt