NVEncC (x64) 7.40 (r2648) does not handle both encode engines
Hi, thank you for the great work, very much appreciated! I have a GTX 970 and, according to NVIDIA, it has two encode engines. But when I run an encode task in StaxRip, which uses your NVEncC (x64) 7.40 (r2648) by rigaya, I see the GPU used at only about 50%. According to NVIDIA, the driver should handle the encode engines automatically. https://docs.nvidia.com/video-technologies/video-codec-sdk/12.1/nvenc-application-note/index.html#nvenc-performance
Thus, if the GPU has 2 NVENCs (e.g. GP104, AD104), multiply the corresponding number in Table 3 by the number of NVENCs per chip to get aggregate maximum performance (applicable only when running multiple simultaneous encode sessions). Note that unless Split Frame Encoding is enabled, performance with single encoding session cannot exceed performance per NVENC, regardless of the number of NVENCs present on the GPU. Multi NVENC Split Frame Encoding is a feature introduced in SDK12.0 on Ada GPUs for HEVC and AV1.
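To make the rule in that quote concrete, here is a small sketch of the arithmetic. It is only a simplified model of what the application note says, not something taken from NVIDIA's code; the function name and the 200 fps per-engine figure are made up for illustration.

```python
def aggregate_encode_fps(per_engine_fps, engines, sessions, split_frame=False):
    """Rough upper bound on total encode throughput, following the quoted note."""
    if sessions < 1 or engines < 1:
        return 0.0
    if split_frame:
        # Split Frame Encoding lets a single session spread across all engines.
        busy_engines = engines
    else:
        # Without it, each session stays on one engine.
        busy_engines = min(sessions, engines)
    return float(per_engine_fps * busy_engines)


# Hypothetical GPU: 200 fps per engine, two engines.
print(aggregate_encode_fps(200, engines=2, sessions=1))                    # 200.0
print(aggregate_encode_fps(200, engines=2, sessions=2))                    # 400.0
print(aggregate_encode_fps(200, engines=2, sessions=1, split_frame=True))  # 400.0
```

With one session and no split frame encoding, the second engine contributes nothing, which would match a GPU that shows about 50% encoder load.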
Well, that is all fine and dandy, but my reference is StaxRip running NVEncC (x64) 7.40 (r2648) in two instances on two different GPUs side by side, encoding the same video file, while I observe the FPS. The numbers are more or less the same, meaning the second encode engine on the GTX 970 is doing nothing, as is obvious from Task Manager showing it at 50%. The question then becomes: how can Split Frame Encoding be enabled in StaxRip? Or are you saying that if I allocate another encoding session to the GTX 970, the driver will assign it to the second encoder on that chip?
Thanks.
Yes, if you encode two files at the same time, the driver will allocate one to each NVENC engine, totaling 100% usage.
On the GeForce RTX 4000 series, both NVENC engines can work on the same file, which can double the speed.
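If you want to script the "two files at the same time" case instead of starting both jobs by hand, something like this should work. It is a minimal sketch: `run_concurrently` is a made-up helper, the stand-in jobs just print a line so it runs anywhere, and the file names in the comment are placeholders.

```python
import subprocess
import sys


def run_concurrently(commands):
    """Start every command at once, then wait for all of them; return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Stand-in jobs. Replace them with real argument lists, for example
    # ["NVEncC64.exe", "-i", "a.mkv", "-o", "a_out.mkv"] (placeholder file names).
    jobs = [
        [sys.executable, "-c", "print('job 1')"],
        [sys.executable, "-c", "print('job 2')"],
    ]
    print(run_concurrently(jobs))
```

Both processes are started before either one is waited on, so the two encode sessions overlap and the driver can place them on different engines.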
So, to summarize: for cards prior to the 4000 series, either Split Frame Encoding has to be enabled or two encodes have to run on the same chip. For the 4000 series, this happens automatically for a single encode on the same chip. Correct?
Cards prior to the 4000 series are not capable of split encoding; this feature does not exist on the chip.
Based on your information, I got a 4070 Ti, hoping that it would encode twice as fast as my other GPUs. As you can see, it still loads up to only about 50% when encoding one file. I have tested it, and if I encode two files at the same time, it is loaded to 100%. There is no difference in behavior between my GTX 970 and RTX 4070 Ti.
You will need to add the parameter --split-enc auto_forced or --split-enc forced_2: https://github.com/rigaya/NVEnc/blob/master/NVEncC_Options.en.md#--split-enc-string
Keep in mind that it may not double the speed if the video decode engine reaches 100% first. For example, here at 1080p I get 240 fps and cannot double it to 480 fps; at 4K I get 120 fps with one engine and 240 fps with two.
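For anyone launching NVEncC from a script rather than StaxRip, here is a rough sketch of how the switch could be added to an argument list. The helper name and file names are invented; the mode set only contains values that appear in this thread (the options page has the full list), and `--codec hevc` is there because the NVIDIA note quoted earlier limits split frame encoding to HEVC and AV1.

```python
import subprocess

# Only the values mentioned in this thread; see the options page for the rest.
SPLIT_ENC_MODES = {"auto", "auto_forced", "forced_2"}


def nvenc_split_args(src, dst, mode="auto_forced", exe="NVEncC64.exe", codec="hevc"):
    """Build an NVEncC argument list with --split-enc set (hypothetical helper)."""
    if mode not in SPLIT_ENC_MODES:
        raise ValueError(f"unexpected --split-enc mode: {mode}")
    return [exe, "-i", src, "--codec", codec, "--split-enc", mode, "-o", dst]


args = nvenc_split_args("input.mkv", "output.mkv")
# Show the command line instead of running it, since NVEncC may not be installed.
print(subprocess.list2cmdline(args))
```

The list can be handed to `subprocess.run(args)` once the paths point at real files.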
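The decode limit can be pictured as a simple "slowest stage wins" calculation. This is only a back-of-the-envelope sketch: the encode figures are the ones quoted above, but the decode ceilings (300 and 260 fps) are invented for illustration, since nobody in this thread measured them.

```python
def pipeline_fps(decode_fps, encode_fps_per_engine, engines_used):
    """The slower of the decode stage and the combined encode stage sets the pace."""
    return min(decode_fps, encode_fps_per_engine * engines_used)


# 1080p: 240 fps per engine, hypothetical decode ceiling of 300 fps.
print(pipeline_fps(300, 240, 2))  # 300, well short of 480
# 4K: 120 fps per engine, hypothetical decode ceiling of 260 fps.
print(pipeline_fps(260, 120, 2))  # 240, the full doubling
```

Under these assumed numbers the second engine only pays off fully when the decoder has headroom above the combined encode rate.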
Thanks for the feedback and info. Regarding the video decode engine, I wonder whether another decoder could handle the load, for example another board such as a 1660 or 3060?
Unfortunately, I don't know the answer to that.
@claudiomarassi, thank you very much for your feedback. I added the "--split-enc auto_forced" switch to test with a file, and it did increase the encoding speed. However, it looks like when I encode with this switch, the resulting file loses the HDR base layer. Have you seen this yourself?
@rigaya ^ ?
What settings are you using? Based on the first picture you provided, there are no settings to preserve the HDR information.
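As a possible starting point, these are the NVEncC options I would try first for carrying HDR10 metadata through. This is a sketch under assumptions: it presumes an HEVC or AV1 output and a reader that supports "copy" (avhw or avsw), the helper name is made up, and it may not cover whatever "HDR base layer" means for this particular file.

```python
# Options assumed to be relevant for HDR10 passthrough; check them against
# NVEncC_Options.en.md for the version you are running.
HDR_PASSTHROUGH = ["--output-depth", "10", "--max-cll", "copy", "--master-display", "copy"]


def with_hdr_passthrough(args):
    """Return a copy of an NVEncC argument list with the HDR options appended."""
    return list(args) + HDR_PASSTHROUGH


base = ["NVEncC64.exe", "--avhw", "-i", "input.mkv", "--codec", "hevc",
        "--split-enc", "auto_forced", "-o", "output.mkv"]
print(" ".join(with_hdr_passthrough(base)))
```

Comparing the output with and without --split-enc while these options are set would show whether the switch itself is involved.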
It is possible to decode a video on a different card and then encode it with the 4070 Ti. However, this requires piping the video through NVEncC from one GPU to the other, which may slow down the process.
You can take a look here:
https://github.com/rigaya/NVEnc/issues/326
Example:
"NVEncC64.exe" -i "YOUR VIDEO" "YOUR SETTINGS" - --output-format nut | "NVEncC64.exe" "YOUR SETTINGS" -i - -o "YOUR_OUTPUT_VIDEO" `
I did some further testing, and you can also pipe from NVIDIA to Intel if you want to.
Example:
"NVEncC64.exe" --avhw -i "YOUR_VIDEO" --vpp-nvvfx-denoise --audio-copy -o - --output-format nut | "QSVEncC64.exe" --avhw -i - --option-file "RECODE.txt" --audio-copy -o "YOUR_VIDEO_OUT"
Output on Display:
NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)
OS Version Windows 11 x64 (22631) [UTF-8]
CPU 13th Gen Intel Core i5-13600 [4.71GHz] (6P+8E,14C/20T)
GPU #0: NVIDIA GeForce RTX 2060 (1920 cores, 1680 MHz)[PCIe3x16][551.61]
NVENC / CUDA NVENC API 12.1, CUDA 12.4, schedule mode: auto
Input Buffers CUDA, 20 frames
Input Info avcuvid: H.264/AVC, 1920x1080, 24000/1001 fps
AVSync vfr
Vpp Filters cspconv(nv12 -> yv12)
nvvfx-denoise: cspconv(yv12 -> bgr(fp32))
nvvfx-denoise: strength 0
cspconv(bgr(fp32) -> yv12)
cspconv(yv12 -> nv12)
Output Info H.264/AVC high @ Level auto
1920x1080p 1:1 23.976fps (24000/1001fps)
avwriter: h264, eac3 => nut
Encoder Preset default
Rate Control VBR
Multipass none
Bitrate 0 kbps (Max: 24000 kbps)
Target Quality 25.00
Initial QP I:20 P:23 B:25
QP range I:0-51 P:0-51 B:0-51
QP Offset cb:0 cr:0
VBV buf size auto
Split Enc Mode auto
Lookahead off
GOP length 240 frames
B frames 3 frames [ref mode: disabled]
Ref frames 3 frames, MultiRef L0:auto L1:auto
AQ off
Others mv:auto cabac deblock adapt-transform:auto bdirect:auto
PG is not supported on this platform, switched to FF mode.
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.59 (r3244) by rigaya, Feb 11 2024 12:30:40 (VC 1937/Win)
OS Windows 11 x64 (22631) [UTF-8]
CPU Info 13th Gen Intel Core i5-13600 [4.70GHz] (6P+8E,14C/20T) <DG2>
GPU Info Intel Arc A770 Graphics (512EU) 300-2400MHz (31.0.101.5333)
Media SDK QuickSyncVideo (hardware encoder) FF, 1st GPU, API v2.10
Async Depth 1024 frames
Hyper Mode off
Buffer Memory d3d11, 4256 work buffer
Input Info avqsv: H.264/AVC, 1920x1080, 24000/1001 fps
VPP ColorFmtConvertion: nv12 -> p010
Denoise post, strength 10
cspconv(p010 -> yv12(16bit))
deband: mode 1, range 15, threY 15, threCb 15, threCr 15
ditherY 15, ditherC 15, blurFirst no, randEachFrame no
cspconv(yv12(16bit) -> p010)
AVSync cfr
Output H.265/HEVC(yuv420 10bit) main10 @ Level 5 (high tier)
1920x1080p 1:1 23.976fps (24000/1001fps)
avwriter: hevc, ac3 => matroska
Target usage 1 - best
Encode Mode Constant QP (CQP)
CQP Value I:32 P:32 B:34
Scenario Info archive
QP Limit min: 22, max: 63
Ref frames 6 frames
Bframes 16 frames, B-pyramid: on
Max GOP Length 240 frames
VUI chromaloc:left
atcsei auto
