Huge thanks for `--parallel` implementation! Benchmarked results across many resolutions
Hi @rigaya
I just wanted to sincerely thank you for your fantastic work on QSVEnc and especially the --parallel implementation. I ran a benchmark across a range of videos and resolutions using both HEVC and H.264, and the speed gains are massive — all while output sizes stayed consistent (or even slightly better!).
Test Setup
Test content: I used a variety of Big Buck Bunny files at resolutions from 320x180 up to 2160p60.
Extended content: Each input file was concatenated 3x using ffmpeg to simulate a longer real-world encoding task.
Comparison targets:
Codecs: --codec hevc and --codec h264
Modes: Standard vs --parallel auto
Settings:
HEVC: --profile main10 --output-depth 10
H.264: default profile (no bit-depth)
Hardware:
CPU: Ryzen 9 5950X
GPU: 2x Intel Arc A770
QSVEnc Version: 7.84 (r3665)
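The exact ffmpeg command used for the 3x concatenation is not listed in this post. One common way to do it is ffmpeg's concat demuxer with stream copy; the sketch below assumes that approach, and the helper names and file names are hypothetical.

```python
def concat_list_text(source, repeats=3):
    """Return the text of a concat-demuxer list file naming `source` `repeats` times."""
    # Inside single quotes, a literal ' has to be written as '\''
    escaped = source.replace("'", "'\\''")
    return "".join(f"file '{escaped}'\n" for _ in range(repeats))


def concat_command(list_file, output):
    """Build an ffmpeg argument list that joins the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(concat_list_text("bbb_sunflower_1080p_30fps_normal.mp4"), end="")
    print(" ".join(concat_command("list.txt", "bbb_x3.mkv")))
```

Saving the generated text as the list file and running the printed command would produce one longer input without touching the video stream.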
Results Summary:
After running all these tests, it's honestly impressive how well the --parallel implementation performs in QSVEnc.
Looking at the results, it's clear:
Whether it’s 320x180 or full 4K at 60fps, enabling --parallel gives a consistent and meaningful speed boost.
In most cases around 70–75% faster, sometimes even doubling the encode speed.
Even more impressive? The output file sizes stayed nearly the same, with only tiny differences (sometimes even smaller!). No need to compromise on quality or size to get that performance gain.
Amazing work, thank you again for your ongoing development!
Thank you for all the tests and sharing the result!
Although it has some restrictions, I think the current --parallel implementation is an efficient way to use multiple GPUs at the same time with minimal compression loss.
Theoretically, using multiple encoders with the same encoding speed should give the highest efficiency, but I haven't been able to test with 2 dGPUs myself (I tested with iGPU + dGPU). So it's really nice to learn from your test that --parallel actually works quite efficiently in terms of speedup ratio with 2x Arc A770.
Thanks again for the detailed report!
You're very welcome — happy to contribute and help validate such an excellent feature!
Just to add a bit more context: since I'm using two Intel Arc A770 cards, I’ve set their power limits to 110W each, not just during testing but also during the actual encoding runs. This keeps the temperatures below 65°C consistently. I haven’t measured whether this power limit has a notable impact on encoding speed, but it’s something I could look into if it helps.
Also, if it would be useful, I have access to a different system with two NVIDIA RTX 3080 cards (running at stock settings. No power limits or overclocking). I’d be happy to run similar parallel encoding tests on that machine and share the results as well.
Let me know if that’s of interest and once again, thanks for your amazing work!
Arc A770 power limits
Generally, encode itself does not pull so much power (if you don't use vpp filters), so I think 110W power limit shouldn't have much impact on encoding speed.
Also, if it would be useful, I have access to a different system with two NVIDIA RTX 3080 cards (running at stock settings. No power limits or overclocking). I’d be happy to run similar parallel encoding tests on that machine and share the results as well.
Thank you for your offer! Yes, sharing results would be useful, as the NVEnc test I did with different GPUs (RTX4080+GTX1060) when I released the first version was "so-so" at 2K, only getting a 30-40% boost. Although the RTX4080 has 2 encoders on one card, it has only 1 decoder, so that might be the bottleneck.
It would be nice to know whether using 2 identical dGPUs like your 2x RTX 3080 system actually performs better, to know the limits of this feature.
Although the input is not the same, the results of the tests I did at the initial release are below.
Depending on test conditions, some cases perform well and some do not, because when mixing encoders with different speeds, the slower encoder limits the performance boost.
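To illustrate why a slower encoder can hold back the total, here is a deliberately simplified model. It assumes every encoder receives an equal share of the frames, which is only an assumption for illustration and does not claim to describe how --parallel actually divides the work; the speeds used are hypothetical.

```python
def equal_split_fps(encoder_fps):
    """Overall fps if every encoder receives the same number of frames.

    The job only finishes when the slowest encoder is done, so the
    overall rate is (number of encoders) x (slowest encoder's fps).
    """
    return len(encoder_fps) * min(encoder_fps)


def ideal_fps(encoder_fps):
    """Upper bound: every encoder stays busy until the very end."""
    return sum(encoder_fps)


if __name__ == "__main__":
    # Hypothetical standalone speeds in fps, not measurements from this thread.
    for speeds in ([350, 350], [350, 100]):
        print(speeds, equal_split_fps(speeds), ideal_fps(speeds))
```

With two equal encoders both figures coincide. With a 350 fps and a 100 fps encoder, an equal split in this model would end up below the faster encoder running alone, which is why matched encoders are the favourable case.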
Hardware
Intel i9 12900K
Intel Arc B580 (PCIe3x4) + UHD770
NVIDIA RTX4080 (PCIe4x16) + GTX1060 (PCIe3x1)
Win11 24H2
QSVEnc 7.84
NVEncC 8.00 beta6
4K
input: 3840x2160 HEVC 10bit 23.976fps 17min 1sec
option: -c [h264/hevc] [--output-depth 10] [--parallel 2] --audio-copy
2K
input: 1920x1080 MPEG2 29.97fps interlace 30min 6sec
option: -c [h264/hevc] [--output-depth 10] [--parallel 2] --audio-copy --tff --vpp-deinterlace normal
QSVEnc
| Input | Codec | Bit depth | B580 single | B580+UHD770 --parallel 2 |
|---|---|---|---|---|
| 2K | H.264 | 8bit | 357 fps | 637 fps(+78%) |
| 2K | HEVC | 8bit | 353 fps | 636 fps(+80%) |
| 2K | HEVC | 10bit | 353 fps | 626 fps(+77%) |
| 4K | H.264 | 8bit | 147 fps | 248 fps(+69%) |
| 4K | HEVC | 8bit | 167 fps | 215 fps(+29%) |
| 4K | HEVC | 10bit | 167 fps | 217 fps(+30%) |
NVEncC
| Input | Codec | Bit depth | RTX4080 single | RTX4080 --parallel 2 | RTX4080+GTX1060 --parallel 3 |
|---|---|---|---|---|---|
| 2K | H.264 | 8bit | 504 fps | 733 fps(+45%) | 848 fps(+68%) |
| 2K | HEVC | 8bit | 484 fps | 687 fps(+42%) | |
| 2K | HEVC | 10bit | 489 fps | 618 fps(+26%) | |
| 4K | H.264 | 8bit | 131 fps | 249 fps(+90%) | 250 fps(+91%) |
| 4K | HEVC | 8bit | 133 fps | 263 fps(+98%) | |
| 4K | HEVC | 10bit | 133 fps | 258 fps(+94%) | |
You're very welcome @rigaya
As promised, I ran a follow-up test on the second machine equipped with 2x NVIDIA RTX 3080 cards. This system runs at stock settings with no power limits or overclocking.
Here's a quick overview of the hardware and NVEnc version:
CPU: Intel Core i9-12900K (4.92GHz, 16C/24T)
GPU: 2x NVIDIA RTX 3080
Driver: 572.83
NVEncC: 8.03 (x64)
Like the Arc A770 setup, I tested encoding with --parallel and logged speed and file size differences.
Here’s a quick snapshot of the results:
Observations from the 2x RTX 3080 test:
Speed gains with --parallel were consistently in the 60–90% range, depending on resolution and codec.
Even at 4K, the performance boost stayed above 89% in most tests.
Like with the Arc A770s, the file size difference remained minimal or negligible, which is excellent.
Compared to your mixed RTX4080 + GTX1060 results, using two identical dGPUs (even older ones) helps avoid any performance drag from mismatched encoder speeds.
One quick question I was wondering about:
According to Intel’s specs, the Arc A770 has “2 Multi-Format Codec Engines”.
Shouldn’t that mean each card technically has 2 hardware encoders available?
Or is there a practical limitation preventing both from being used simultaneously with QSV or --parallel?
Would love to understand how that’s handled internally, if you happen to know.
Thank you for the NVEnc tests, 2x RTX3080 looks great! I'm glad to learn that it is also quite efficient with 2 identical GPUs in NVEnc; the 4K speedup of around 90% is especially nice.
According to Intel’s specs, the Arc A770 has “2 Multi-Format Codec Engines”. Shouldn’t that mean each card technically has 2 hardware encoders available?
Actually I didn't notice that, but yes I also think so.
I've tested --parallel 2 with one B580, which is also said to have "2 Multi-Format Codec Engines". --parallel might be able to use the 2nd engine, as I was able to get around a 70% performance increase with --parallel. But the weird thing is that Task Manager or even HWiNFO reports nearly 100% video engine utilization with --parallel disabled, so I'm not sure...
Hardware
Intel i9 12900K
Intel Arc B580 (PCIe3x4)
Win11 24H2
QSVEnc 7.85
4K
input: 3840x2160 HEVC 10bit 23.976fps 17min 1sec
option: -c [h264/hevc] [--output-depth 10] [--parallel 2] -d 1 --audio-copy
2K
input: 1920x1080 MPEG2 29.97fps interlace 30min 6sec
option: -c [h264/hevc] [--output-depth 10] [--parallel 2] -d 1 --audio-copy --tff --vpp-deinterlace normal
result
| Input | Codec | Bit depth | B580 single | B580 --parallel 2 |
|---|---|---|---|---|
| 2K | H.264 | 8bit | 344 fps | 607 fps(+76%) |
| 2K | HEVC | 8bit | 341 fps | 583 fps(+71%) |
| 2K | HEVC | 10bit | 337 fps | 570 fps(+69%) |
| 4K | H.264 | 8bit | 147 fps | 206 fps(+40%) |
| 4K | HEVC | 8bit | 167 fps | 225 fps(+34%) |
| 4K | HEVC | 10bit | 167 fps | 225 fps(+35%) |
@rigaya
I ran extended benchmarking using --parallel implementation in QSVEnc, with --parallel 4.
GPU: 2x Intel Arc A770 (1 at PCIe 4.0 x16, 1 at PCIe 4.0 x4)
I noticed you mentioned this as well, so I added it.
Observations
Performance (FPS)
Parallel 2 consistently boosted FPS by ~70–80% on average compared to normal mode. This held true across most resolutions and codecs, including 1080p and 2160p.
Parallel 4 further increased performance in some cases, but with diminishing returns. The average gain from 2x to 4x was often between 5–15% depending on resolution.
For example:
bbb_sunflower_1080p_30fps_normal_x3_h264: Normal: 472 FPS, Parallel 2: 868 FPS, Parallel 4: 935 FPS
bbb_sunflower_2160p_60fps_normal_x3_hevc: Normal: 134 FPS, Parallel 2: 210 FPS, Parallel 4: 286 FPS
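As a quick check of the step-by-step scaling, the gains can be computed directly from the FPS figures in the two examples above. A minimal sketch; the function names are only illustrative.

```python
def gain_percent(before_fps, after_fps):
    """Relative speed change from before_fps to after_fps, in percent."""
    return (after_fps / before_fps - 1.0) * 100.0


def scaling_steps(fps_by_mode):
    """Return (label, rounded gain over the previous mode) for each step."""
    steps = []
    for (_, prev_fps), (label, fps) in zip(fps_by_mode, fps_by_mode[1:]):
        steps.append((label, round(gain_percent(prev_fps, fps))))
    return steps


if __name__ == "__main__":
    h264_1080p = [("Normal", 472), ("Parallel 2", 868), ("Parallel 4", 935)]
    hevc_2160p60 = [("Normal", 134), ("Parallel 2", 210), ("Parallel 4", 286)]
    print(scaling_steps(h264_1080p))
    print(scaling_steps(hevc_2160p60))
```

For the 1080p H.264 file this gives about +84% for Parallel 2 and a further +8% for Parallel 4; for the 2160p60 HEVC file it gives about +57% and a further +36%.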
File Size
File sizes remained very close across all modes, with only minor variations (typically under 1–2%).
This confirms the compression quality is stable even when running across multiple encoders in parallel.
System Behaviour & Side Note
When running with --parallel 4, I encountered an unusual side effect:
monitor flickering and desktop instability, which persisted after the encode completed.
A system reboot was required to fully restore normal display behaviour.
In Task Manager, I observed that one GPU completed its assigned tasks earlier, while the other continued processing. This suggests a possible workload distribution imbalance across the two GPUs.
While I can’t conclusively attribute this to hardware limitations, my educated guess is that the PCIe x4 bandwidth constraint on one of the Arc A770 cards may be contributing to reduced throughput or uneven task allocation.
If accurate, this could help explain the diminishing performance gains observed when scaling from --parallel 2 to --parallel 4, especially in high-resolution HEVC scenarios where bandwidth and decode throughput may become critical factors.
Additionally, based on previous experience running two concurrent encodes on a single GPU, I noticed that one encode would typically operate at full speed, while the second ran at significantly reduced FPS. This kind of encoder contention could also be influencing performance scaling in multi-GPU scenarios, particularly if multiple parallel encode sessions are unintentionally stacked on the same device.
At this time, I haven’t been able to verify whether --parallel 4 consistently distributes encoding jobs across both discrete GPUs, or if multiple sessions are sometimes assigned to a single GPU, depending on driver or API behaviour.
If there’s a way to explicitly pin sessions per device or improve task distribution across hardware, it might help optimize parallel performance further; though this may be limited by current system architecture and driver scheduling.
That said, the scaling achieved is still excellent, and any performance quirks noted here are more likely due to system-specific characteristics than QSVEnc’s implementation itself.
Thank you for sharing the results for --parallel 4, it's quite interesting. I think the monitor flickering might be caused by high utilization or memory usage.
As you said, the performance gain might be limited by PCIe x4 bandwidth, or by encode sessions not being distributed evenly.
I've made a test build that prints which GPU is selected by each session, to check the encode session distribution. If the sessions are distributed equally, then the result you've shared might be the limit of the hardware (or system), but if not, I might be able to improve the GPU selection implementation by checking the log. https://nightly.link/rigaya/QSVEnc/actions/runs/14334060052/QSVEncC_release_r3683_x64.zip
Below is the log output of my test with --parallel 4 in this test build.
Parallel Enc 0: GPU #1 (Intel Arc B580 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel UHD Graphics 770) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc B580 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel UHD Graphics 770) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc B580 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc B580 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel UHD Graphics 770) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel UHD Graphics 770)
Parallel Enc 3: GPU #1 (Intel Arc B580 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel UHD Graphics 770) score: 250.0: Use: 50.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #1 (Intel Arc B580 Graphics)
Parallel Enc 3: Selected GPU #2 (Intel UHD Graphics 770)
In this case sessions 0 and 2 run on the B580, with sessions 1 and 3 running on the UHD770.
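Reading the printed values in the log above, the score on each line looks like the sum of the listed components (Use, VE, GPU, CC, Core, CL), and each session appears to end up on the GPU with the higher score, with ties going to GPU #1. That is only an inference from this output, not a description of the QSVEnc source. A minimal sketch that checks both observations against pasted log text; the function names are hypothetical.

```python
import re

NUM = r"(\d+(?:\.\d+)?)"
SCORE_LINE = re.compile(
    r"Parallel Enc (\d+): GPU #(\d+) \(.+?\) score: " + NUM +
    r": Use: " + NUM + r", VE " + NUM + r", GPU " + NUM +
    r", CC " + NUM + r", Core " + NUM + r", CL " + NUM
)


def parse_scores(log_text):
    """Map session -> {gpu number: (printed score, [component values])}."""
    sessions = {}
    for line in log_text.splitlines():
        match = SCORE_LINE.search(line)
        if match:
            session, gpu = int(match.group(1)), int(match.group(2))
            score = float(match.group(3))
            parts = [float(value) for value in match.groups()[3:]]
            sessions.setdefault(session, {})[gpu] = (score, parts)
    return sessions


def score_matches_sum(score, parts, tolerance=0.05):
    """Check the inferred rule that the score is the sum of its components."""
    return abs(score - sum(parts)) <= tolerance


def inferred_choice(gpu_scores):
    """Highest score wins; ties go to the lower GPU number (inferred, not confirmed)."""
    return max(sorted(gpu_scores), key=lambda gpu: gpu_scores[gpu][0])
```

Applied to the log above, this reproduces the printed selections for sessions 0 to 3 (#1, #2, #1, #2).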
Thank you
I ran a few encodes; here are some results:
H:\QSVEnc_Output\big_buck_bunny_1080p_h264_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 249.9: Use: 50.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\big_buck_bunny_1080p_h264_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 190.4: Use: 0.0, VE 100.0, GPU 90.4, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 208.1: Use: 50.0, VE 100.0, GPU 58.1, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 186.0: Use: 0.0, VE 100.0, GPU 86.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\bbb_sunflower_2160p_30fps_normal_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 249.9: Use: 50.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #2 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
Each file used 4 parallel encodes.
All encoding jobs were split evenly between GPU # 1 and GPU # 2.
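To check the split without reading every line by hand, the "Selected GPU" lines can be tallied per GPU. A minimal sketch, assuming the log format shown above; the function name is hypothetical.

```python
import re
from collections import Counter

SELECTED = re.compile(r"Parallel Enc (\d+): Selected GPU #(\d+) \((.+)\)")


def sessions_per_gpu(log_text):
    """Count how many parallel encode sessions were assigned to each GPU number."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SELECTED.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts
```

An even split over two GPUs with --parallel 4 shows up as two sessions on GPU 1 and two on GPU 2.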
Then I came across this:
--------------------------------------------------------------------------------
H:\QSVEnc_Output\bbb_sunflower_2160p_30fps_normal_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 1: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
In the HEVC encode of the 4K video, everything was pushed to GPU # 1.
I made another test run, only 4K this time:
H:\QSVEnc_Output\bbb_sunflower_2160p_30fps_normal_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 249.9: Use: 50.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #2 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\bbb_sunflower_2160p_30fps_normal_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 147.6: Use: 0.0, VE 100.0, GPU 47.6, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 249.9: Use: 50.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\bbb_sunflower_2160p_60fps_normal_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 249.9: Use: 50.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #2 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
And same behaviour here:
H:\QSVEnc_Output\bbb_sunflower_2160p_60fps_normal_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 1: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
Test run with 1080p:
H:\QSVEnc_Output\bbb_sunflower_1080p_30fps_normal_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 184.4: Use: 0.0, VE 100.0, GPU 84.4, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 215.7: Use: 50.0, VE 100.0, GPU 65.7, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 185.2: Use: 0.0, VE 100.0, GPU 85.2, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\bbb_sunflower_1080p_60fps_normal_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 249.9: Use: 50.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #2 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\bbb_sunflower_1080p_60fps_normal_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 186.0: Use: 0.0, VE 100.0, GPU 86.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 211.4: Use: 50.0, VE 100.0, GPU 61.4, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 185.2: Use: 0.0, VE 100.0, GPU 85.2, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
And here too:
H:\QSVEnc_Output\big_buck_bunny_1080p_h264_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 1: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
After further testing, I’ve observed some inconsistent GPU allocation behaviour when using QSVEncC with parallel encoding and multiple GPUs (Intel Arc A770 x2). While most encoding sessions correctly distribute the workload evenly across both GPUs, there are intermittent cases where all parallel jobs are assigned to only one GPU — typically GPU # 1 — even though both GPUs are fully available and report similar load and scores.
Key Observations:
This behavior does not appear to be tied to a specific codec, resolution, or frame rate.
It occurs intermittently, and not consistently at the same spot in the encode queue.
It only seems to happen when multiple encodes are run in sequence via a batch script (i.e., one after another, without pause).
In all affected cases, there are no errors or warnings in the log; the encode completes successfully, but GPU allocation is suboptimal.
This raises the question: Could batch execution be affecting GPU resource release or initialization?
I suspect this might be the case because the issue appears to occur when encodes are launched back-to-back through a batch file, potentially without enough delay or system cleanup between jobs. It’s possible that previous processes aren’t fully releasing GPU resources, or that the Media SDK/driver hasn’t had time to reset before the next job begins.
I haven’t yet tested adding a delay between jobs in the batch script, but I plan to do so now and will report back with the results.
I added a 5-second delay between encodes in the batch script to ensure that GPU resources had time to release between jobs.
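The delayed queue can be sketched roughly as follows. This is a Python stand-in for the batch script rather than the script itself; the function name is hypothetical, and each command is a placeholder for a full encoder argument list.

```python
import subprocess
import time


def run_with_delay(commands, delay_seconds=5.0, run=subprocess.run, sleep=time.sleep):
    """Run each command in order, pausing between consecutive jobs.

    `run` and `sleep` are injectable so the logic can be tried without
    launching real encodes.
    """
    return_codes = []
    for index, command in enumerate(commands):
        if index > 0:
            # Pause so the previous job has time to release the GPU.
            sleep(delay_seconds)
        return_codes.append(run(command).returncode)
    return return_codes
```

The pause is only inserted between jobs, so a queue of three encodes waits twice.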
After running multiple 1080p encodes (which balanced GPU usage correctly), I waited manually, then started a new encode using a 4K file. In that encode, all four parallel jobs were assigned to GPU # 1, despite both GPUs being idle and available.
This suggests that:
- The issue is not simply caused by the lack of delay between jobs.
- GPU # 2 may still be "ignored" under certain conditions, even when idle.
- The problem may be tied to driver state or prior GPU affinity lingering across sessions.
Question, is the build optimised for parallel 4? Because when I use parallel 2 I always get this kind of behaviour:
test.mkv
--------------------------------------------------------------------------------
Parallel Enc 1: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
Thank you for the detailed tests.
When both GPUs are detected, sessions seem to be distributed ideally, but when all four parallel jobs were assigned to one GPU, it seems that somehow only 1 GPU was detected by the thread. The 2nd GPU "disappeared".
Whenever no line with "score" is shown, it means that only 1 GPU was detected by the thread that runs each session.
Currently, I cannot come up with an idea of why this is happening. Maybe opening/closing too many encode sessions at once makes the GPU too busy? (There are multiple session opens/closes when encoding starts.)
> Could batch execution be affecting GPU resource release or initialization?

Not sure, but generally not. I've tested on my system around 50 times using a bat file with --parallel 4, but could not reproduce the GPU disappearing, perhaps because the system config differs (B580 + UHD770). I think it's more likely to be a problem in the initialization phase.
> Question, is the build optimised for parallel 4?

No, I just added log output in QSVEnc 7.85. One more change is included, but that is about AV1 encoding, so I don't think it affects this problem.
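The rule given in this reply (a session that logs no "score" line only saw one GPU) can be checked mechanically on saved logs. Below is a minimal Python sketch, assuming the log line shapes quoted in this thread ("Parallel Enc N: Selected GPU #M ..." and "Parallel Enc N: GPU #M ... score: X"). The function names `summarize_gpu_selection` and `sessions_without_scores` are made up for illustration and are not part of QSVEnc.

```python
import re
from collections import defaultdict

# Line shapes taken from the log excerpts quoted in this thread.
SELECTED_RE = re.compile(r"^Parallel Enc (\d+): Selected GPU #(\d+)")
SCORE_RE = re.compile(r"^Parallel Enc (\d+): GPU #(\d+) .*score: ([\d.]+)")


def summarize_gpu_selection(lines):
    """Return ({enc: selected_gpu}, {enc: {gpu: score}}) parsed from log lines."""
    selected = {}
    scores = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        m = SELECTED_RE.match(line)
        if m:
            selected[int(m.group(1))] = int(m.group(2))
            continue
        m = SCORE_RE.match(line)
        if m:
            scores[int(m.group(1))][int(m.group(2))] = float(m.group(3))
    return selected, dict(scores)


def sessions_without_scores(selected, scores):
    """Sessions that picked a GPU without ever logging a score line."""
    return sorted(enc for enc in selected if enc not in scores)


# The two lines from the --parallel 2 log quoted earlier in the thread.
log_lines = [
    "Parallel Enc 1: Selected GPU #1 (Intel Arc A770 Graphics)",
    "Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)",
]
selected, scores = summarize_gpu_selection(log_lines)
print("selected:", selected)
print("sessions that saw only one GPU:", sessions_without_scores(selected, scores))
```

Running this over every .log file of a batch gives a quick count of how often the second GPU "disappeared".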
@rigaya thank you for your great work and for improving this topic more and more. You are the best, I can't thank you enough.
Hello,
can you give me the exact ffmpeg command for the concat that you use to extend the files? Furthermore, the exact commands for encoding in H264, H265 and AV1 would be helpful.
How did you produce the listing of the results?
The reason for this is that I want to test the whole thing with my 4x Arc A310 machine and also on my 2x Arc A380 machine.
It would be perfect if I could pass this on as a feature request for StaxRip. This would perfect the encoding for me.
Thanks in advance.
@KickAss0815
I use a combination of PowerShell scripts and batch files for automated video testing, encoding with QSVEnc, and result analysis.
Disclaimer: The included batch/PowerShell scripts are provided as-is, without any warranty or guarantee of fitness for a particular purpose.
You are free to use, modify, and distribute these scripts, but you do so at your own risk.
The author is not responsible for any data loss, hardware issues, encoding errors, or system instability that may result from using or modifying these files. Always review and test scripts in a controlled environment before using them on important data or production systems.
1. Concatenation Script (PowerShell)
Used to extend video clips by repeating the same file 3x, utilizing ffmpeg.

```powershell
# Set the path to your ffmpeg.exe
$ffmpegPath = "D:\rigaya\ffmpeg\ffmpeg.exe"

# Current working directory
$currentDir = Get-Location

# Folder to store extended files
$extendedFolder = Join-Path $currentDir "ExtendedVideos"
if (-not (Test-Path $extendedFolder)) {
    New-Item -Path $extendedFolder -ItemType Directory | Out-Null
}

# Define supported video extensions
$extensions = @("*.mp4", "*.mov", "*.m4v", "*.avi")

# Loop through each extension and get matching files
foreach ($ext in $extensions) {
    $videos = Get-ChildItem -Path $currentDir -Filter $ext
    foreach ($video in $videos) {
        $inputName = $video.Name
        $outputName = [System.IO.Path]::GetFileNameWithoutExtension($inputName) + "_x3.mp4"
        $outputPath = Join-Path $extendedFolder $outputName
        Write-Host "`nProcessing: $inputName"

        # Create concat list
        $concatList = @()
        for ($i = 0; $i -lt 3; $i++) {
            $concatList += "file '$inputName'"
        }
        Set-Content -Path "concat_list.txt" -Value $concatList

        # Run ffmpeg to concatenate
        & $ffmpegPath -f concat -safe 0 -i "concat_list.txt" -c copy $outputPath

        # Clean up
        Remove-Item "concat_list.txt"
    }
}

Write-Host "`nAll supported videos have been extended x3 and saved in: $extendedFolder"
```
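For anyone who only wants the bare ffmpeg call rather than the PowerShell wrapper, here is a minimal Python sketch of the same idea. It builds the concat list text and the argument vector with the same ffmpeg options the script uses (-f concat -safe 0 -i ... -c copy) and leaves running the command to the caller. The helper names `build_concat_list` and `build_ffmpeg_args` and the file names are hypothetical.

```python
from pathlib import Path
from typing import List


def build_concat_list(input_name: str, repeats: int = 3) -> str:
    """Return the text of an ffmpeg concat list that repeats one file."""
    # One "file '<name>'" line per repetition, as in the PowerShell loop.
    return "".join(f"file '{input_name}'\n" for _ in range(repeats))


def build_ffmpeg_args(ffmpeg: str, list_path: str, output_path: str) -> List[str]:
    """Return the argument vector for a stream-copy concat (no re-encode)."""
    return [ffmpeg, "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]


# Hypothetical file names, for illustration only.
list_text = build_concat_list("input.mp4")
args = build_ffmpeg_args("ffmpeg", "concat_list.txt",
                         str(Path("ExtendedVideos") / "input_x3.mp4"))
print(list_text, end="")
print(" ".join(args))
```

Writing `list_text` to `concat_list.txt` and passing `args` to `subprocess.run` would reproduce one iteration of the PowerShell loop.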
2. QSVEnc Batch Encoding Script
Automates encoding in H.264 and HEVC, both normal and --parallel modes.
```bat
@echo off
setlocal EnableDelayedExpansion

:: === Config ===
:: Path to the encoder executable; adjust it to where QSVEncC64.exe lives on your system
set "QSVENC=E:\rigaya\NVEncC\NVEncC64.exe"
:: Output folder; must match the log folder used by the analysis scripts
set "OUTPUTDIR=H:\QSVEnc_Output"
if not exist "%OUTPUTDIR%" (
    mkdir "%OUTPUTDIR%"
)

:: === Supported video extensions
for %%F in (*.mp4 *.mov *.avi *.m4v) do (
    call :encode_file "%%~F"
)
echo.
echo All encodes finished!
pause
exit /b

:: === Encoding function
:encode_file
set "INPUT=%~1"
set "NAME=%~n1"
:: === Define encoding modes
call :run_encode "%INPUT%" "%NAME%" "hevc" "normal" "--profile main10 --output-depth 10"
call :run_encode "%INPUT%" "%NAME%" "hevc" "parallel" "--profile main10 --output-depth 10 --parallel 2"
call :run_encode "%INPUT%" "%NAME%" "h264" "normal" ""
call :run_encode "%INPUT%" "%NAME%" "h264" "parallel" "--parallel 2"
exit /b

:: === Run actual command
:run_encode
set "INPUT=%~1"
set "NAME=%~2"
set "CODEC=%~3"
set "MODE=%~4"
set "EXTRA=%~5"
set "OUTNAME=%NAME%_%CODEC%_%MODE%_64bit"
set "OUTPUT=%OUTPUTDIR%\%OUTNAME%.mkv"
set "LOG=%OUTPUTDIR%\%OUTNAME%.log"
echo.
echo ============================================
echo Encoding: %INPUT%
echo Codec: %CODEC%
echo Mode: %MODE%
echo Output: %OUTPUT%
echo Log: %LOG%
echo.
"%QSVENC%" -i "%INPUT%" --codec %CODEC% %EXTRA% -o "%OUTPUT%" > "%LOG%" 2>&1
if %errorlevel% neq 0 (
    echo [ERROR] Encoding failed for %INPUT%
) else (
    echo [DONE] %OUTNAME%
)
exit /b
```
3. Analysis Scripts (PowerShell)
Step 1 – Create summary.txt from all encode logs:

```powershell
# === CONFIGURATION ===
$logFolder = "H:\QSVEnc_Output"
$outputFile = Join-Path $logFolder "summary.txt"

# Clear previous summary if exists
if (Test-Path $outputFile) {
    Remove-Item $outputFile
}

# Get all log files
$logFiles = Get-ChildItem -Path $logFolder -Filter *.log
foreach ($log in $logFiles) {
    $logPath = $log.FullName
    $logName = $log.BaseName

    # Read all lines
    $lines = Get-Content $logPath

    # Filter last 6 lines that include encoding summary info
    $summaryLines = $lines | Where-Object {
        ($_ -match '^encoded \d+ frames') -or
        ($_ -match '^encode time') -or
        ($_ -match '^frame type IDR') -or
        ($_ -match '^frame type I') -or
        ($_ -match '^frame type P') -or
        ($_ -match '^frame type B')
    }

    # If summary lines exist, write to output file
    if ($summaryLines.Count -gt 0) {
        Add-Content $outputFile "===== $logName ====="
        $summaryLines | ForEach-Object { Add-Content $outputFile $_ }
        Add-Content $outputFile ""
    }
}
Write-Host "Summary created at: $outputFile" -ForegroundColor Green
```
Step 2 – Parse and compare performance:
```powershell
# CONFIG
$summaryPath = "H:\QSVEnc_Output\summary.txt"
if (-not (Test-Path $summaryPath)) {
    Write-Host "summary.txt not found!" -ForegroundColor Red
    exit 1
}
$entries = @{}

# === PARSE SUMMARY FILE ===
$lines = Get-Content $summaryPath
$currentFile = ""
foreach ($line in $lines) {
    if ($line -match "^===== (.+) =====$") {
        $currentFile = $matches[1]
        $entries[$currentFile] = @{}
    }
    elseif ($line -match "^encoded \d+ frames, ([\d\.]+) fps.*?, ([\d\.]+) kbps, ([\d\.]+) MB") {
        $entries[$currentFile].FPS = [double]$matches[1]
        $entries[$currentFile].Bitrate = [double]$matches[2]
        $entries[$currentFile].SizeMB = [double]$matches[3]
    }
}

# === COMPARE NORMAL vs PARALLEL ===
$results = @()
$grouped = $entries.Keys | Group-Object { ($_ -replace "_(normal|parallel)_64bit$", "").ToLower() }
foreach ($group in $grouped) {
    $base = $group.Name
    $normalKey = "$base`_normal_64bit"
    $parallelKey = "$base`_parallel_64bit"
    if ($entries.ContainsKey($normalKey) -and $entries.ContainsKey($parallelKey)) {
        $n = $entries[$normalKey]
        $p = $entries[$parallelKey]

        # Extract resolution height
        $resolution = 0
        if ($base -match "(\d{3,4})p") {
            $resolution = [int]$matches[1]
        } elseif ($base -match "(\d{3,4})x(\d{2,4})") {
            $resolution = [int]$matches[2]
        }

        # Extract framerate
        $framerate = 0
        if ($base -match "(\d{2,3})fps") {
            $framerate = [int]$matches[1]
        }

        # Differences
        $deltaFPS = $p.FPS - $n.FPS
        $speedup = if ($n.FPS -ne 0) { [math]::Round(($deltaFPS / $n.FPS) * 100, 2) } else { 0 }
        $deltaSize = $p.SizeMB - $n.SizeMB
        $sizeDiffPct = if ($n.SizeMB -ne 0) { [math]::Round(($deltaSize / $n.SizeMB) * 100, 2) } else { 0 }

        # Result entry
        $results += [pscustomobject]@{
            FileGroup     = $base
            Resolution    = $resolution
            Framerate     = $framerate
            Normal_FPS    = [math]::Round($n.FPS, 2)
            Parallel_FPS  = [math]::Round($p.FPS, 2)
            FPS_Diff      = [math]::Round($deltaFPS, 2)
            Speedup_Pct   = "$speedup`%"
            Normal_Size   = "$([math]::Round($n.SizeMB, 2)) MB"
            Parallel_Size = "$([math]::Round($p.SizeMB, 2)) MB"
            Size_Diff     = "$([math]::Round($deltaSize, 2)) MB"
            Size_Diff_MB  = [math]::Round($deltaSize, 2)
            Size_Change   = "$sizeDiffPct`%"
        }
    }
}

# === SORT RESULTS ===
$resultsSorted = $results | Sort-Object Resolution, Framerate, FileGroup

# === DISPLAY TABLE ===
$resultsSorted | Format-Table -AutoSize

# === AVERAGES ===
$avgSpeedup = ($results | Measure-Object -Property FPS_Diff -Average).Average
$avgSizeDiff = ($results | Measure-Object -Property Size_Diff_MB -Average).Average
Write-Host "`nAVERAGES:"
Write-Host "  Average FPS Gain: $([math]::Round($avgSpeedup, 2)) FPS"
Write-Host "  Average Size Change: $([math]::Round($avgSizeDiff, 2)) MB"
```
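The two percentages the comparison script reports are plain relative changes against the non-parallel run. The following Python sketch shows the same arithmetic in isolation, reusing the regular expression from the PowerShell script. The two summary lines are made-up numbers for illustration (not measurements from this thread), and `parse_summary` and `percent_change` are hypothetical helper names.

```python
import re

# Same pattern as in the PowerShell comparison script.
SUMMARY_RE = re.compile(
    r"^encoded \d+ frames, ([\d.]+) fps.*?, ([\d.]+) kbps, ([\d.]+) MB"
)


def parse_summary(line: str):
    """Return (fps, kbps, size_mb) from an 'encoded ... frames' line, or None."""
    m = SUMMARY_RE.match(line)
    if m is None:
        return None
    return tuple(float(g) for g in m.groups())


def percent_change(normal: float, parallel: float) -> float:
    """Relative change of `parallel` versus `normal`, in percent."""
    if normal == 0:
        return 0.0
    return round((parallel - normal) / normal * 100, 2)


# Hypothetical summary lines with invented values.
normal = parse_summary("encoded 1000 frames, 200.00 fps, 2500.00 kbps, 50.00 MB")
parallel = parse_summary("encoded 1000 frames, 350.00 fps, 2490.00 kbps, 49.80 MB")

print("speed-up %:", percent_change(normal[0], parallel[0]))
print("size change %:", percent_change(normal[2], parallel[2]))
```

A positive speed-up means the --parallel run was faster; a negative size change means its output was smaller.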
Notes These scripts are customized for my specific system setup. You will likely need to update file paths, tools, and hardware settings before running them.
AV1 encoding is not included, as I didn’t run benchmarks for that codec. If you’d like to test AV1, you’ll need to adapt the scripts accordingly.
While the structure may not be perfectly elegant, the scripts are built with a focus on reliability and function. They do the job - feel free to improve or streamline them further.
@rigaya
I ran some further tests with --parallel 4 and wanted to share the findings - hopefully they’re useful.
Additional Observations with --parallel 4 (Updated Tests)
After running a fresh round of tests using --parallel 4 and inserting a 55-second delay between encodes (to avoid GPU overlap or contention), I was able to gain new insights from QSVEnc's log output.
What Worked Well
- Parallel task scheduling now clearly shows how QSVEnc selects GPUs based on internal scoring (score: 300.0, 299.9, etc.).
- Across several encodes, both Arc A770 GPUs are actively utilized, and all 4 encodes are attempted in parallel as expected.
- The score-based GPU selection logic appears robust and dynamic: GPUs with the highest availability (highest score) are prioritized per encode slot.
Issues Encountered
1. GPU Selection Imbalance
In some cases, the same GPU is selected for multiple parallel encoders, even though the second GPU appears available and similarly scored.
Example from the H.264 test:
Parallel Enc 2: Selected GPU #1
Parallel Enc 3: Selected GPU #1
This suggests either a race condition or score calculation timing issue. GPU # 2 had a valid score (199.9) but was not selected despite GPU # 1 already being used.
2. OpenCL Failure and Decode Errors
The log shows:
Error (clBuildProgram): out of host memory.
MFXDEC: DecodeFrameAsync error: undefined behavior.
Failed to initialize HW Device.: null pointer.
These are serious initialization failures likely tied to:
- OpenCL resource exhaustion
- Memory fragmentation or allocation issues (possibly VRAM or system memory)
- D3D11/D3D9 fallback attempts failing (D3D11CreateDevice, CreateDeviceEx both fail)

They don't crash QSVEnc, but may degrade performance or stability.
Hypotheses and Analysis
The “out of host memory” during clBuildProgram implies OpenCL kernels may be compiled on the fly for each stream, and under --parallel 4, the simultaneous demand is too high.
This could stem from:
- OpenCL runtime limits on the driver level
- GPU VRAM being maxed or fragmented (especially with repeated launches)
- Driver's memory management limitations under multiple D3D/OpenCL contexts

Since the failure affects initialization, not mid-encode, the risk is on start-up resource contention.
Arc A770’s OpenCL stack may not be optimized for this many parallel HW contexts on one card.
The PCIe x4 card may be overloaded with memory transfers, amplifying the problem, especially if decode and encode are on separate cards.
Summary
- --parallel 4 is functional, and GPU distribution is generally effective, but not always optimally balanced.
- Likely hitting driver or GPU-level memory/context limits, not necessarily a bug in QSVEnc.
- QSVEnc still completes the job even with these issues, showing robust error handling.

Suggestions / Ideas
- Delay Between Launches seems to help; it could be exposed as a formal --parallel-delay option.
- Expose GPU affinity (e.g., --gpu-assign) to explicitly bind encoder instances to specific GPUs.
- Preload or cache OpenCL kernels to avoid per-session compilation.
Information output:
H:\QSVEnc_Output\BigBuckBunny_320x180_x3_h264_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
--------------------------------------------------------------------------
building OpenCL source: size 668.
options:
source
--------------------------------------------------------------------------
build log of Intel(R) Arc(TM) A770 Graphics...
--------------------------------------------------------------------------
Error (clBuildProgram): out of host memory.
MFXDEC: DecodeFrameAsync error: undefined behavior..
Break in task MFXDEC: undefined behavior..
d3d11: D3D11Device: D3D11CreateDevice: -2005270523
d3d9: D3D9Device: Failed CreateDeviceEx: -2005530516.
Failed to initialize HW Device.: null pointer..
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
Parallel Enc 2: Selected GPU #1 (Intel Arc A770 Graphics)
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
H:\QSVEnc_Output\BigBuckBunny_320x180_x3_hevc_parallel4.mkv
--------------------------------------------------------------------------------
Parallel Enc 0: GPU #1 (Intel Arc A770 Graphics) score: 300.0: Use: 100.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #1 (Intel Arc A770 Graphics) score: 200.0: Use: 0.0, VE 100.0, GPU 100.0, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: GPU #2 (Intel Arc A770 Graphics) score: 299.9: Use: 100.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 0: Selected GPU #1 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 2: GPU #1 (Intel Arc A770 Graphics) score: 194.8: Use: 0.0, VE 100.0, GPU 94.8, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 1: Selected GPU #2 (Intel Arc A770 Graphics)
cop.SingleSeiNalUnit value changed off -> auto by driver
Parallel Enc 3: GPU #1 (Intel Arc A770 Graphics) score: 245.3: Use: 50.0, VE 100.0, GPU 95.3, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 3: GPU #2 (Intel Arc A770 Graphics) score: 199.9: Use: 0.0, VE 100.0, GPU 99.9, CC 0.0, Core 0.0, CL 0.0.
Parallel Enc 2: Selected GPU #2 (Intel Arc A770 Graphics)
Parallel Enc 3: Selected GPU #1 (Intel Arc A770 Graphics)
MFXDEC: DecodeFrameAsync error: undefined behavior..
Break in task MFXDEC: undefined behavior..
cop.SingleSeiNalUnit value changed off -> auto by driver
QSVEncC (x64) 7.85 (r3683) by rigaya, Apr 8 2025 12:57:34 (VC 1943/Win)
Thank you again for the detailed tests!
I understand that a somewhat longer wait helped.
New test build: https://nightly.link/rigaya/QSVEnc/actions/runs/14380147275/QSVEncC_release_r3685_x64.zip
I've made a new test build (3685) which might improve the "GPU Selection Imbalance" and the "OpenCL/decode errors" (but I'm actually not sure...).
GPU Selection Imbalance
The score is mostly based on:
(1) QSVEnc usage (Use)
(2) "Video" usage based on Windows Performance Counter (VE)
(3) "Compute/3d" usage based on Windows Performance Counter (GPU)
The problem is that (2) and (3) lag behind and are not collected reliably when multiple sessions start at once.
The new test build (3685) lowers the weight of (2) and (3) when a GPU is used by more than one QSVEnc session. This might help with the GPU selection imbalance, as the selection for session #3 is now based more on (1).
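As a worked example of the three components: the scores printed in the logs earlier in this thread are numerically consistent with a plain sum Use + VE + GPU, and the selected GPU is the one with the higher score. That is only an observation from those log lines, not a statement of QSVEnc's actual formula, and the reweighting in build 3685 is not modelled. A minimal Python sketch under that assumption (the names `score` and `pick_gpu` are made up):

```python
def score(use: float, ve: float, gpu: float) -> float:
    """Assumed score: plain sum of the three logged components."""
    return round(use + ve + gpu, 1)


def pick_gpu(candidates: dict) -> int:
    """Return the GPU number whose (Use, VE, GPU) tuple scores highest."""
    return max(candidates, key=lambda n: score(*candidates[n]))


# (Use, VE, GPU) copied from the "Parallel Enc 2" lines of the H.264 log.
enc2_h264 = {1: (0.0, 100.0, 100.0), 2: (0.0, 100.0, 99.9)}
# (Use, VE, GPU) copied from the "Parallel Enc 2" lines of the HEVC log.
enc2_hevc = {1: (0.0, 100.0, 94.8), 2: (0.0, 100.0, 99.9)}

for name, cand in (("h264", enc2_h264), ("hevc", enc2_hevc)):
    scores = {n: score(*v) for n, v in cand.items()}
    print(name, scores, "-> GPU #%d" % pick_gpu(cand))
```

In the H.264 case the two GPUs differ by only 0.1 (200.0 vs 199.9), so a slightly stale VE/GPU reading is enough to send a third session to a GPU that already holds one, which matches the imbalance reported above.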
OpenCL Failure and Decode Errors
~The new test build (3685) will now start each encode session thread after the previous thread finishes decode and filter initialization (previously they could overlap). This might relieve pressure on the GPU.~ I found out that they might still overlap; some more time is needed to change this.
Delay Between Launches
Delay Between Launches on the QSVEnc side looks difficult, as it requires inter-process communication.
OpenCL kernel build issue
Thank you for pointing out the issue.
The problem is that OpenCL kernel builds are actually multithreaded by design (one thread is started per kernel).
Moreover, as you assumed, each session builds its own kernels, as there is a possibility that the sessions run on different devices which might have different architectures.
The multithreaded kernel build was implemented to minimize kernel build time, but it can now easily create too many threads when used with --parallel, and I think this is causing the "out of memory" issue. I'll need to make these stop using individual threads when --parallel is enabled, but the implementation will take some time.
@rigaya,
Thanks again for the updated test build (r3685) and your detailed explanation of the changes - especially around GPU selection scoring and session initialization timing.
New Test Observation (--parallel 4 with Delay)
I ran a new set of tests using --parallel 4, with a 55-second delay between jobs to minimize potential resource contention and overlapping session starts.
At first, everything looked good - GPU scores were evaluated and displayed, and both Intel Arc A770 cards were being actively used.
However, during one of the runs, something unexpected occurred:
The system abruptly restarted mid-encode.
This has never happened before, even under --parallel 2 loads.
Event Viewer Findings (Pre-Crash)
Several relevant log entries showed up just before the reboot:
- Intel Graphics Software Service
  - Event ID 230 (Error): IGCL ctlOverclocking set function - Exception
  - Event ID 252 & 285 (Warnings): "Possible property value desync", related to Overclocking_GpuPowerLimit and GpuPerformanceBoost. This might suggest a mismatch between requested and actual hardware values.
- Kernel Event Tracing
  - Event ID 2 (Error): NT Kernel Logger failed with error 3221225525. This may hint at tracing or logging being interrupted due to a critical fault.
- System Stability
  - No bluescreen was shown; the system simply rebooted.
  - No other critical events appeared in System or Application logs.
My Thoughts
While I can’t prove this was caused by QSVEnc, the timing strongly suggests a relationship to the simultaneous --parallel 4 load.
My cards are both capped at 110W for thermal control (~65°C under load), and otherwise stable under all other scenarios.
Conclusion
This reboot was a first in my entire series of tests and might reflect a rare edge case triggered by --parallel 4, OpenCL kernel spawning, and the unique hardware mix.
I just wanted to share these results with you in case they help fine-tune things further. If you’d like, I can try to reproduce the crash and capture a full system trace or memory dump.
Thanks again for all your work. QSVEnc’s parallel implementation is already incredibly powerful and these refinements are making it even better for multi-GPU setups.
Although I'm unsure about the forced reboot, it might be related to --parallel pushing too many tasks to the GPU.
QSVEnc 7.86 update
QSVEnc 7.86 will now start each encode session thread after the previous thread finishes decode and filter initialization.
One more change is that it now runs OpenCL builds in a thread pool, limiting the maximum thread count for OpenCL kernel builds to 8, which should keep the OpenCL kernel build from using too many resources.
Also, some redundant initialization tasks have been removed, which might further improve stability.
Although QSVEnc 7.86 does not show the GPU selection result by default, unlike the test versions, you can still show it when needed for testing by adding --log-level gpu_select=debug.
@rigaya
Just wanted to share a concise update from additional testing with QSVEncC v7.86 (r3693) using --parallel 4 mode and a 55-second delay between jobs (batch script).
GPU Utilization Overview (All Encodes)
| Encode Instance | GPU Selected | VE / GPU Load | Notes |
|---|---|---|---|
| Parallel Enc 0 | GPU # 1 | 100 / 100 | Always full load |
| Parallel Enc 1 | GPU # 2 | 100 / 100 | Consistent full load |
| Parallel Enc 2 | GPU # 2 | 100 / ~95–100 | Slight drop depending on input |
| Parallel Enc 3 | GPU # 1 | 50 / ~20–50 | Noticeable bottleneck |
Load balancing between GPUs is generally effective.
Parallel Enc 3 consistently shows reduced performance, regardless of codec/resolution.
Possible limiting factors:
PCIe x4 bandwidth (one of the A770s)
Thermal or VRM-related throttling
Driver-side GPU scheduling imbalance
System Instability Reproduced
Despite the delay between encodes, I experienced another system reboot mid-way through a HEVC encode (same pattern as before. No BSOD, just an abrupt restart). No warnings in the encoder log; all GPUs were heavily utilized at the time.
Still investigating, but it appears to be tied to heavy parallel GPU load during high-resolution HEVC tasks.
Let me know if you'd like logs or if further testing would help.
Additional Findings (Post-4K Removal Test) After removing the 4K video from the test batch, no system restarts occurred, even with multiple parallel encode sessions. This strongly suggests the instability was tied to the high-res workload (e.g., 2160p) under full GPU and CPU pressure.
GPU Load Balancing Still Holds - Even at Low Resolutions Interestingly, the 100/100 vs. 50/50 GPU usage split remains consistent, even with very low-resolution inputs (e.g., 320x180). This reinforces that the bottleneck on the 4th encode (Enc 3) is not due to video resolution, but likely due to hardware limitations (PCIe bandwidth, GPU scheduler contention, etc.).
Example - Balanced Distribution Still Happens:
H:\QSVEnc_Output\BigBuckBunny_320x180_x3_h264_parallel4.mkv
Parallel Enc 0: Selected GPU #1
Parallel Enc 1: Selected GPU #2
Parallel Enc 2: Selected GPU #1
Parallel Enc 3: Selected GPU #2
Utilization Pattern:
Enc 0 & 1 → VE 100 / GPU 100
Enc 3 → VE 50 / GPU ~50
However, in some encodes, all four sessions are sent to GPU # 1:
Example – All Jobs Sent to GPU # 1:
H:\QSVEnc_Output\bbb_sunflower_1080p_30fps_normal_x3_h264_parallel4.mkv
Parallel Enc 0: Selected GPU #1
Parallel Enc 1: Selected GPU #1
Parallel Enc 2: Selected GPU #1
Parallel Enc 3: Selected GPU #1
This could be a case of:
QSVEncC scoring GPU # 1 higher due to idle state or recent usage history.
A driver quirk or override behaviour for certain input parameters.
Update:
I also ran a test with --parallel 2 and observed the same behaviour where both sessions were assigned to GPU # 1, which likely influenced the results from my initial encoding tests as well.
@rigaya I'm running some more tests and will report back once they are completed, which might take a while. In the meantime is there any log I can activate that might show some info that can be beneficial?
Thank you for another testing!
I might have found the cause of all jobs being sent to GPU #1, and made a test build working around this issue. https://nightly.link/rigaya/QSVEnc/actions/runs/14544743346/QSVEncC_release_r3697_x64.zip
Details
Previously, I shared that the score used to select the GPU is based on "QSVEnc usage" and "Video (VE) + Compute/3d (GPU)" usage. The "Video (VE) + Compute/3d (GPU)" usage is based on the Windows Performance Counter.
I found that there were cases where opening the Windows Performance Counter failed. In that case, score calculation was skipped, so GPU # 1 was always selected. (This is because the "QSVEnc usage" score calculation is a rather new feature added in QSVEnc 7.76; previously there was only the score calculation from the Windows Performance Counter.)
The test build continues with score calculation even when opening the Windows Performance Counter has failed, using only "QSVEnc usage", which should be able to distribute tasks to GPU # 2.
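The difference between the old and new behaviour can be sketched as follows. This is an illustration of the logic described above, not QSVEnc's implementation: the score is assumed to be a plain sum, the "Use" values follow the pattern seen in the logs in this thread (100.0 for a GPU with no QSVEnc session, 0.0 for one that already holds a session), and the names `select_gpu_old` and `select_gpu_new` are invented.

```python
from typing import Dict, Optional, Tuple

Perf = Tuple[float, float]  # (VE, GPU) readings from the performance counter


def select_gpu_old(use: Dict[int, float], perf: Optional[Dict[int, Perf]]) -> int:
    """Old behaviour: if the counter could not be opened, skip scoring entirely."""
    if perf is None:
        return min(use)  # always the first GPU
    return max(use, key=lambda n: use[n] + sum(perf[n]))


def select_gpu_new(use: Dict[int, float], perf: Optional[Dict[int, Perf]]) -> int:
    """New behaviour: without the counter, still score on QSVEnc usage alone."""
    if perf is None:
        return max(use, key=lambda n: use[n])
    return max(use, key=lambda n: use[n] + sum(perf[n]))


# GPU #1 already holds a session (Use 0.0); GPU #2 is still free (Use 100.0).
use = {1: 0.0, 2: 100.0}
print("old, counter failed:", select_gpu_old(use, None))
print("new, counter failed:", select_gpu_new(use, None))
```

With the counter unavailable, the old path keeps returning GPU # 1 for every session, while the new path moves the next session to GPU # 2.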
> In the meantime is there any log I can activate that might show some info that can be beneficial?

If all jobs being sent to GPU # 1 continues, logs from --log-level gpu_select=debug,device=debug would help a lot. Would you please run with this option added?
System Instability with 4K
There might be one more possibility here: I'm not sure, but GPU RAM or CPU RAM usage might be approaching the limit. Actually, --parallel 4 does use too much memory in 4K encoding...
Thank you @rigaya
Test Results with Latest Build (r3697) - --parallel 2
Thank you for the updated build.
I ran a fresh round of tests using --parallel 2 and wanted to share a few key observations.
Improvements Observed The 55-second launch delay was removed during this round.
GPU selection appears more balanced than before.
All GPUs are being used during the encoding process as expected.
Strange Decode Engine Behaviour While checking Task Manager, I noticed unusual decode engine behaviour across multiple tests:
Each Arc A770 GPU normally shows two engines: "Video Decode" and "Video Decode 1".
Typically, one engine runs at 100%, and the second at ~50%.
However, sporadically, one of these decode engines sits at 0%, while the other is active.
Example:
1st GPU
2nd GPU
This behaviour is:
- Not tied to resolution (seen at 1080p30, 2160p30, etc.)
- Not tied to specific GPU (affects either one randomly)
- Not consistent per video (same video can behave differently on repeat runs)

When this happens, the encode appears to slow down, likely due to reduced decode throughput on that GPU.
Even when reintroducing the 55-second delay between parallel session launches, the behaviour still occurred sporadically.
Logs Shared I’ve archived all logs from this batch into a single .zip file.
I haven’t tested --parallel 4 yet with this build.
Thank you. I really appreciate your responsiveness and the technical detail in your replies.
Update: --parallel 4 Test Run
I began a test run using --parallel 4.
The first encode (1080p 30fps AVC) completed successfully without any issues.
During this run, I observed the following in Task Manager:
GPU 1: Video Decode and Video Decode 1 were both at 100%
GPU 2: Video Decode at 100%, Video Decode 1 fluctuating around 40%
However, once the second file (1080p 30fps HEVC) began encoding, the system experienced a sudden reboot during the process. According to the log file, the last recorded status was:
[21.2%] 12693 frames: 322.88 fps, 2320 kbps, remain 0:02:25, GPU 68%, VD 100%, est out size 551.4MB
Interestingly, this was not a 4K encode, so while it's possible that GPU or system memory limits were reached, the trigger appears to be less predictable.
As you previously mentioned:
"Actually, --parallel 4 does use too much memory in 4K encoding..."
That insight may still apply in part, but given the resolution here was 1080p, the cause might be elsewhere - possibly a cumulative resource issue across concurrent sessions.
Let me know if there’s anything specific you'd like me to log.
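Since the encoder log is the only record left after a hard reboot, it can help to pull the last status line out of each log automatically. A minimal Python sketch, assuming the status format quoted above and that each status ends up on its own line in the saved log (if the file contains carriage-return updates instead, it would need splitting on "\r" first). The name `last_progress` is hypothetical.

```python
import re

# Matches the start of a status line such as the one quoted above.
PROGRESS_RE = re.compile(
    r"^\[(?P<pct>[\d.]+)%\] (?P<frames>\d+) frames: "
    r"(?P<fps>[\d.]+) fps, (?P<kbps>[\d.]+) kbps"
)


def last_progress(lines):
    """Return the last parsed status line as a dict, or None if there is none."""
    last = None
    for raw in lines:
        m = PROGRESS_RE.match(raw.strip())
        if m:
            last = {
                "percent": float(m["pct"]),
                "frames": int(m["frames"]),
                "fps": float(m["fps"]),
                "kbps": float(m["kbps"]),
            }
    return last


# The status line reported in this post.
sample = [
    "[21.2%] 12693 frames: 322.88 fps, 2320 kbps, remain 0:02:25, "
    "GPU 68%, VD 100%, est out size 551.4MB",
]
print(last_progress(sample))
```

Comparing the last percentage across several crashed runs would show whether the reboot tends to happen at a similar point in the encode.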
bbb_sunflower_1080p_30fps_normal_x3_h264_parallel4.log bbb_sunflower_1080p_30fps_normal_x3_hevc_parallel4.log
I'm currently running another test based on a new idea I had to help narrow down the issue. I'll report back here once the test has completed.
Further Testing with --parallel 2 and --avhw
I ran another round of tests using --parallel 2, this time enabling --avhw.
I noticed that the issue with Video Decode / Video Decode 1 imbalance now occurs even more frequently.
Roughly 50% of the time, one of the GPUs has either Video Decode or Video Decode 1 stuck at 0%.
As before, this isn’t tied to a specific resolution — the issue appears consistently across 1080p, 2160p, etc.
The GPU where both decode engines are active always completes its part faster.
System Utilization During the Test:
RAM usage: ~12 GB out of 64 GB
GPU 1: ~1.6 GB VRAM
GPU 2: ~0.6 GB VRAM
CPU usage: ~3%
Test file: 4K @ 60 FPS
At lower resolutions, Video Decode / Video Decode 1 usage often remains under 50–60%, possibly due to lower decoding demand.
I've included the batch log in a .zip file for reference.
Testing --parallel 4 with --avhw
I also ran tests with --parallel 4 and --avhw enabled.
GPU decode usage behaviour is similar to --parallel 2: one GPU often shows 100% / 100%, while the other shows 100% / ~40%.
These values also fluctuate slightly during the encode.
Interestingly, AVC encoding seems to utilize the decode engines more heavily than HEVC.
System Utilization During the Test:
RAM usage: ~11 GB out of 64 GB
GPU 1: ~1.6 GB VRAM
GPU 2: ~0.6 GB VRAM
CPU usage: ~12%
Test file: 4K @ 60 FPS
One more thing I observed - even though the batch script uses a 0-second delay, there's still a short wait before each encoding task begins.
This time, the system did not restart, which makes me think the issue may be tied to the software decoding fallback --avsw that is used when --avhw is not explicitly set. @rigaya , please feel free to correct me if I’m mistaken.
I've included the full logs from this test run as well, compressed in a .zip file.
Thank you for the detailed tests and logs provided.
I've looked through all the logs, and first of all, I can see that the tasks are always distributed evenly across GPU # 1 and # 2, meaning that the change in test build 3697 to avoid all tasks going to one GPU was successful.
Video Decode and Video Decode 1
Actually, I cannot check this at all on my system: the Arc B580 only shows "Video Decode" (and "Video Processing") in Task Manager, even though it seems to have 2 Media Engines.
Also, I've looked through the provided logs, but could not find a hint about this.
This actually looks difficult: the application side (QSVEnc) has control over which GPU to run on, but not over which Media Engine to run on. Therefore, which Media Engine is used depends on the driver (or possibly the hardware).
software decoding fallback --avsw that is used when --avhw is not explicitly set
I'm not quite sure whether decoding has actually fallen back to sw decode. QSVEnc will use hw decode whenever it decides it is possible, but yes, there might be something going wrong when --parallel is used.
To check whether sw decode is used or not, please try the following option.
--log-level core=debug,gpu_select=debug
This will output the log for the main thread and also for each sub thread. If the sw decode fallback is used, the sub thread logs will show "avsw" (and not "avqsv"). The problem is that the log will be cluttered by the many outputs from the sub threads, but we can still tell which decoder is used.
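As a rough way to cut through the interleaved sub thread output, the debug log could be scanned for the two reader names mentioned above. This is only a sketch: the function name `count_decoders` and the sample log lines are hypothetical, and it assumes nothing about the real log format beyond the strings "avsw" and "avqsv" appearing somewhere in it.

```python
from collections import Counter


def count_decoders(log_text: str) -> Counter:
    """Count log lines that mention the sw ("avsw") or hw ("avqsv") reader."""
    counts = Counter({"avsw": 0, "avqsv": 0})
    for line in log_text.splitlines():
        lowered = line.lower()
        for name in ("avsw", "avqsv"):
            if name in lowered:
                counts[name] += 1
    return counts


# Hypothetical log lines for illustration only (not real QSVEnc output).
sample = "\n".join([
    "sub thread 0: reader avqsv",
    "sub thread 1: reader avsw",
    "main: something unrelated",
])
result = count_decoders(sample)
print(result["avsw"], result["avqsv"])
```

A non-zero "avsw" count would only be a hint, since the same strings could also appear elsewhere in the log, for example in an echoed command line.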
System Utilization
Thank you for checking the system utilization. I think the system has plenty of RAM in this case, so the reboot does not seem to be related to RAM usage.
there's still a short wait before each encoding task begins.
If it's within 3 seconds or so, it might simply be initialization going on. I've created a new test build (3707), which theoretically reduces the initialization cost, but unfortunately it does not seem to have much effect; the difference is barely noticeable. https://nightly.link/rigaya/QSVEnc/actions/runs/14559556905/QSVEncC_release_r3707_x64.zip
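To compare the wait against the "3 seconds or so" mentioned above, the gaps between consecutive jobs could be computed from their start and end times. A minimal sketch, assuming the timestamps are collected by hand or by the batch script; the helper names and the sample values are hypothetical.

```python
def gaps_between_jobs(jobs):
    """Return the idle time in seconds between consecutive (start, end) jobs."""
    ordered = sorted(jobs)
    return [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def long_gaps(jobs, threshold=3.0):
    """Return only the gaps longer than the threshold (in seconds)."""
    return [gap for gap in gaps_between_jobs(jobs) if gap > threshold]


# Hypothetical (start, end) timestamps in seconds, for illustration only.
jobs = [(0.0, 120.0), (122.5, 240.0), (245.0, 360.0)]
print(gaps_between_jobs(jobs))  # [2.5, 5.0]
print(long_gaps(jobs))          # [5.0]
```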
Thank you, @rigaya – I’ve tested the new test build: QSVEncC_release_r3707_x64.
Test 1: --parallel 2
- No --avhw flag used
- 0-second delay between jobs
Result: Both Arc A770 GPUs were actively utilized.
- No system instability such as sudden reboots occurred.
Test 2: --parallel 4
- No --avhw flag used
- 0-second delay between jobs
Result: All encodes completed successfully without issues or reboots.
Additional Notes
The Video Decode / Video Decode 1 imbalance still occasionally appears in Task Manager (i.e., one engine dropping to 0%), consistent with prior tests.
There appears to be a brief pause between encodes, even though no delay was configured. I assume this is part of the updated logic in the new build - perhaps waiting for all threads to finish before proceeding.
All logs from these two runs are included in the attached ZIP archive.
Next Steps
- I'm now running a full test suite covering:
- Codecs: H.264, HEVC (main10), AV1 (8-bit & 10-bit)
- Modes: Normal, --parallel 2, and --parallel 4
If everything completes without issue, I’ll also provide an updated performance comparison summary.
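The planned matrix can be enumerated as a quick check on how many runs it implies. A minimal sketch in Python: the labels simply mirror the codec and mode lists above and are not complete QSVEncC command lines, and `build_matrix` is a hypothetical helper name.

```python
from itertools import product

# Labels taken from the test plan; they are descriptions, not full command lines.
codecs = ["H.264", "HEVC (main10)", "AV1 (8-bit)", "AV1 (10-bit)"]
modes = ["normal", "--parallel 2", "--parallel 4"]


def build_matrix(codecs, modes):
    """Return every (codec, mode) combination in a stable order."""
    return list(product(codecs, modes))


matrix = build_matrix(codecs, modes)
for codec, mode in matrix:
    print(f"{codec:<14} {mode}")
print(len(matrix), "runs per input file")
```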