
Remove double copy to/from GPU in hwupload and hwdownload

Open · skolelis opened this issue 8 months ago · 1 comment

This commit removes the double copy to and from the GPU in hwupload and hwdownload.

It was tested on VPL GPU Runtime 2024Q4 Release - 24.4.4 and on top of the main branch, successfully in both cases. It gave a performance gain of 16-21% for the command line below:

ffmpeg \
-qsv_device /dev/dri/renderD128 -hwaccel qsv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-filter_complex "[0:v]hwupload,scale_qsv=iw/4:ih/2[out0];\
[1:v]hwupload,scale_qsv=iw/4:ih/2[out1];\
[2:v]hwupload,scale_qsv=iw/4:ih/2[out2];\
[3:v]hwupload,scale_qsv=iw/4:ih/2[out3];\
[4:v]hwupload,scale_qsv=iw/4:ih/2[out4];\
[5:v]hwupload,scale_qsv=iw/4:ih/2[out5];\
[6:v]hwupload,scale_qsv=iw/4:ih/2[out6];\
[7:v]hwupload,scale_qsv=iw/4:ih/2[out7];\
[out0][out1][out2][out3]\
[out4][out5][out6][out7]\
xstack_qsv=inputs=8:\
layout=0_0|w0_0|0_h0|w0_h0|w0+w1_0|w0+w1+w2_0|w0+w1_h0|w0+w1+w2_h0,\
format=y210le,format=yuv422p10le" \
/videos/recv_1920x1080p10le.yuv
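
The eight hwupload/scale_qsv branches in the command above differ only in their stream index, so for experiments with a different number of inputs they could be generated instead of typed out. Below is a minimal sketch in POSIX sh; the function name `build_upload_scale_graph` is hypothetical and not part of ffmpeg or of this change, and the `xstack_qsv` layout string still has to be written by hand for the chosen input count.

```shell
# Hypothetical helper: prints the per-input branches of the filter graph
# followed by the list of output labels, ready to be fed into xstack_qsv.
build_upload_scale_graph() {
    n=$1
    branches=""
    labels=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # One upload + downscale branch per input stream, as in the issue.
        branches="${branches}[${i}:v]hwupload,scale_qsv=iw/4:ih/2[out${i}];"
        labels="${labels}[out${i}]"
        i=$((i + 1))
    done
    printf '%s%s\n' "$branches" "$labels"
}
```

The result of `build_upload_scale_graph 8` can be concatenated with the `xstack_qsv=inputs=8:layout=...` part shown above and passed to `-filter_complex` as a single quoted string.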

We also noticed a bug that disappears once our changes are applied.

The bug: the resulting video file has four repeated rows at the bottom of the picture, and has probably lost some rows somewhere above.

You can reproduce it with the following example ffmpeg command using the QSV plug-in:

ffmpeg \
-qsv_device /dev/dri/renderD128 -hwaccel qsv \
-pix_fmt yuv422p10le -video_size 1920x1080 -i /videos/1920x1080p10le_1.yuv \
-filter_complex "[0:v]hwupload,scale_qsv=iw/4:ih/2,format=y210le,format=yuv422p10le" \
/videos/recv_1920x1080p10le.yuv
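
One rough way to check the output for the repeated rows without viewing it is to compare the bottom rows of the luma plane byte for byte. The sketch below assumes the output is raw planar yuv422p10le with no row padding (2 bytes per luma sample, luma plane first), and that the frame is 480x540 after `scale_qsv=iw/4:ih/2` on a 1920x1080 input. The helper names `row_hex` and `count_repeated_bottom_rows` are hypothetical. It only inspects the first frame, and a naturally flat picture will also report identical rows, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: hex dump of one row (block) of the file, rows numbered from 0.
row_hex() {
    dd if="$1" bs="$2" skip="$3" count=1 2>/dev/null | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical helper: counts how many rows at the bottom of the first frame's
# luma plane are byte-identical to the row directly above them.
# Assumes the file holds at least one full luma plane of WIDTH x HEIGHT samples.
count_repeated_bottom_rows() {
    file=$1 width=$2 height=$3
    row_bytes=$(( width * 2 ))      # 10-bit samples are stored in 2 bytes
    repeats=0
    row=$(( height - 1 ))
    while [ "$row" -gt 0 ]; do
        cur=$(row_hex "$file" "$row_bytes" "$row")
        prev=$(row_hex "$file" "$row_bytes" "$(( row - 1 ))")
        [ "$cur" = "$prev" ] || break
        repeats=$(( repeats + 1 ))
        row=$(( row - 1 ))
    done
    echo "$repeats"
}
```

For the command above this would be called as `count_repeated_bottom_rows /videos/recv_1920x1080p10le.yuv 480 540`, once without and once with the change, to compare the two counts.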

Signed-off-by: Szymon Kolelis [email protected]

skolelis avatar Mar 04 '25 09:03 skolelis

The vuyx/ayuv formats fail after this change, but it does bring a perf uplift for other formats.

./ffmpeg -hide_banner -init_hw_device vaapi=va:/dev/dri/renderD128 -init_hw_device qsv=qs@va -filter_hw_device qs \
-f lavfi -i nullsrc=s=1920x1080,format=vuyx -vf hwupload -f null - -v verbose

[Parsed_nullsrc_0 @ 0x564c5a012ec0] size:1920x1080 rate:25/1 duration:-1.000000 sar:1/1
Input #0, lavfi, from 'nullsrc=s=1920x1080,format=vuyx':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: wrapped_avframe, 1 reference frame, vuyx, 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 25 tbn
[out#0/null @ 0x564c5a015ac0] No explicit maps, mapping streams automatically...
[vost#0:0/wrapped_avframe @ 0x564c5a016200] Created video stream from input stream 0:0
Stream mapping:
  Stream #0:0 -> #0:0 (wrapped_avframe (native) -> wrapped_avframe (native))
[vost#0:0/wrapped_avframe @ 0x564c5a016200] Starting thread...
[vf#0:0 @ 0x564c5a016740] Starting thread...
[vist#0:0/wrapped_avframe @ 0x564c5a015940] [dec:wrapped_avframe @ 0x564c5a017a80] Starting thread...
[in#0/lavfi @ 0x564c5a010b80] Starting thread...
Press [q] to stop, [?] for help
[graph -1 input from stream 0:0 @ 0x7f8504002c80] w:1920 h:1080 pixfmt:vuyx tb:1/25 fr:25/1 sar:1/1 csp:unknown range:unknown
[AVHWDeviceContext @ 0x7f8504003fc0] VAAPI driver: Intel iHD driver for Intel(R) Gen Graphics - 25.1.4 (ae179e1).
[AVHWDeviceContext @ 0x7f8504003fc0] Driver not found in known nonstandard list, using standard behaviour.
[graph -1 input from stream 0:0 @ 0x7f8504002c80] video frame properties congruent with link at pts_time: 0
[AVHWFramesContext @ 0x7f85040043c0] Use Intel(R) oneVPL to create MFX session, API version is 2.14, the required implementation version is 2.14
[AVHWFramesContext @ 0x7f85040043c0] Initialize MFX session: implementation version is 2.14
[AVHWFramesContext @ 0x7f85040043c0] Error synchronizing the operation
[hwupload @ 0x7f8504002980] Failed to upload frame: -1313558101.
...
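
The numeric code in the `Failed to upload frame` line can be decoded by hand. Many FFmpeg error codes are the negated four-character tag built by `FFERRTAG` in libavutil/error.h, with the first character in the lowest byte. The sketch below (the function name `decode_fferrtag` is hypothetical) recovers the tag; for -1313558101 it yields `UNKN`, i.e. `AVERROR_UNKNOWN`, which means hwupload is only passing on a generic failure and the more specific hint is the preceding "Error synchronizing the operation" line.

```shell
# Hypothetical helper: turns a negative FFmpeg tag-style error code back into
# its four-character tag. Only meaningful for codes built with FFERRTAG;
# plain errno-based codes (small negative numbers) will not decode to text.
decode_fferrtag() {
    n=$(( 0 - ($1) ))
    tag=""
    for shift_bits in 0 8 16 24; do
        byte=$(( (n >> shift_bits) & 255 ))
        # Convert the byte to an octal escape and let printf emit the character.
        tag="${tag}$(printf "\\$(printf '%03o' "$byte")")"
    done
    printf '%s\n' "$tag"
}

decode_fferrtag -1313558101
```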
uname -a
Linux 6.14.0-1-drm-tip-git-geb7714c3b051 #79 SMP PREEMPT_DYNAMIC Sun, 30 Mar 2025 13:19:59 +0000 x86_64 GNU/Linux

lspci -knn | grep -E "i915|xe|VGA|Display"                                                               
00:02.0 Display controller [0380]: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] [8086:46a6] (rev 0c)
        Kernel driver in use: i915
        Kernel modules: i915, xe

nyanmisaka avatar Mar 30 '25 14:03 nyanmisaka