cl_khr_d3d11_sharing causes tearing and artifacts on DG2

Open nyanmisaka opened this issue 2 years ago • 3 comments

Hello there! I'm seeing tearing and artifacts when sharing textures between D3D11 and OpenCL.

Here are the main steps used to decode and share a video frame in FFmpeg:


1, hwcontext_d3d11va

  • Create a D3D11 device on DG2 and set ID3D10Multithread_SetMultithreadProtected to true.
  • Create an ID3D11Texture2D texture array with D3D11_RESOURCE_MISC_SHARED.

2, hwcontext_opencl

  • Create the OpenCL context with CL_CONTEXT_INTEROP_USER_SYNC=0 on the same D3D11 device.
  • Create Y and UV images from each subresource of the ID3D11Texture2D texture array, using clCreateFromD3D11Texture2DKHR and the cl_intel_d3d11_nv12_media_sharing extension.

3, d3d11va hwaccel decoder

  • ID3D11VideoDecoder decodes a frame as NV12 or P010 into the ID3D11Texture2D texture array.

4, hwcontext_opencl

  • Acquire the image from the ID3D11Texture2D texture array with clEnqueueAcquireD3D11ObjectsKHR and wait for the returned event.
  • Copy the image to the host for debugging.
  • Release the acquired image with clEnqueueReleaseD3D11ObjectsKHR and wait for the returned event.

5, uninit and clean up the decoder and hwcontexts
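
For anyone reproducing steps 2 and 4, it may help to know the plane sizes to expect when the Y and UV images are copied to the host. The following is a minimal sketch in C, not code from FFmpeg or the driver: `PlaneLayout`, `plane_layout` and `plane_bytes` are hypothetical names introduced here for illustration. It assumes 4:2:0 subsampling (as in NV12 and P010), even frame dimensions, and a tightly packed host buffer; a real mapped surface may have a larger row pitch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of FFmpeg or OpenCL. */
typedef struct PlaneLayout {
    uint32_t width;            /* plane width in pixels */
    uint32_t height;           /* plane height in pixels */
    uint32_t channels;         /* 1 for Y, 2 for interleaved UV */
    uint32_t bytes_per_sample; /* 1 for NV12, 2 for P010 */
} PlaneLayout;

/* plane 0 is luma (Y), plane 1 is interleaved chroma (UV).
 * With 4:2:0 subsampling the chroma plane is half size in both directions. */
PlaneLayout plane_layout(uint32_t frame_w, uint32_t frame_h,
                         int plane, uint32_t bytes_per_sample)
{
    assert(plane == 0 || plane == 1);
    assert(bytes_per_sample == 1 || bytes_per_sample == 2);
    assert(frame_w % 2 == 0 && frame_h % 2 == 0);

    PlaneLayout p;
    p.width = plane == 0 ? frame_w : frame_w / 2;
    p.height = plane == 0 ? frame_h : frame_h / 2;
    p.channels = plane == 0 ? 1u : 2u;
    p.bytes_per_sample = bytes_per_sample;
    return p;
}

/* Size of a tightly packed host copy of the plane (no row padding). */
size_t plane_bytes(PlaneLayout p)
{
    return (size_t)p.width * p.height * p.channels * p.bytes_per_sample;
}
```

Assuming the test clip is 3840x2160 P010, `plane_bytes(plane_layout(3840, 2160, 0, 2))` gives the Y buffer size and `plane_bytes(plane_layout(3840, 2160, 1, 2))` gives the UV buffer size, which is half of the Y size.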


Once I set the decoder thread count to 1 (-threads 1) in FFmpeg, the output image shows tearing and artifacts.

I only see the issue on DG2 and a few Xe graphics parts; both are Gen12 platforms with the latest driver (4032) installed.

For comparison, I also tried the same command line on an AMD GPU and it works fine.

So I suspect a flaw in the Gen12 Windows driver: according to the cl_khr_d3d11_sharing extension, the driver is responsible for providing the synchronization guarantee when CL_CONTEXT_INTEROP_USER_SYNC=0 is set at context creation.

The test video is taken from http://www.larmoire.info/jellyfish/media/jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv

./ffmpeg.exe -init_hw_device d3d11va=dx -init_hw_device opencl=ocl@dx `
 -hwaccel_device dx -filter_hw_device ocl `
 -hwaccel d3d11va -hwaccel_output_format d3d11 -threads 1 `
 -c:v hevc -i "jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv" -an -sn `
 -vf "hwmap=derive_device=opencl,format=opencl,hwdownload,format=p010" `
 -c:v hevc_qsv -preset veryfast -global_quality 25 -g:v 120 -y "tearing_artifacts.mp4"

You can try our pre-built custom FFmpeg, or build FFmpeg with this patch applied to enable the MISC_SHARED flag.

Thanks in advance!

nyanmisaka avatar Jan 12 '23 16:01 nyanmisaka

Hi @nyanmisaka What event do you try to wait for? Could you provide the details?

XCRobert avatar Feb 28 '23 06:02 XCRobert

> What event do you try to wait for? Could you provide the details?

I wait for the event returned by clEnqueueAcquireD3D11ObjectsKHR and then continue with the next step. Here’s the FFmpeg code:

https://github.com/FFmpeg/FFmpeg/blob/891ed24f77da99c6d41bb7c116ba5925e3206ce2/libavutil/hwcontext_opencl.c#L2551-L2562

Can you reproduce the issue with my command on Windows using an Arc dGPU?

nyanmisaka avatar Feb 28 '23 07:02 nyanmisaka

Hi @XCRobert, we found that the flushAndWait() call is unable to sync the D3D11 texture on DG2:

https://github.com/intel/compute-runtime/blob/4100e1aa729f551bc3fe3df9851273ebd3abc701/opencl/source/os_interface/windows/d3d10_11_sharing_functions.cpp#L357-L363

https://github.com/intel/compute-runtime/blob/e53eae6e5f92bdf06ebd8e44fba3b426fcff341d/opencl/source/sharings/d3d/d3d_surface.cpp#L127-L132

We ran an experiment, which showed that combining flushAndWait() with an ID3D11DeviceContext_CopySubresourceRegion() call works around the problem, but at a performance cost:

https://github.com/intel/cartwheel-ffmpeg/issues/243#issuecomment-1510764048

nyanmisaka avatar Apr 26 '23 14:04 nyanmisaka