compute-runtime
cl_khr_d3d11_sharing causes tearing and artifacts on DG2
Hello there! I get tearing and artifacts when sharing textures between D3D11 and OpenCL.

Here are the main steps to decode and share a video frame in FFmpeg:
1, hwcontext_d3d11va
- Create a D3D11 device on DG2 and set ID3D10Multithread_SetMultithreadProtected to true.
- Create an ID3D11Texture2D texture array with D3D11_RESOURCE_MISC_SHARED.
2, hwcontext_opencl
- Create the OpenCL context with CL_CONTEXT_INTEROP_USER_SYNC=0 on the same D3D11 device.
- Use subresource to create Y and UV images from the ID3D11Texture2D texture array with clCreateFromD3D11Texture2DKHR and the cl_intel_d3d11_nv12_media_sharing extension.
3, d3d11va hwaccel decoder
- ID3D11VideoDecoder decodes a frame as NV12 or P010 into the ID3D11Texture2D texture array.
4, hwcontext_opencl
- Acquire the image from the ID3D11Texture2D texture array with clEnqueueAcquireD3D11ObjectsKHR and wait for the event.
- Copy the image to the host for debugging.
- Release the acquired image with clEnqueueReleaseD3D11ObjectsKHR and wait for the event.
5, Uninit and clean up the decoder and hwcontexts.
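To make the subresource use in step 2 concrete, here is a minimal sketch of how the per-plane subresource index could be computed. It assumes the convention that FFmpeg's hwcontext_opencl.c appears to use with the Intel NV12 sharing extension, where each array slice exposes two planes (Y, then UV) and the index is 2 * array_index + plane. The helper name nv12_plane_subresource is hypothetical and exists only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of FFmpeg or OpenCL.
 * Assumption: for an NV12/P010 texture array shared through
 * cl_intel_d3d11_nv12_media_sharing, plane 0 is Y and plane 1 is UV,
 * and the subresource index is 2 * array_index + plane.
 */
unsigned nv12_plane_subresource(unsigned array_index, unsigned plane)
{
    return 2u * array_index + plane;
}
```

The resulting value would be passed as the subresource argument of clCreateFromD3D11Texture2DKHR, once for the Y plane and once for the UV plane of each array slice.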
Once I set the decoder thread count to 1 (-threads 1) in FFmpeg, the output image shows tearing and artifacts.
I only see the issue on DG2 and a few Xe graphics parts; both are Gen12 platforms with the latest driver (4032) installed.
For comparison, I also tried the same CLI on an AMD GPU, and it works fine.
So I suspect a flaw in the Gen12 Windows driver, since the cl_khr_d3d11_sharing extension states that the driver is responsible for providing the synchronization guarantee when CL_CONTEXT_INTEROP_USER_SYNC=0 is set at context creation.
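For reference, the context property list involved could look roughly like the sketch below. It only builds the zero-terminated array that would be handed to clCreateContext; it does not call OpenCL. The helper name build_d3d11_interop_props is hypothetical, and the enumerant values are copied from the Khronos headers as I recall them, so real code should include CL/cl.h and CL/cl_d3d11.h rather than rely on these fallback defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Fallback values, assumed to match CL/cl.h and CL/cl_d3d11.h. */
#ifndef CL_CONTEXT_PLATFORM
#define CL_CONTEXT_PLATFORM 0x1084
#endif
#ifndef CL_CONTEXT_INTEROP_USER_SYNC
#define CL_CONTEXT_INTEROP_USER_SYNC 0x1085
#endif
#ifndef CL_CONTEXT_D3D11_DEVICE_KHR
#define CL_CONTEXT_D3D11_DEVICE_KHR 0x401D
#endif

/*
 * Hypothetical helper: fill a zero-terminated property list
 * (cl_context_properties is intptr_t in the OpenCL headers).
 * user_sync == 0 asks the driver to synchronize D3D11 and OpenCL
 * access itself; user_sync != 0 makes the application responsible.
 * Returns the number of entries written, including the terminator.
 */
size_t build_d3d11_interop_props(intptr_t platform, void *d3d11_device,
                                 int user_sync, intptr_t props[7])
{
    size_t n = 0;
    props[n++] = CL_CONTEXT_PLATFORM;
    props[n++] = platform;
    props[n++] = CL_CONTEXT_D3D11_DEVICE_KHR;
    props[n++] = (intptr_t)d3d11_device;
    props[n++] = CL_CONTEXT_INTEROP_USER_SYNC;
    props[n++] = user_sync ? 1 : 0;
    props[n++] = 0; /* list terminator */
    return n;
}
```

With user_sync set to 0, as in this report, the application issues no explicit D3D11 flush before clEnqueueAcquireD3D11ObjectsKHR and relies on the driver to order the decoder's writes before OpenCL reads.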
The test video is taken from http://www.larmoire.info/jellyfish/media/jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv
./ffmpeg.exe -init_hw_device d3d11va=dx -init_hw_device opencl=ocl@dx `
-hwaccel_device dx -filter_hw_device ocl `
-hwaccel d3d11va -hwaccel_output_format d3d11 -threads 1 `
-c:v hevc -i "jellyfish-120-mbps-4k-uhd-hevc-10bit.mkv" -an -sn `
-vf "hwmap=derive_device=opencl,format=opencl,hwdownload,format=p010" `
-c:v hevc_qsv -preset veryfast -global_quality 25 -g:v 120 -y "tearing_artifacts.mp4"
You can try it with our pre-built custom FFmpeg, or build FFmpeg with this patch applied to enable the MISC_SHARED flag.
Thanks in advance!
Hi @nyanmisaka What event do you try to wait for? Could you provide the details?
What event do you try to wait for? Could you provide the details?
I wait on the event returned by clEnqueueAcquireD3D11ObjectsKHR and then continue to the next step. Here's the FFmpeg code:
https://github.com/FFmpeg/FFmpeg/blob/891ed24f77da99c6d41bb7c116ba5925e3206ce2/libavutil/hwcontext_opencl.c#L2551-L2562
Can you reproduce the issue with my command on Windows using an Arc dGPU?
Hi @XCRobert We found that the flushAndWait() call is unable to sync the D3D11 texture on DG2.
https://github.com/intel/compute-runtime/blob/4100e1aa729f551bc3fe3df9851273ebd3abc701/opencl/source/os_interface/windows/d3d10_11_sharing_functions.cpp#L357-L363
https://github.com/intel/compute-runtime/blob/e53eae6e5f92bdf06ebd8e44fba3b426fcff341d/opencl/source/sharings/d3d/d3d_surface.cpp#L127-L132
We ran an experiment and found that combining flushAndWait() with an ID3D11DeviceContext_CopySubresourceRegion() call works around the problem, but at a performance cost. https://github.com/intel/cartwheel-ffmpeg/issues/243#issuecomment-1510764048