mpv-prescalers speedup nnedi3 with cooperative matrix multiplication

Open bjin opened this issue 1 year ago • 1 comments

Vulkan 1.3.255 was released with a new vendor-neutral extension, VK_KHR_cooperative_matrix, for tensor-core-like fast matrix multiplication, which could possibly be used to speed up nnedi3. A basic 16x8x8 fp16 coopMatMulAdd should be enough. According to some perf stats I found elsewhere, a 2x to 3x speedup could be expected.

But first, this has to be put on hold until AMD implements this extension in their Linux driver (or maybe radv will get there first?).
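To make the idea concrete, here is a minimal pure-Python sketch of a tiled matrix multiply-add, D = A x B + C, using the 16x8x8 tile shape mentioned above. It only emulates on the CPU what a per-tile coopMatMulAdd would compute. The function names (`tile_mul_add`, `tiled_matmul`, `naive_matmul`) are made up for illustration, and the mapping to nnedi3 (rows of A as per-pixel sample windows, columns of B as neuron weights) is an assumption, not the mpv-prescalers implementation.

```python
# Pure-Python emulation of a tiled matrix multiply-add, D = A x B + C.
# Tile sizes default to the 16x8x8 shape discussed in the issue.
# Illustration only: this is not GLSL and not mpv-prescalers code.


def tile_mul_add(a, b, c, row0, col0, k0, tm, tn, tk):
    """Accumulate one (tm x tk) by (tk x tn) tile product into c, in place."""
    for i in range(row0, min(row0 + tm, len(a))):
        for j in range(col0, min(col0 + tn, len(b[0]))):
            acc = c[i][j]
            for k in range(k0, min(k0 + tk, len(b))):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def tiled_matmul(a, b, tm=16, tn=8, tk=8):
    """Multiply a (M x K) by b (K x N) by walking over tm x tn x tk tiles."""
    m, k_total, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for row0 in range(0, m, tm):
        for col0 in range(0, n, tn):
            # Each step along K plays the role of one multiply-add on a tile.
            for k0 in range(0, k_total, tk):
                tile_mul_add(a, b, c, row0, col0, k0, tm, tn, tk)
    return c


def naive_matmul(a, b):
    """Reference implementation used to check the tiled version."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

On a GPU, each inner tile step would be handled cooperatively by a whole subgroup rather than by scalar loops, which is where the expected speedup would come from.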

Aug 11 '23 16:08 bjin

radv (AMD): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683
anv (Intel): https://gitlab.freedesktop.org/mesa/mesa/-/issues/9250

I only have an AMD RDNA3 (GFX11+) GPU for testing, and according to the RADV PR above, the supported coopMatMul type is 16x16x16 (opcode: v_wmma_f32_16x16x16_f16) with a subgroup size of 64. These settings probably won't work on either Intel or NVIDIA cards.
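Since the supported tile shape differs per driver, a shader generator would likely need to pick a shape at runtime and fall back to the existing path when none matches. A rough sketch of that selection follows. The `pick_shape` helper and the `SUPPORTED` table are hypothetical: only the RADV entry comes from the PR mentioned above, and in a real Vulkan application the list would come from vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR rather than a hard-coded table.

```python
# Hypothetical shape selection for a cooperative-matrix code path.
# Shapes are (M, N, K) tuples. Only the RADV entry is taken from the
# discussion above; the table layout itself is an assumption.

SUPPORTED = {
    "radv_rdna3": [(16, 16, 16)],  # per the RADV merge request
    "no_coopmat": [],              # driver without the extension
}

# Preferred shapes in priority order: the 16x8x8 shape first,
# then the 16x16x16 shape that RADV exposes.
PREFERRED = [(16, 8, 8), (16, 16, 16)]


def pick_shape(supported, preferred=PREFERRED):
    """Return the first preferred shape the device supports, else None."""
    for shape in preferred:
        if shape in supported:
            return shape
    return None  # caller falls back to the non-cooperative-matrix shader


if __name__ == "__main__":
    for device, shapes in SUPPORTED.items():
        print(device, pick_shape(shapes))
```

A real implementation would also have to match the component types and the subgroup size, which this sketch leaves out.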

Sep 12 '23 04:09 bjin
