mpv-prescalers
Speed up nnedi3 with cooperative matrix multiplication
Vulkan 1.3.255 was released with a new vendor-neutral extension, VK_KHR_cooperative_matrix, for tensor-core-like fast matrix multiplication, which could possibly be used to speed up nnedi3. A basic 16x8x8 fp16 coopMatMulAdd should be enough, and according to some perf stats I found elsewhere, a 2x to 3x speedup could be expected.
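For reference, here is a plain-Python sketch of what a single multiply-add tile computes (result = A*B + C), using the 16x8x8 shape mentioned above. The helper names `to_fp16` and `tile_mul_add` are made up for illustration. The inputs are rounded to fp16, but the accumulation here runs in Python floats, so it only models the arithmetic, not the precision or performance of a real GPU implementation.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def tile_mul_add(a, b, c):
    """Return a*b + c for an MxK tile a, a KxN tile b and an MxN tile c.

    Elements of a and b are rounded to fp16 first, like fp16 matrix inputs.
    The accumulation uses Python floats (an assumption of this sketch).
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "a must be MxK"
    assert len(c) == m and all(len(row) == n for row in c), "c must be MxN"
    return [
        [
            c[i][j] + sum(to_fp16(a[i][p]) * to_fp16(b[p][j]) for p in range(k))
            for j in range(n)
        ]
        for i in range(m)
    ]


# 16x8x8 example: M=16, N=8, K=8
M, N, K = 16, 8, 8
a = [[1.0] * K for _ in range(M)]                                   # MxK, all ones
b = [[1.0 if p == j else 0.0 for j in range(N)] for p in range(K)]  # KxN identity
c = [[0.5] * N for _ in range(M)]                                   # MxN accumulator
d = tile_mul_add(a, b, c)
```

In nnedi3 terms, the rows of one operand would hold the sampled pixel windows and the other operand the neuron weights, so a whole tile of dot products comes out of one call instead of one scalar multiply-add at a time.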
But first, this has to be put on hold until AMD implements this extension in their Linux driver (or maybe radv will get there first?).
radv (AMD): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683
anv (Intel): https://gitlab.freedesktop.org/mesa/mesa/-/issues/9250
I only have an AMD RDNA3 (GFX11+) GPU for testing, and according to the RADV merge request above, the supported coopMatMul type is 16x16x16 (opcode: v_wmma_f32_16x16x16_f16) with a subgroup size of 64. These settings probably won't work on either Intel or NVIDIA cards.
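Since the supported tile shape differs between vendors, a shader generator would probably need to choose the shape from whatever the driver reports rather than hard-code 16x16x16. Below is a minimal sketch of that selection logic. `CoopMatConfig` and `pick_config` are hypothetical names, and the configuration lists are placeholder values, not real driver output. A real application would fill the supported list from `vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR`.

```python
from typing import NamedTuple, Optional, Sequence


class CoopMatConfig(NamedTuple):
    """One cooperative matrix configuration (hypothetical, simplified)."""
    m: int
    n: int
    k: int
    input_type: str   # element type of A and B, e.g. "fp16"
    result_type: str  # element type of C and the result, e.g. "fp32"


def pick_config(
    supported: Sequence[CoopMatConfig],
    preferred: Sequence[CoopMatConfig],
) -> Optional[CoopMatConfig]:
    """Return the first preferred configuration the device supports, else None."""
    available = set(supported)
    for want in preferred:
        if want in available:
            return want
    return None


# Placeholder device list, loosely modelled on the RDNA3 case described above.
device_configs = [CoopMatConfig(16, 16, 16, "fp16", "fp32")]

# Preference order of a hypothetical shader generator.
wanted = [
    CoopMatConfig(16, 8, 8, "fp16", "fp16"),
    CoopMatConfig(16, 16, 16, "fp16", "fp32"),
]

chosen = pick_config(device_configs, wanted)
```

If `pick_config` returns `None`, the generator could fall back to the existing non-cooperative-matrix shader path.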