iree [ROCM][DT][Ukernel] Port gfx942 ukernel to gfx950 for data tiling

-- This commit ports the gfx942 ukernel to gfx950 for data tiling.

Signed-off-by: Abhishek Varma [email protected]

Since we don't have MI350 CI, I'm adding the data required to set up an e2e verification in the torch-models CI:

golden_dispatch_count : 1712
golden_time : 495 ms (prefill 2048)
golden_time : 73.8 ms (prefill 128)
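
For reference, here is a minimal sketch of how the values above could be consumed by a verification check. It is not the actual torch-models CI logic: the function name `check_against_golden`, the dictionary keys, and the 10% time tolerance are all assumptions made for illustration. Only the numbers come from this PR.

```python
# Golden values copied from this PR description.
GOLDEN_DISPATCH_COUNT = 1712
GOLDEN_TIME_MS = {
    "prefill_2048": 495.0,  # key names are hypothetical labels
    "prefill_128": 73.8,
}

# Hypothetical slack for run-to-run noise; the real CI may use a different rule.
TIME_TOLERANCE = 0.10


def check_against_golden(name, measured_time_ms, measured_dispatch_count,
                         tolerance=TIME_TOLERANCE):
    """Return a list of failure messages; an empty list means the run passes."""
    failures = []

    # More dispatches than the golden count suggests a fusion regression.
    if measured_dispatch_count > GOLDEN_DISPATCH_COUNT:
        failures.append(
            f"{name}: dispatch count {measured_dispatch_count} "
            f"exceeds golden {GOLDEN_DISPATCH_COUNT}"
        )

    # Allow the measured time to exceed the golden time by at most `tolerance`.
    limit_ms = GOLDEN_TIME_MS[name] * (1.0 + tolerance)
    if measured_time_ms > limit_ms:
        failures.append(
            f"{name}: time {measured_time_ms:.1f} ms exceeds limit {limit_ms:.1f} ms"
        )

    return failures
```

Under these assumptions, a run would be flagged only when it dispatches more kernels than the golden count or runs more than 10% slower than the golden time.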

NOTE: I've pushed the IR added in this thread to https://sharkpublic.blob.core.windows.net/sharkpublic/iree-test-suites/torch-models/llama_8b_fp8/bs4_f8e4m3fn.mlir.

Dec 04 '25 10:12 Abhishek-Varma

Are we going to tune the tile size a bit before checking in these new ukernels?

https://github.com/iree-org/iree/issues/22121#issuecomment-3597180396

Dec 04 '25 11:12 Yu-Zhewen

> Are we going to tune the tile size a bit before checking in these new ukernels?
>
> #22121 (comment)

Yes, we should make sure that the ukernel performs better than direct codegen before checking in.

Dec 04 '25 11:12 jtuyls

Let's convert it to draft until we understand what's happening and what's missing in the ukernel path? Note that we'll need to revisit the ukernel approach that does not use the MLIR ukernel path. The work on MLIR-based ukernels might help flesh out what's missing in the other type of ukernels, though.

Context: https://discord.com/channels/689900678990135345/1254843174111678555/1443009208181063805

Dec 05 '25 14:12 hanhanW

> Let's convert it to draft until we understand what's happening and what's missing in the ukernel path? Note that we'll need to revisit the ukernel approach that does not use the MLIR ukernel path. The work on MLIR-based ukernels might help flesh out what's missing in the other type of ukernels, though.
>
> Context: https://discord.com/channels/689900678990135345/1254843174111678555/1443009208181063805

Yeah, I've converted it to draft now. Abhishek is also out until the end of the month. I might take a look at this if I have some time, since, as you pointed out, it's useful to understand in any case.

Dec 05 '25 15:12 jtuyls