[ROCM][DT][Ukernel] Port gfx942 ukernel to gfx950 for data tiling
-- This commit ports gfx942 ukernel to gfx950 for data tiling.
Signed-off-by: Abhishek Varma [email protected]
Since we don't have MI350 CI, I'm including the data needed to add an e2e verification test to the torch-models CI:
golden_dispatch_count: 1712
golden_time: 495 ms (prefill 2048)
golden_time: 73.8 ms (prefill 128)
NOTE: I've pushed the IR added in this thread to https://sharkpublic.blob.core.windows.net/sharkpublic/iree-test-suites/torch-models/llama_8b_fp8/bs4_f8e4m3fn.mlir.
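For whoever wires this into CI later, here is a rough sketch of how the golden values above could be checked. This is not the torch-models CI format: the function name `check_against_golden`, the 10% time tolerance, and the assumption that the dispatch count is the same for both prefill lengths are all hypothetical. Only the numbers come from this comment.

```python
from typing import List

# Golden values quoted in this PR comment.
GOLDEN_DISPATCH_COUNT = 1712
GOLDEN_TIME_MS = {2048: 495.0, 128: 73.8}  # keyed by prefill length


def check_against_golden(
    prefill_len: int,
    dispatch_count: int,
    time_ms: float,
    time_tolerance: float = 0.10,  # hypothetical: allow 10% slowdown
) -> List[str]:
    """Return a list of failure messages; an empty list means the run passes."""
    failures = []
    if dispatch_count != GOLDEN_DISPATCH_COUNT:
        failures.append(
            f"dispatch count {dispatch_count} != golden {GOLDEN_DISPATCH_COUNT}"
        )
    # Only regressions beyond the tolerance are flagged; faster runs pass.
    limit_ms = GOLDEN_TIME_MS[prefill_len] * (1.0 + time_tolerance)
    if time_ms > limit_ms:
        failures.append(
            f"prefill {prefill_len}: {time_ms} ms exceeds limit {limit_ms:.1f} ms"
        )
    return failures
```

A run that reproduces the golden numbers exactly returns an empty list; a run with a different dispatch count and a time beyond the tolerance returns two messages.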
Are we going to tune the tile size a bit before checking in these new ukernels?
https://github.com/iree-org/iree/issues/22121#issuecomment-3597180396
Yes, we should make sure that the ukernel performs better than direct codegen before checking in.
Let's convert it to draft until we understand what's happening and what's missing in the ukernel path. Note that we'll need to revisit the ukernel approach that does not use the MLIR ukernel path. The work on the MLIR-based ukernel might help flesh out what's missing in the other type of ukernels, though.
Context: https://discord.com/channels/689900678990135345/1254843174111678555/1443009208181063805
Yeah, converted it to draft now. Abhishek is also out until the end of the month. I might take a look at this if I have some time, since, as you pointed out, it's useful to understand in any case.