Aditya Srichandan comments

Results: 11 comments of Aditya Srichandan

[Issue]: error: invalid operand for instruction while building composable kernel

@lshqqytiger According to the documentation, for gfx1030 you should use -D GPU_ARCHS: **_"NOTE: If you try setting GPU_TARGETS to a list of architectures, the build will only work if...

[Issue]: Very slow perf for Gemm BF16

Hi @ghostplant, thanks for reporting this. I reproduced the numbers and saw a similar gap between `ckProfiler` and `rocblas-bench` for this GEMM shape. From what I see, the...

[Issue]: Very slow perf for Gemm BF16

To add more kernel choices to improve `ckProfiler` performance for the shape **M=32, N=512, K=7168** (bf16, RowMajor A, ColumnMajor B): `ck/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_cshuffle_bf16_bf16_bf16_mk_nk_mn_instance.cpp` https://github.com/ROCm/composable_kernel/blob/e6104daecc7e29d26fc0435dd697132bdd262163/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_mk_nk_mn_instance.cpp This function registers all supported kernel variants for...

[Issue]: Very slow perf for Gemm BF16

Hello @ghostplant, let me reproduce it again. I did not have this issue earlier. Are you seeing this on the main develop branch?

[Issue]: Dependency on msgpack even when -DTensile_LIBRARY_FORMAT=yaml

Hi @cj401-amd could you please clarify on another issue.

[Issue]: Slow model compilation

Hello @Artoriuz, I will reproduce this issue and get back to you as soon as possible.

Megatron training adaptation issue, obtaining CUDA_SOME as None

Hi @JoJo-Lorray, the latest PyTorch logic is now correct (https://github.com/ROCm/pytorch/blob/0a6e1d6b9bf78d690a812e4334939e7701bfa794/torch/utils/cpp_extension.py#L243C1-L244C1): it uses `torch.cuda._is_compiled()` to set `CUDA_HOME`, and no longer incorrectly relies on `IS_HIP_EXTENSION`. On ROCm, `torch.cuda._is_compiled()` returns False,...

Aditya Srichandan

[Issue]: error: invalid operand for instruction while building composable kernel

[Issue]: Very slow perf for Gemm BF16

[Issue]: Very slow perf for Gemm BF16

[Issue]: Very slow perf for Gemm BF16

[Issue]: Dependency on msgpack even when -DTensile_LIBRARY_FORMAT=yaml

[Issue]: Slow model compilation

Megatron training adaptation issue, obtaining CUDA_SOME as None

migraphx bug [vapoursynth]

migraphx bug [vapoursynth]

dockerhub images appear to be build images instead of runtime