Aditya Srichandan
@lshqqytiger According to the documentation, for gfx1030 you should use `-D GPU_ARCHS`: **_"NOTE: If you try setting GPU_TARGETS to a list of architectures, the build will only work if...
Hi @ghostplant, thanks for reporting this. I reproduced the numbers and saw a similar gap between `ckProfiler` and `rocblas-bench` for this GEMM shape. From what I see here, the...
To add more kernel choices to improve `ckProfiler` performance for the shape **M=32, N=512, K=7168** (bf16, RowMajor A, ColumnMajor B): `ck/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_mk_nk_mn_instance.cpp` https://github.com/ROCm/composable_kernel/blob/e6104daecc7e29d26fc0435dd697132bdd262163/library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_bf16_bf16_bf16_mk_nk_mn_instance.cpp This function registers all supported kernel variants for...
Hello @ghostplant, let me try to reproduce it again; I did not see this issue earlier. Are you seeing this on the main develop branch?
Hi @cj401-amd, could you please clarify on another issue?
Hello @Artoriuz, I will reproduce this issue and get back to you as soon as possible.
Hi @JoJo-Lorray, the latest PyTorch logic is now correct (https://github.com/ROCm/pytorch/blob/0a6e1d6b9bf78d690a812e4334939e7701bfa794/torch/utils/cpp_extension.py#L243C1-L244C1): it uses `torch.cuda._is_compiled()` to set `CUDA_HOME`, and no longer incorrectly relies on `IS_HIP_EXTENSION`. On ROCm, `torch.cuda._is_compiled()` returns False,...
Hi @ryou128hr, to help us reproduce and debug this properly, could you please share the following? 1. The `2x_Compact_fp32_op17.onnx` model used during compilation. 2. A short sample video clip (or...
Hi @ryou128hr, just following up. Since we haven't heard back, I'll go ahead and close this issue for now. If you're still encountering the problem or have more details to...
This is because it includes full build tools, ROCm SDK, source code, and testing artifacts. It's clearly a builder image, not a runtime image. If you just need to run...