[Issue]: Regarding the parameter issues with DeviceGemm_Xdl_CShuffle

Open xiaobo1025 opened this issue 1 year ago • 0 comments

Problem Description

// clang-format off
using DeviceGemmInstance1 = ck::tensor_operation::device::DeviceGemm_Xdl_CShuffle <
    ALayout,                      // ALayout
    BLayout,                      // BLayout
    CLayout,                      // CLayout
    ADataType,                    // ADataType
    BDataType,                    // BDataType
    CDataType,                    // CDataType
    AccDataType,                  // AccDataType
    CShuffleDataType,             // CShuffleDataType
    AElementOp,                   // AElementwiseOperation
    BElementOp,                   // BElementwiseOperation
    CElementOp,                   // CElementwiseOperation
    GemmDefault,                  // GEMMSpecialization
    1,                            // NumGemmKPrefetchStage
    BlockSize,                    // BlockSize
    256,                          // MPerBlock
    128,                          // NPerBlock
    32,                           // KPerBlock
    8,                            // AK1
    2,                            // BK1
    32,                           // MPerXDL
    32,                           // NPerXDL
    4,                            // MXdlPerWave
    2,                            // NXdlPerWave
    S<4, 64, 1>,                  // ABlockTransferThreadClusterLengths_K0_M_K1
    S<1, 0, 2>,                   // ABlockTransferThreadClusterArrangeOrder
    S<1, 0, 2>,                   // ABlockTransferSrcAccessOrder
    2,                            // ABlockTransferSrcVectorDim
    8,                            // ABlockTransferSrcScalarPerVector
    8,                            // ABlockTransferDstScalarPerVector_K1
    1,                            // ABlockLdsAddExtraM
    S<8, 32, 1>,                  // BBlockTransferThreadClusterLengths_K0_N_K1
    S<0, 2, 1>,                   // BBlockTransferThreadClusterArrangeOrder
    S<0, 2, 1>,                   // BBlockTransferSrcAccessOrder
    1,                            // BBlockTransferSrcVectorDim
    4,                            // BBlockTransferSrcScalarPerVector
    2,                            // BBlockTransferDstScalarPerVector_K1
    0,                            // BBlockLdsAddExtraN
    1,                            // CShuffleMXdlPerWavePerShuffle
    2,                            // CShuffleNXdlPerWavePerShuffle
    S<1, 16, 1, 16>,              // CBlockTransferClusterLengths_MBlock_MWaveMPerXdl_NBlock_NWaveNPerXdl
    8,                            // CBlockTransferScalarPerVector_NWaveNPerXdl
    ck::LoopScheduler::Interwave,
    ck::PipelineVersion::v1>;
// clang-format on

Could you please explain the meaning of the above parameters, and how I should relate them to the specific dimensions of matrices A, B, and C?

Also, why do I get an error when I change the value of CBlockTransferScalarPerVector_NWaveNPerXdl? The error is as follows:

opt/rocm-6.0.0/include/ck/tensor_description/tensor_space_filling_curve.hpp:44:9: error: static assertion failed due to requirement 'const ck::Sequence<1, 4, 1, 8>{} % ck::Sequence<1, 1, 1, 16>{} == ck::Sequence<0, 0, 0, 0>{}'
static_assert(TensorLengths{} % ScalarsPerAccess{} ==

Which parameters are related to the tuple ck::Sequence<1, 4, 1, 8> shown in the error message, and how is it calculated?

Additionally, what is the reason for the following error?

/opt/rocm-6.0.0/include/ck/tensor_description/tensor_space_filling_curve.hpp:112:28: error: constexpr variable 'id' must be initialized by a constant expression
constexpr auto id = compute_index_impl(idim);

I currently do not understand the meaning of the above parameters, and I hope someone can help me analyze them. Thank you very much!
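For anyone reading along, here is a small standalone sketch of one way the tuple in the first error could arise. It does not use Composable Kernel at all. The names Seq4, thread_slice_lengths and vector_width_divides_slice are hypothetical, and the formula (C-shuffle tile per block divided by the C thread cluster lengths) is an assumption rather than something taken from the library source. It does reproduce the 1, 4, 1, 8 in the message, and the 16 in ck::Sequence<1, 1, 1, 16> suggests the last parameter had been changed from 8 to 16.

```cpp
#include <cassert>

// Tuning values copied from the instance in the question.
constexpr int MPerBlock = 256, NPerBlock = 128;
constexpr int MPerXDL = 32, NPerXDL = 32;
constexpr int MXdlPerWave = 4, NXdlPerWave = 2;
constexpr int CShuffleMXdlPerWavePerShuffle = 1;
constexpr int CShuffleNXdlPerWavePerShuffle = 2;
// CBlockTransferClusterLengths_MBlock_MWaveMPerXdl_NBlock_NWaveNPerXdl
constexpr int CClusterLengths[4] = {1, 16, 1, 16};

// Assumption: the number of waves along M and N follows from the tile sizes.
constexpr int MWave = MPerBlock / (MXdlPerWave * MPerXDL); // 256 / 128 = 2
constexpr int NWave = NPerBlock / (NXdlPerWave * NPerXDL); // 128 / 64  = 2

// Hypothetical stand-in for a 4-element ck::Sequence.
struct Seq4 { int v[4]; };

// Assumption: per-thread slice = C-shuffle tile per block / thread cluster lengths.
constexpr Seq4 thread_slice_lengths() {
    return Seq4{{1 / CClusterLengths[0],
                 CShuffleMXdlPerWavePerShuffle * MWave * MPerXDL / CClusterLengths[1],
                 1 / CClusterLengths[2],
                 CShuffleNXdlPerWavePerShuffle * NWave * NPerXDL / CClusterLengths[3]}};
}

// Hypothetical helper mirroring the failing check on the last dimension:
// the per-thread length must be a multiple of the vector width.
constexpr bool vector_width_divides_slice(int scalar_per_vector) {
    return thread_slice_lengths().v[3] % scalar_per_vector == 0;
}

static_assert(thread_slice_lengths().v[1] == 4, "64 / 16");
static_assert(thread_slice_lengths().v[3] == 8, "128 / 16");
static_assert(vector_width_divides_slice(8), "8 % 8 == 0, the original value");
static_assert(!vector_width_divides_slice(16), "8 % 16 != 0, as in the error");
```

Under these assumptions, CBlockTransferScalarPerVector_NWaveNPerXdl has to divide the per-thread length along N (8 here), so values such as 1, 2, 4 or 8 would pass this particular check while 16 would not. Other constraints in the library may still apply.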

Operating System

Ubuntu 22.04.3 LTS

CPU

Intel

GPU

AMD Instinct MI210

Other

No response

ROCm Version

ROCm 6.0.0

ROCm Component

Composable Kernel

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

I hope friends who understand these parameters can give me valuable advice. I've been stuck on understanding these parameters recently, and I would greatly appreciate any help!

Apr 12 '24 09:04 xiaobo1025