[Issue]: Regarding the parameter issues with DeviceGemm_Xdl_CShuffle
Problem Description
// clang-format off
using DeviceGemmInstance1 = ck::tensor_operation::device::DeviceGemm_Xdl_CShuffle
<
ALayout, // ALayout
BLayout, // BLayout
CLayout, // CLayout
ADataType, // ADataType
BDataType, // BDataType
CDataType, // CDataType
AccDataType, // AccDataType
CShuffleDataType, // CShuffleDataType
AElementOp, // AElementwiseOperation
BElementOp, // BElementwiseOperation
CElementOp, // CElementwiseOperation
GemmDefault, // GemmSpecialization
1, // NumGemmKPrefetchStage
BlockSize, // BlockSize
256, // MPerBlock
128, // NPerBlock
32, // KPerBlock
8, // AK1
2, // BK1
32, // MPerXDL
32, // NPerXDL
4, // MXdlPerWave
2, // NXdlPerWave
S<4, 64, 1>, // ABlockTransferThreadClusterLengths_K0_M_K1
S<1, 0, 2>, // ABlockTransferThreadClusterArrangeOrder
S<1, 0, 2>, // ABlockTransferSrcAccessOrder
2, // ABlockTransferSrcVectorDim
8, // ABlockTransferSrcScalarPerVector
8, // ABlockTransferDstScalarPerVector_K1
1, // ABlockLdsAddExtraM
S<8, 32, 1>, // BBlockTransferThreadClusterLengths_K0_N_K1
S<0, 2, 1>, // BBlockTransferThreadClusterArrangeOrder
S<0, 2, 1>, // BBlockTransferSrcAccessOrder
1, // BBlockTransferSrcVectorDim
4, // BBlockTransferSrcScalarPerVector
2, // BBlockTransferDstScalarPerVector_K1
0, // BBlockLdsAddExtraN
1, // CShuffleMXdlPerWavePerShuffle
2, // CShuffleNXdlPerWavePerShuffle
S<1, 16, 1, 16>, // CBlockTransferClusterLengths_MBlock_MWaveMPerXdl_NBlock_NWaveNPerXdl
8, // CBlockTransferScalarPerVector_NWaveNPerXdl
ck::LoopScheduler::Interwave,
ck::PipelineVersion::v1>;
// clang-format on
Could you please explain what the parameters above mean and how they relate to the dimensions of matrices A, B, and C? Also, why do I get a compile error when I change the value of CBlockTransferScalarPerVector_NWaveNPerXdl? The error is as follows:
opt/rocm-6.0.0/include/ck/tensor_description/tensor_space_filling_curve.hpp:44:9: error: static assertion failed due to requirement 'const ck::Sequence<1, 4, 1, 8>{} % ck::Sequence<1, 1, 1, 16>{} == ck::Sequence<0, 0, 0, 0>{}'
static_assert(TensorLengths{} % ScalarsPerAccess{} ==
Which parameters determine the ck::Sequence<1, 4, 1, 8> shown in the error message, and how is it calculated? Additionally, what is the reason for the following error?
/opt/rocm-6.0.0/include/ck/tensor_description/tensor_space_filling_curve.hpp:112:28: error: constexpr variable 'id' must be initialized by a constant expression
constexpr auto id = compute_index_impl(idim);
I currently do not understand the meaning of the above parameters, and I hope someone can help me analyze them. Thank you very much!
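For reference, here is one reading that reproduces the numbers in the first error message. It is only a sketch: the formula for the per-thread slice is an assumption inferred from the parameter values above, not confirmed against the CK source, and the names Seq4, per_thread_slice, and divides_evenly are hypothetical helpers, not CK APIs. The only part taken directly from the error is the check itself: every entry of TensorLengths must be divisible by the matching entry of ScalarsPerAccess.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 4-element ck::Sequence; not CK's API.
using Seq4 = std::array<int, 4>;

// ASSUMPTION: the per-thread C-shuffle slice is the per-shuffle block tile
// divided element-wise by CBlockTransferClusterLengths_MBlock_MWaveMPerXdl_NBlock_NWaveNPerXdl.
constexpr Seq4 per_thread_slice(int MPerBlock, int NPerBlock,
                                int MPerXdl, int NPerXdl,
                                int MXdlPerWave, int NXdlPerWave,
                                int CShuffleMXdlPerWavePerShuffle,
                                int CShuffleNXdlPerWavePerShuffle,
                                const Seq4& cluster)
{
    // Waves per block along M and N.
    const int MWave = MPerBlock / (MXdlPerWave * MPerXdl);
    const int NWave = NPerBlock / (NXdlPerWave * NPerXdl);
    // Tile written by the whole block in one shuffle step.
    const int m = CShuffleMXdlPerWavePerShuffle * MWave * MPerXdl;
    const int n = CShuffleNXdlPerWavePerShuffle * NWave * NPerXdl;
    // Share of that tile handled by each thread of the cluster.
    return Seq4{1 / cluster[0], m / cluster[1], 1 / cluster[2], n / cluster[3]};
}

// Mirrors the static_assert in the error: lengths % scalars_per_access == 0.
constexpr bool divides_evenly(const Seq4& lengths, const Seq4& scalars_per_access)
{
    for (std::size_t i = 0; i < lengths.size(); ++i)
        if (lengths[i] % scalars_per_access[i] != 0)
            return false;
    return true;
}

// Values from the instance above: 256x128 block tile, 32x32 XDL tile,
// MXdlPerWave = 4, NXdlPerWave = 2, shuffle 1x2, cluster S<1, 16, 1, 16>.
constexpr Seq4 kCluster = {1, 16, 1, 16};
constexpr Seq4 kSlice   = per_thread_slice(256, 128, 32, 32, 4, 2, 1, 2, kCluster);

// CBlockTransferScalarPerVector_NWaveNPerXdl only affects the last dimension.
constexpr Seq4 kVector8  = {1, 1, 1, 8};
constexpr Seq4 kVector16 = {1, 1, 1, 16};

static_assert(kSlice[1] == 4 && kSlice[3] == 8, "matches Sequence<1, 4, 1, 8>");
static_assert(divides_evenly(kSlice, kVector8), "vector width 8 passes the check");
static_assert(!divides_evenly(kSlice, kVector16), "vector width 16 fails the check");
```

Under that assumption, the last entry of the slice is 8, so a CBlockTransferScalarPerVector_NWaveNPerXdl of 16 cannot divide it, which is consistent with Sequence<1, 1, 1, 16> appearing in the failed assertion.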
Operating System
Ubuntu 22.04.3 LTS
CPU
Intel
GPU
AMD Instinct MI210
Other
No response
ROCm Version
ROCm 6.0.0
ROCm Component
Composable Kernel
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
I would appreciate advice from anyone who understands these parameters. I have been stuck on them for a while, and any help would be very welcome.