Does DeviceGemmMultipleABD_Xdl_CShuffle support a row-major layout for B?
I modified the template parameters for a row-major B, following the B settings used for other layouts. It compiles, but the computed result is incorrect. The error occurs when more than one element is read at a time. In the configuration below, two elements are read at a time.
The template is as follows:
using ALayout = Row;
using BLayout = Row;
using DLayout = Row;
using ELayout = Row;
using DeviceOpInstance = ck::tensor_operation::device::DeviceGemmMultipleABD_Xdl_CShuffle<
ck::Tuple<ALayout>,
ck::Tuple<BLayout>,
ck::Tuple<>,
ELayout,
ck::Tuple<ADataType>,
ck::Tuple<BDataType>,
AccDataType,
CShuffleDataType,
ck::Tuple<>,
EDataType,
AElementOp,
BElementOp,
CDEElementOp,
GemmSpec,
1,
256,
256,
128,
32,
8,
8,
32,
32,
4,
2,
S<4, 64, 1>,
S<1, 0, 2>,
S<1, 0, 2>,
2,
8,
8,
1,
S<4, 64, 1>,
S<1, 0, 2>,
S<1, 0, 2>,
1,
2,
2,
1,
1,
1,
S<1, 32, 1, 8>,
8>;
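For anyone checking a result like this on the host, here is a minimal sketch of a reference GEMM with row-major A, B, and E. It is plain C++ and does not use CK. The names reference_gemm_row_b and first_mismatch are hypothetical, and float storage in std::vector is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host reference: E(m, n) = sum over k of A(m, k) * B(k, n), all row-major.
// Row-major B is stored as K rows of N elements, so B(k, n) is at k * N + n.
// (Hypothetical helper, not part of CK.)
std::vector<float> reference_gemm_row_b(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        std::size_t M,
                                        std::size_t N,
                                        std::size_t K)
{
    assert(a.size() == M * K);
    assert(b.size() == K * N);

    std::vector<float> e(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
    {
        for(std::size_t k = 0; k < K; ++k)
        {
            const float a_mk = a[m * K + k];
            for(std::size_t n = 0; n < N; ++n)
            {
                e[m * N + n] += a_mk * b[k * N + n];
            }
        }
    }
    return e;
}

// Returns the index of the first element whose absolute error exceeds tol,
// or -1 if every element matches within tol.
long first_mismatch(const std::vector<float>& got,
                    const std::vector<float>& want,
                    float tol)
{
    assert(got.size() == want.size());
    for(std::size_t i = 0; i < want.size(); ++i)
    {
        if(std::fabs(got[i] - want[i]) > tol)
        {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

To use it, copy the device output E back to the host, compute the reference from the same A and B, and call first_mismatch with a tolerance suited to the data type. The returned index i corresponds to row i / N and column i % N of E.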
@zjing14 Can you help me answer this?
@xiabo123 Which GEMM case are you running?
@zjing14 Thank you for your answer. The GEMM case is 60_gemm_multi_ABD. I referred to the modifications in your PR (https://github.com/ROCm/composable_kernel/pull/978) and added the function TransposeFromElmToDst to implement it. B is row-major, and multiple elements are read at a time.
@xiabo123 Do you mean that PR #978 resolves your issue?
@zjing14 Yes. However, when B reads 8 elements at a time, the performance is very poor. Change 1, 2, 8 here to 1, 8, 8.
@xiabo123 Since PR #978 has now been merged, would you mind letting us know whether the issue has been resolved? Thanks!
Hi @xiabo123. We will be closing the issue for now since there seem to be no obvious actionable items. Thanks!
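As background for the vector width discussed here, a small sketch of the address arithmetic for a K x N matrix B in the two layouts. This is only index math with hypothetical helper names, not CK code, and it is not a confirmed explanation of the slowdown reported in this thread.

```cpp
#include <cstddef>

// Element strides of a K x N matrix B (hypothetical type, not part of CK).
struct Strides
{
    std::size_t along_k; // distance between B(k, n) and B(k + 1, n)
    std::size_t along_n; // distance between B(k, n) and B(k, n + 1)
};

// Row-major: B(k, n) is at k * N + n, so N is the contiguous dimension.
Strides row_major_strides(std::size_t /*K*/, std::size_t N) { return {N, 1}; }

// Column-major: B(k, n) is at n * K + k, so K is the contiguous dimension.
Strides col_major_strides(std::size_t K, std::size_t /*N*/) { return {1, K}; }

// A read of `width` elements along one dimension covers adjacent addresses
// only if that dimension has stride 1 (or the read is a single element).
bool read_is_contiguous(std::size_t stride, std::size_t width)
{
    return width == 1 || stride == 1;
}
```

With row-major B, a read of several elements is contiguous along N but not along K, which is why the source vector dimension and the per-vector element counts for B have to be chosen together with the layout.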