[QST] Tiled copy misaligned, how to solve it?
My code:
using GmemTiledCopyL = decltype(make_tiled_copy(
Copy_Atom<DefaultCopy, half>{}, Layout<Shape<_1>>{}, Layout<Shape<_1>>{}));
using SmemLayoutL = decltype(Layout<Shape<Int<64>>>{});
__shared__ cute::array_aligned<half, cute::cosize_v<SmemLayoutL>> l;
GmemTiledCopyL gmem_tiled_copy_L;
auto gmem_thr_copy_LD = gmem_tiled_copy_L.get_thread_slice(tid);
Tensor _L = make_tensor(make_gmem_ptr(reinterpret_cast<const float*>(L)),
make_shape(C), make_stride(Int<1>{}));
Tensor gL = local_tile(_L, make_tile(Int<64>{}),
make_coord(_)); // [64, C / 64]
Tensor sL = make_tensor(make_smem_ptr(shared_storage.l.data()),
SmemLayoutL{}); // [64]
Tensor gL_to_sL_src = gmem_thr_copy_LD.partition_S(gL);
Tensor gL_to_sL_dst = gmem_thr_copy_LD.partition_D(sL);
cute::copy(gmem_tiled_copy_L, gL_to_sL_src(_, _, 0), gL_to_sL_dst);
If C is not divisible by 8, the kernel fails with this error: CUDA get error, code:716, msg:misaligned address. How can I solve it? Thanks.