[BUG] Autovectorized copy fails with shape (_2, _3)
Describe the bug
Automatic vectorization in cute::copy does not account for shape divisibility. When copying a tensor with layout (_2,_3):(_1,_2), the maximum common vector length is 6 elements, yet the copy is vectorized at 128 bits, i.e. four int elements at a time. A 6-element tensor cannot be split evenly into 128-bit chunks, so the recast fails at compile time.
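For context, here is a rough standalone sketch of the indivisibility that trips the static_assert. This only paraphrases the upcast/shape_div logic from cute/layout.hpp rather than reproducing it; the variable names are illustrative.

int main() {
  // 128-bit vectorization of int packs 128 / 32 = 4 elements per copy.
  constexpr int pack = 4;                 // elements folded per vector
  constexpr int shape0 = 2;               // first mode of (_2,_3):(_1,_2)
  constexpr int shape1 = 3;               // second mode

  // The contiguous first mode can absorb at most its own extent: 2 of the 4.
  constexpr int absorbed  = shape0;            // 2
  constexpr int remaining = pack / absorbed;   // 2 must come from mode 1

  // Mode 1 would need its shape (3) to be divisible by the remaining
  // factor (2); 3 % 2 != 0, which is the reported static_assert
  // 'C<3>::value % C<2>::value == 0 || C<2>::value % C<3>::value == 0'.
  static_assert(shape1 % remaining != 0 && remaining % shape1 != 0,
                "this is the indivisible case the issue reports");
  return 0;
}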
Steps/Code to reproduce bug
#include <vector>

#include <cute/tensor.hpp>
#include <cute/layout.hpp>

using namespace cute;

int main() {
  // Static column-major layout (_2,_3):(_1,_2) with 6 int elements.
  auto mem_layout = make_layout(make_shape(Int<2>{}, Int<3>{}));
  print_layout(mem_layout);

  std::vector<int> src_buffer(size(mem_layout));
  std::vector<int> dst_buffer(size(mem_layout));

  auto src = make_tensor(src_buffer.data(), mem_layout);
  for (int t = 0; t < size(mem_layout); t++) {
    src[t] = t;
  }
  print_tensor(src);

  auto dst = make_tensor(dst_buffer.data(), mem_layout);
  copy(src, dst);  // fails to compile: auto-vectorizes at 128 bits
  print_tensor(dst);

  return 0;
}
I got the following error:
error: static assertion failed due to requirement 'C<3>::value % C<2>::value == 0 || C<2>::value % C<3>::value == 0': Static shape_div failure
405 | static_assert(IntTupleA::value % IntTupleB::value == 0 || IntTupleB::value % IntTupleA::value == 0, "Static shape_div failure");
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/ethan/cutlass/include/cute/layout.hpp:1680:24: note: in instantiation of function template specialization 'cute::shape_div<cute::C<3>, cute::C<2>>' requested here
1680 | return make_layout(shape_div(shape, shape_div(Int<N>{}, abs(stride))),
| ^
/home/ethan/cutlass/include/cute/layout.hpp:1676:86: note: in instantiation of function template specialization 'cute::upcast<4, cute::C<3>, cute::C<2>>' requested here
1676 | return transform_layout(shape, stride, [](auto const& s, auto const& d) { return upcast<N>(s,d); });
| ^
/home/ethan/cutlass/include/cute/layout.hpp:704:22: note: in instantiation of function template specialization 'cute::upcast(const cute::tuple<cute::C<2>, cute::C<3>> &, const cute::tuple<cute::C<1>, cute::C<2>> &)::(anonymous class)::operator()<cute::C<3>, cute::C<2>>' requested here
704 | return make_layout(f(get<I>(t0),get<I>(t1))..., get<I0>(t0)..., get<I1>(t1)...);
| ^
/home/ethan/cutlass/include/cute/layout.hpp:725:18: note: in instantiation of function template specialization 'cute::detail::transform_layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>, (lambda at /home/ethan/cutlass/include/cute/layout.hpp:1676:44) &, 0, 1>' requested here
725 | return detail::transform_layout(t0, t1, f, make_seq<R>{}, make_range<R,R0>{}, make_range<R,R1>{});
| ^
/home/ethan/cutlass/include/cute/layout.hpp:1676:12: note: in instantiation of function template specialization 'cute::transform_layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>, (lambda at /home/ethan/cutlass/include/cute/layout.hpp:1676:44)>' requested here
1676 | return transform_layout(shape, stride, [](auto const& s, auto const& d) { return upcast<N>(s,d); });
| ^
/home/ethan/cutlass/include/cute/layout.hpp:1696:10: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
1696 | return upcast<N>(layout.shape(), layout.stride());
| ^
/home/ethan/cutlass/include/cute/tensor.hpp:658:21: note: in instantiation of function template specialization 'cute::recast_layout<int, const cutlass::uint128_t, cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>' requested here
658 | auto new_layout = recast_layout<OldType,NewType>(old_layout);
| ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:210:20: note: in instantiation of function template specialization 'cute::recast<const cutlass::uint128_t, const cute::Tensor<cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>> &>' requested here
210 | Tensor src_v = recast<SrcVecType>(src);
| ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:283:12: note: in instantiation of function template specialization 'cute::copy_vec<cutlass::uint128_t, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
283 | return copy_vec<uint_bit_t<vec_bits>>(src, dst);
| ^
/home/ethan/cutlass/include/cute/algorithm/copy.hpp:297:10: note: in instantiation of function template specialization 'cute::copy<8, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
297 | return copy(AutoVectorizingCopy{}, src, dst);
| ^
test.cpp:24:5: note: in instantiation of function template specialization 'cute::copy<cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>, cute::ViewEngine<int *>, cute::Layout<cute::tuple<cute::C<2>, cute::C<3>>, cute::tuple<cute::C<1>, cute::C<2>>>>' requested here
24 | copy(src, dst);
| ^
Expected behavior
copy(src, dst) should just work. Internally, it should be able to compute a correct predicate and use that to do the copy.
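Until that lands, a minimal workaround sketch is below. It reuses the layout from the reproduction above and simply sidesteps the 128-bit auto-vectorizer by copying element-wise; newer trees may also accept a copy policy with a smaller assumed alignment (e.g. AutoVectorizingCopyWithAssumedAlignment<32>{}), but that name should be checked against your CUTLASS version.

#include <vector>
#include <cute/tensor.hpp>

using namespace cute;

int main() {
  // Same (_2,_3):(_1,_2) layout as the reproduction above.
  auto mem_layout = make_layout(make_shape(Int<2>{}, Int<3>{}));
  std::vector<int> src_buffer(size(mem_layout));
  std::vector<int> dst_buffer(size(mem_layout));
  auto src = make_tensor(src_buffer.data(), mem_layout);
  auto dst = make_tensor(dst_buffer.data(), mem_layout);
  for (int t = 0; t < size(mem_layout); t++) {
    src[t] = t;
  }

  // Element-wise copy instead of cute::copy(src, dst), which would
  // auto-vectorize at 128 bits and hit the shape_div static_assert.
  for (int t = 0; t < size(mem_layout); t++) {
    dst[t] = src[t];
  }

  print_tensor(dst);
  return 0;
}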
Environment details (please complete the following information):
- Environment location: [Bare-metal]
We have found similar issues internally with "domain alignment" of copies. This is being worked on right now.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Still needs to be addressed.
This will be fixed in 3.5.1. The ETA is hard to predict, but hopefully within the next couple of weeks.
@YichengDWu please verify and close.
I can't verify it now since I no longer have an NVIDIA card. Thank you for fixing this!