MatX [BUG] A simple `matx::sum` test case failed to compile since this commit

Describe the bug

This simple `matx::sum` test case fails to compile on the latest commit.

#include <matx.h>

int main() {
    auto t1 = matx::make_tensor<float, 2>({32, 10});
    auto t2 = matx::make_tensor<float, 1>({32});

    matx::sum(t2, t1, 0);

    cudaDeviceSynchronize();

    return 0;
}

After bisecting the commits, I found that the failure starts with commit 77a0d4c6b04fcc8027563127500d4735e99c3cb7.

To Reproduce

Steps to reproduce the behavior:

Try to compile the test code above with MatX at or after commit 77a0d4c6b04fcc8027563127500d4735e99c3cb7.
Output from compiler:

Consolidate compiler generated dependencies of target test
[ 50%] Building CUDA object CMakeFiles/test.dir/main.cu.o
/root/gs/MatX-Install-Home/include/matx_cub.h(724): error: no instance of function template "cub::DeviceSegmentedReduce::Sum" matches the argument list
            argument types are: (float *, size_t, float *, float *, int, matx::detail::BeginOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, matx::detail::EndOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, const cudaStream_t)
          detected during:
            instantiation of "void matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::ExecSum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]" 
(259): here
            instantiation of "matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::matxCubPlan_t(OutputTensor &, const InputOperator &, const CParams &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]" 
(1249): here
            instantiation of "void matx::cub_sum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]" 
/root/gs/MatX-Install-Home/include/matx_reduce.h(1225): here
            instantiation of "void matx::sum(TensorType &, const InType &, cudaStream_t) [with TensorType=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InType=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]" 
/root/gs/MatX-Test/main.cu(11): here

/root/gs/MatX-Install-Home/include/matx_cub.h(733): error: no instance of function template "cub::DeviceSegmentedReduce::Sum" matches the argument list
            argument types are: (float *, size_t, matx::RandomOperatorIterator<matx::detail::tensor_impl_t<float, 2, matx::DefaultDescriptor<2>>>, float *, int, matx::detail::BeginOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, matx::detail::EndOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, const cudaStream_t)
          detected during:
            instantiation of "void matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::ExecSum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]" 
(259): here
            instantiation of "matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::matxCubPlan_t(OutputTensor &, const InputOperator &, const CParams &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]" 
(1249): here
            instantiation of "void matx::cub_sum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]" 
/root/gs/MatX-Install-Home/include/matx_reduce.h(1225): here
            instantiation of "void matx::sum(TensorType &, const InType &, cudaStream_t) [with TensorType=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InType=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]" 
/root/gs/MatX-Test/main.cu(11): here

2 errors detected in the compilation of "/root/gs/MatX-Test/main.cu".
make[3]: *** [CMakeFiles/test.dir/build.make:76: CMakeFiles/test.dir/main.cu.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:83: CMakeFiles/test.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:90: CMakeFiles/test.dir/rule] Error 2
make: *** [Makefile:124: test] Error 2

Expected behavior

This test case should compile.

System details (please complete the following information):

OS: Ubuntu 20.04
CUDA version: CUDA 11.4
g++ version: 9.3.0

Aug 11 '22 03:08 AtomicVar

Hi @ZJUGuoShuai, have you tried with the latest commit? I tried compiling your code with flags similar to what we use to compile our tests and I do not get a compiler error:

I have no name!@7e2836ef9f2b:/repro/MatX/build/test$ cd /repro/MatX/build/test && /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DINDEX_64_BIT -DMATX_ENABLE_CUTENSOR=0 -DMATX_ENABLE_CUTLASS=0 -DMATX_ENABLE_FILEIO=1 -DMATX_ENABLE_PYBIND11=1 -DMATX_ENABLE_VIZ=0 -Dmatx_test_EXPORTS -I/repro/MatX/include -I/repro/MatX/include/kernels -I/repro/MatX/test/include -I/repro/MatX/examples -isystem=/repro/MatX/build/_deps/pybind11-src/include -isystem=/usr/local/bin/miniconda/include/python3.8 -isystem=/usr/local/cuda/include -isystem=/repro/MatX/build/_deps/gtest-src/googletest/include -isystem=/repro/MatX/build/_deps/gtest-src/googletest -g --generate-code=arch=compute_80,code=[sm_80] -Wall -Wextra -Werror all-warnings -Wcast-align -Wunused -Wconversion -Wno-unknown-pragmas -Wnon-virtual-dtor -Wshadow -Wmisleading-indentation -Wduplicated-cond -Wduplicated-branches -Wlogical-op -Wnull-dereference --threads 0 -g -lineinfo --expt-relaxed-constexpr -isystem=/usr/local/cuda/include -DMATX_ROOT=\"/repro/MatX\" -fvisibility=hidden -std=c++17 -MD -MT test/CMakeFiles/matx_test.dir/main.cu.o -MF CMakeFiles/matx_test.dir/test.cu.o.d -x cu -c ../test.cu
I have no name!@7e2836ef9f2b:/repro/MatX/build/test$

Can you paste how you were compiling this file?

Aug 12 '22 15:08 cliffburdick

It seems to be caused by a feature introduced in CUB 1.13; commit 77a0d4c6b04fcc8027563127500d4735e99c3cb7 started relying on that feature.

CUB 1.13 ships with CUDA 11.5, but I'm using CUDA 11.4, which bundles CUB 1.12. I checked the header cub/device/device_segmented_reduce.cuh in CUB 1.12: it requires begin_offsets and end_offsets to have the same type. That's why the compiler complains about "no instance of function template" when these arguments have different types.
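
To illustrate the deduction failure described above, here is a minimal host-only C++17 sketch. It does not use CUB at all: `segmented_sum_single`, `segmented_sum_split`, `BeginOffset`, and `EndOffset` are hypothetical, simplified stand-ins for the two API shapes and for the two MatX offset types named in the error message. The real `cub::DeviceSegmentedReduce::Sum` takes more parameters.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the two distinct offset iterator types.
struct BeginOffset {};
struct EndOffset {};

// Simplified "CUB 1.12-style" shape: ONE template parameter is shared by
// both offset arguments, so both must deduce to the same type.
template <typename OffsetIteratorT>
int segmented_sum_single(OffsetIteratorT /*begin*/, OffsetIteratorT /*end*/) {
    return 1;
}

// Simplified "CUB 1.13-style" shape: each offset argument gets its own
// template parameter, so the two types may differ.
template <typename BeginOffsetIteratorT, typename EndOffsetIteratorT>
int segmented_sum_split(BeginOffsetIteratorT /*begin*/, EndOffsetIteratorT /*end*/) {
    return 2;
}

// Detection idiom: true only if the call expression is well-formed.
template <typename B, typename E, typename = void>
struct single_callable : std::false_type {};
template <typename B, typename E>
struct single_callable<B, E,
    std::void_t<decltype(segmented_sum_single(std::declval<B>(), std::declval<E>()))>>
    : std::true_type {};

template <typename B, typename E, typename = void>
struct split_callable : std::false_type {};
template <typename B, typename E>
struct split_callable<B, E,
    std::void_t<decltype(segmented_sum_split(std::declval<B>(), std::declval<E>()))>>
    : std::true_type {};

// Same type for both offsets: the single-parameter shape is fine.
constexpr bool single_accepts_same = single_callable<int*, int*>::value;
// Different types: deduction of OffsetIteratorT conflicts, so no overload matches.
constexpr bool single_accepts_mixed = single_callable<BeginOffset, EndOffset>::value;
// Different types with separate parameters: the call is well-formed.
constexpr bool split_accepts_mixed = split_callable<BeginOffset, EndOffset>::value;

static_assert(single_accepts_same, "same-typed offsets should be accepted");
static_assert(!single_accepts_mixed, "mixed offset types should be rejected");
static_assert(split_accepts_mixed, "split parameters should accept mixed types");
```

Calling `segmented_sum_single(BeginOffset{}, EndOffset{})` directly would produce a "no matching function" style error, which is the same kind of failure the nvcc output above reports for the real CUB call.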

Aug 13 '22 17:08 AtomicVar

Thanks @ZJUGuoShuai, we'll get back to you on what we can do. One option is to pull in the newer CUB automatically, since we already do that for another reason.

Aug 13 '22 17:08 cliffburdick

@ZJUGuoShuai would it be difficult for you to update to 11.5? 11.4 has a known compiler bug that forces us to maintain a whole bunch of redundant code. If we raised the minimum compatible version, we could clean up the code and this issue would go away.

Aug 13 '22 17:08 cliffburdick

@cliffburdick I'm willing to upgrade my CUDA version 😃. I'll try that later.

Aug 13 '22 17:08 AtomicVar

@cliffburdick I upgraded CUDA to 11.7.1 and everything works great! I'm closing this issue. It may be worth noting in the README that CUB must be >= 1.13 (or CUDA >= 11.5).
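
Besides a README note, the requirement could also be enforced at compile time. The sketch below is one possible way to do that and is not something MatX is known to contain. It assumes that `cub/version.cuh` defines `CUB_VERSION` encoded as MAJOR * 100000 + MINOR * 100 + SUBMINOR; the helper names (`kMinCubVersion`, `cub_major`, `cub_minor`, `cub_new_enough`) are made up for illustration.

```cpp
#include <cassert>

// Assumed encoding of CUB_VERSION: MAJOR * 100000 + MINOR * 100 + SUBMINOR,
// so CUB 1.13.0 would be 101300. (Hypothetical helper names below.)
constexpr int kMinCubVersion = 101300;

constexpr int cub_major(int version) { return version / 100000; }
constexpr int cub_minor(int version) { return (version / 100) % 1000; }
constexpr bool cub_new_enough(int version) { return version >= kMinCubVersion; }

// Only perform the real check when compiling with nvcc, where the CUB
// headers bundled with the CUDA toolkit are on the include path.
#if defined(__CUDACC__)
#include <cub/version.cuh>
static_assert(cub_new_enough(CUB_VERSION),
              "CUB >= 1.13 (CUDA >= 11.5) is required: "
              "DeviceSegmentedReduce needs separate begin/end offset types");
#endif
```

With a guard like this, building against the CUB 1.12 bundled with CUDA 11.4 would stop with a single readable message instead of the long template instantiation trace shown in the original report.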

Aug 14 '22 13:08 AtomicVar
