MatX
[BUG] A simple `matx::sum` test case failed to compile since this commit
Describe the bug
This simple `matx::sum` test case fails to compile on the latest commit.
#include <matx.h>

int main() {
  auto t1 = matx::make_tensor<float, 2>({32, 10});
  auto t2 = matx::make_tensor<float, 1>({32});
  matx::sum(t2, t1, 0);
  cudaDeviceSynchronize();
  return 0;
}
After bisecting the commits, I found that the failure starts with commit 77a0d4c6b04fcc8027563127500d4735e99c3cb7.
To Reproduce
Steps to reproduce the behavior:
- Try to compile the test code above with MatX after commit 77a0d4c6b04fcc8027563127500d4735e99c3cb7.
- Output from compiler:
Consolidate compiler generated dependencies of target test
[ 50%] Building CUDA object CMakeFiles/test.dir/main.cu.o
/root/gs/MatX-Install-Home/include/matx_cub.h(724): error: no instance of function template "cub::DeviceSegmentedReduce::Sum" matches the argument list
argument types are: (float *, size_t, float *, float *, int, matx::detail::BeginOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, matx::detail::EndOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, const cudaStream_t)
detected during:
instantiation of "void matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::ExecSum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]"
(259): here
instantiation of "matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::matxCubPlan_t(OutputTensor &, const InputOperator &, const CParams &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]"
(1249): here
instantiation of "void matx::cub_sum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]"
/root/gs/MatX-Install-Home/include/matx_reduce.h(1225): here
instantiation of "void matx::sum(TensorType &, const InType &, cudaStream_t) [with TensorType=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InType=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]"
/root/gs/MatX-Test/main.cu(11): here
/root/gs/MatX-Install-Home/include/matx_cub.h(733): error: no instance of function template "cub::DeviceSegmentedReduce::Sum" matches the argument list
argument types are: (float *, size_t, matx::RandomOperatorIterator<matx::detail::tensor_impl_t<float, 2, matx::DefaultDescriptor<2>>>, float *, int, matx::detail::BeginOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, matx::detail::EndOffset<matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>>, const cudaStream_t)
detected during:
instantiation of "void matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::ExecSum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]"
(259): here
instantiation of "matx::detail::matxCubPlan_t<OutputTensor, InputOperator, op, CParams>::matxCubPlan_t(OutputTensor &, const InputOperator &, const CParams &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>, op=(matx::detail::CUBOperation_t)4, CParams=matx::detail::EmptyParams_t]"
(1249): here
instantiation of "void matx::cub_sum(OutputTensor &, const InputOperator &, cudaStream_t) [with OutputTensor=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InputOperator=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]"
/root/gs/MatX-Install-Home/include/matx_reduce.h(1225): here
instantiation of "void matx::sum(TensorType &, const InType &, cudaStream_t) [with TensorType=matx::tensor_t<float, 1, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<1>>, InType=matx::tensor_t<float, 2, matx::basic_storage<matx::raw_pointer_buffer<float, matx::owning, matx::matx_allocator<float>>>, matx::DefaultDescriptor<2>>]"
/root/gs/MatX-Test/main.cu(11): here
2 errors detected in the compilation of "/root/gs/MatX-Test/main.cu".
make[3]: *** [CMakeFiles/test.dir/build.make:76: CMakeFiles/test.dir/main.cu.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:83: CMakeFiles/test.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:90: CMakeFiles/test.dir/rule] Error 2
make: *** [Makefile:124: test] Error 2
Expected behavior
This test case should compile.
System details (please complete the following information):
- OS: Ubuntu 20.04
- CUDA version: CUDA 11.4
- g++ version: 9.3.0
Hi @ZJUGuoShuai, have you tried with the latest commit? I tried compiling your code with flags similar to what we use to compile our tests and I do not get a compiler error:
I have no name!@7e2836ef9f2b:/repro/MatX/build/test$ cd /repro/MatX/build/test && /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DINDEX_64_BIT -DMATX_ENABLE_CUTENSOR=0 -DMATX_ENABLE_CUTLASS=0 -DMATX_ENABLE_FILEIO=1 -DMATX_ENABLE_PYBIND11=1 -DMATX_ENABLE_VIZ=0 -Dmatx_test_EXPORTS -I/repro/MatX/include -I/repro/MatX/include/kernels -I/repro/MatX/test/include -I/repro/MatX/examples -isystem=/repro/MatX/build/_deps/pybind11-src/include -isystem=/usr/local/bin/miniconda/include/python3.8 -isystem=/usr/local/cuda/include -isystem=/repro/MatX/build/_deps/gtest-src/googletest/include -isystem=/repro/MatX/build/_deps/gtest-src/googletest -g --generate-code=arch=compute_80,code=[sm_80] -Wall -Wextra -Werror all-warnings -Wcast-align -Wunused -Wconversion -Wno-unknown-pragmas -Wnon-virtual-dtor -Wshadow -Wmisleading-indentation -Wduplicated-cond -Wduplicated-branches -Wlogical-op -Wnull-dereference --threads 0 -g -lineinfo --expt-relaxed-constexpr -isystem=/usr/local/cuda/include -DMATX_ROOT=\"/repro/MatX\" -fvisibility=hidden -std=c++17 -MD -MT test/CMakeFiles/matx_test.dir/main.cu.o -MF CMakeFiles/matx_test.dir/test.cu.o.d -x cu -c ../test.cu
I have no name!@7e2836ef9f2b:/repro/MatX/build/test$
Can you paste how you were compiling this file?
Seems like it's caused by a feature introduced in CUB 1.13 and the commit 77a0d4c6b04fcc8027563127500d4735e99c3cb7 just took advantage of this feature.
CUB 1.13 ships with CUDA 11.5, but I'm using CUDA 11.4, which bundles CUB 1.12. I checked the header `cub/device/device_segmented_reduce.cuh` in CUB 1.12: it only supports a single shared type for `begin_offsets` and `end_offsets`. That's why the compiler complains about "no instance of function template" when these arguments have different types.
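To illustrate the deduction failure described above, here is a minimal sketch in plain C++ (no CUDA or CUB required). `OldStyleSum`, `NewStyleSum`, `accepts_offsets`, and the simplified `BeginOffset`/`EndOffset` structs are hypothetical stand-ins invented for this example; they are not CUB's or MatX's real signatures. The point is only that a template with one shared offset parameter cannot be deduced from two different offset types, while a template with two parameters can.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for two *different* offset iterator types.
// Segment `seg` of a row-major matrix spans [seg * row_len, (seg + 1) * row_len).
struct BeginOffset {
  std::size_t row_len;
  constexpr std::size_t operator[](std::size_t seg) const { return seg * row_len; }
};
struct EndOffset {
  std::size_t row_len;
  constexpr std::size_t operator[](std::size_t seg) const { return (seg + 1) * row_len; }
};

// Shared helper: sum the elements of one segment.
template <typename BeginIt, typename EndIt>
constexpr float sum_segment(const float* in, std::size_t seg, BeginIt begin, EndIt end) {
  float acc = 0.0f;
  for (std::size_t i = begin[seg]; i < end[seg]; ++i) acc += in[i];
  return acc;
}

// Assumption: models an interface where both offsets share ONE template parameter.
struct OldStyleSum {
  template <typename OffsetIt>
  constexpr float operator()(const float* in, std::size_t seg,
                             OffsetIt begin, OffsetIt end) const {
    return sum_segment(in, seg, begin, end);
  }
};

// Assumption: models an interface where each offset has its OWN template parameter.
struct NewStyleSum {
  template <typename BeginIt, typename EndIt>
  constexpr float operator()(const float* in, std::size_t seg,
                             BeginIt begin, EndIt end) const {
    return sum_segment(in, seg, begin, end);
  }
};

// Detects at compile time whether F can be called with offset types B and E.
template <typename F, typename B, typename E, typename = void>
struct accepts_offsets : std::false_type {};
template <typename F, typename B, typename E>
struct accepts_offsets<F, B, E,
    decltype(void(std::declval<F>()(std::declval<const float*>(), std::size_t{},
                                    std::declval<B>(), std::declval<E>())))>
    : std::true_type {};

constexpr float kData[6] = {1, 2, 3, 4, 5, 6};  // 2 rows x 3 columns

// One shared parameter: mixed offset types cannot be deduced.
static_assert(!accepts_offsets<OldStyleSum, BeginOffset, EndOffset>::value,
              "single-parameter form rejects mixed offset types");
// One shared parameter: identical offset types are fine.
static_assert(accepts_offsets<OldStyleSum, const std::size_t*, const std::size_t*>::value,
              "single-parameter form accepts matching offset types");
// Two parameters: mixed offset types are accepted.
static_assert(accepts_offsets<NewStyleSum, BeginOffset, EndOffset>::value,
              "two-parameter form accepts mixed offset types");
// Second row: 4 + 5 + 6.
static_assert(NewStyleSum{}(kData, 1, BeginOffset{3}, EndOffset{3}) == 15.0f,
              "segment 1 sums to 15");
```

This compiles as a standalone translation unit with C++14 or later. Calling `OldStyleSum{}` directly with a `BeginOffset` and an `EndOffset` would fail with a "no matching function" style error, which is the same category of failure as the nvcc output in this report.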
Thanks @ZJUGuoShuai, we'll get back to you on what we can do. One option is to pull in the new CUB automatically, since we already do that for another reason.
@ZJUGuoShuai would it be difficult for you to update to 11.5? 11.4 has a known compiler bug that forces us to maintain a whole bunch of redundant code. If we raised the minimum compatible version, we could clean up the code and this issue would go away.
@cliffburdick I'm willing to upgrade my CUDA version 😃. I'll try that later.
@cliffburdick I upgraded my CUDA to 11.7.1, everything works great! I'm closing this issue. Maybe it's necessary to add in README that CUB should be >= 1.13 (or CUDA >= 11.5).
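Besides a README note, one way to surface the requirement earlier would be a compile-time guard. The sketch below is only an illustration, not something MatX is confirmed to ship: `encode_cub_version` and `kMinCubVersion` are hypothetical names, and the check relies on the `CUB_VERSION` macro from `cub/version.cuh`, which encodes the version as major * 100000 + minor * 100 + patch. The header is included only when it is available, so the snippet also compiles on a machine without CUB.

```cpp
// Include CUB's version header only if the toolchain can find it.
#if defined(__has_include)
#  if __has_include(<cub/version.cuh>)
#    include <cub/version.cuh>
#  endif
#endif

// Hypothetical helper: encode a version the same way CUB_VERSION does
// (major * 100000 + minor * 100 + patch).
constexpr int encode_cub_version(int major, int minor, int patch) {
  return major * 100000 + minor * 100 + patch;
}

// Assumption taken from this thread: CUB 1.13 is the first version that
// accepts different types for begin_offsets and end_offsets.
constexpr int kMinCubVersion = encode_cub_version(1, 13, 0);

#ifdef CUB_VERSION
// Fail early with a readable message instead of a long template error.
static_assert(CUB_VERSION >= kMinCubVersion,
              "CUB >= 1.13 (CUDA >= 11.5) is required");
#endif
```

With CUB 1.12 on the include path this would stop the build with a single-line message rather than the template instantiation trace shown above.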