Will not compile on GCC 11.1.0, CUDA 11.3
I've been trying to build this package for the last two nights to no avail. Every time I run python setup.py install, I get a big wall of compiler warnings indicating that it's not allowed to call __host__ functions from __host__ __device__ functions, followed by a few errors:
/home/nora/Code/cule/third_party/agency/agency/cuda/execution/execution_policy/grid_execution_policy.hpp:35:100: required from here
/home/nora/Code/cule/third_party/agency/agency/detail/operator_traits.hpp:92:88: error: no match for ‘operator*’ (operand types are ‘agency::point<unsigned int, 2>’ and ‘unsigned int’)
92 | struct has_operator_multiplies
| ^
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:278:1: note: candidate: ‘template<class ArithmeticTuple, class> Derived agency::detail::arithmetic_tuple_facade<Derived>::operator*(const ArithmeticTuple&) const [with ArithmeticTuple = ArithmeticTuple; <template-parameter-2-2> = <template-parameter-1-2>; Derived = agency::point<unsigned int, 2>]’
278 | Derived operator*(const ArithmeticTuple& rhs) const
| ^ ~~~~~~
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:278:1: note: template argument deduction/substitution failed:
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:158:63: error: incomplete type ‘std::tuple_size<unsigned int>’ used in nested name specifier
158 | class = typename std::enable_if<
| ^
/home/nora/Code/cule/third_party/agency/agency/coordinate/point.hpp:197:1: note: candidate: ‘template<class T1, class T2, long unsigned int Rank> typename std::enable_if<(std::is_arithmetic<_Tp>::value && agency::detail::has_operator_multiplies<T1, T2>::value), agency::point<T, Rank> >::type agency::operator*(T1, const agency::point<T, Rank>&)’
197 | operator*(T1 val, const point<T2,Rank>& p)
| ^ ~~~~~~
/home/nora/Code/cule/third_party/agency/agency/coordinate/point.hpp:197:1: note: template argument deduction/substitution failed:
/home/nora/Code/cule/third_party/agency/agency/detail/operator_traits.hpp:92:88: note: mismatched types ‘const agency::point<T, Rank>’ and ‘unsigned int’
92 | struct has_operator_multiplies
| ^
GCC indicates that this invalid template instantiation is required from torchcule/backend.cu:44:21, although the chain of dependencies linking that line of code to the final error is way too long and complex for me to understand. I've attached the entire stderr & stdout output from the compiler to this post. Any help toward solving this issue would be greatly appreciated.
compile-errors.txt
@sdalton1, @ifrosio do you have any solution to this issue? And, more generally, how can CuLE be run with the latest PyTorch?
Hi there, I got it to work on my laptop (GTX 1650 Ti, CUDA 11.3, PyTorch 1.11.0) by changing the following lines in setup.py:
codes = [arch[-2:] for arch in gpus]
arch_gencode = ['-arch=sm_' + codes[0]] + ['-gencode=arch=compute_{0},code=sm_{0}'.format(code) for code in codes]
You might also want to run it with python setup.py install --fastbuild to reduce the build time.
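To illustrate what the two modified setup.py lines produce, here is a minimal standalone sketch. It assumes that gpus is a list of strings ending in a two-digit compute capability (for example "sm_75"). The real setup.py builds that list itself and may format it differently, and the helper name build_arch_flags is made up for this example.

```python
def build_arch_flags(gpus):
    """Build nvcc architecture flags from entries such as 'sm_75'.

    Assumption: every entry ends in a two-digit compute capability.
    """
    # Keep only the last two characters, e.g. 'sm_75' -> '75'.
    codes = [arch[-2:] for arch in gpus]
    # One default -arch flag for the first GPU, then one -gencode flag per GPU.
    arch_gencode = ['-arch=sm_' + codes[0]] + [
        '-gencode=arch=compute_{0},code=sm_{0}'.format(code) for code in codes
    ]
    return arch_gencode


# Hypothetical input; the real list comes from GPU detection in setup.py.
print(build_arch_flags(["sm_75", "sm_86"]))
```

With a single "sm_70" entry this gives -arch=sm_70 -gencode=arch=compute_70,code=sm_70, so it is easy to compare against the flags that show up on the nvcc command line during the build.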
@ifrosio, @sdalton1 any updates on this issue, or on how to build and run CuLE on Ampere GPUs?
I got more errors:
/usr/local/cuda/bin/nvcc -I/home/denys/Documents/git/ml/cule -I/home/denys/Documents/git/ml/cule/third_party/agency -I/home/denys/Documents/git/ml/cule/third_party/pybind11/include -I/usr/local/cuda/include -I/home/denys/anaconda3/envs/rlgpu/include/python3.7m -c torchcule/backend.cu -o build/temp.linux-x86_64-cpython-37/torchcule/backend.o -arch=sm_70 -gencode=arch=compute_70,code=sm_70 -O3 -Xptxas=-v -Xcompiler=-Wall,-Wextra,-fPIC -allow-unsupported-compiler -ccbin=gcc
/usr/include/stdio.h(189): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(201): error: attribute "__malloc__" does not take arguments
/usr/include/stdio.h(223): error: attribute "__malloc__" does not take arguments
I can't reproduce this error on my machine. I am compiling on Ubuntu 20.04.4, torch 1.12.0, gcc 9.4.0 and the cule main branch. I tried recompiling using an older version of the CUDA toolkit, version 11.3, from the Dockerfile, and that also worked on my machine. If anyone has a Dockerfile that reproduces the failure with the configured software, that would help a lot.
Thanks @sdalton1. I just installed the latest Ubuntu (22.04, I think). It looks like the problem is related to the wrong gcc version. I will try to solve it using this link: https://linuxconfig.org/how-to-switch-between-multiple-gcc-and-g-compiler-versions-on-ubuntu-20-04-lts-focal-fossa
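Since the reports in this thread point at the host compiler version (a failure with GCC 11.1.0, a working build with gcc 9.4.0), a quick check before running setup.py might save a long failed build. The following is only a sketch: the function names are invented for this example, and the maximum supported major version is left as a parameter because it depends on the CUDA toolkit release and should be looked up in the CUDA installation guide rather than taken from here.

```python
import subprocess


def gcc_major(version_text):
    """Return the major version from text such as '11.1.0' or '9'."""
    return int(version_text.strip().split(".")[0])


def host_compiler_ok(version_text, max_supported_major):
    """True if the gcc major version does not exceed the given limit.

    max_supported_major is an assumption supplied by the caller.
    """
    return gcc_major(version_text) <= max_supported_major


def installed_gcc_version(compiler="gcc"):
    """Ask the compiler for its version via -dumpversion (needs gcc on PATH)."""
    result = subprocess.run(
        [compiler, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, host_compiler_ok(installed_gcc_version(), 10) would flag GCC 11 as too new if the toolkit in use only supports up to GCC 10, which is an assumed limit here. If the check fails, switching the default gcc as described in the link above, or pointing nvcc at an older compiler with -ccbin, are the options to try.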