CuLE will not compile on GCC 11.1.0, CUDA 11.3

I've been trying to build this package for the last two nights to no avail. Every time I run python setup.py install, I get a big wall of compiler warnings indicating that it's not allowed to call __host__ functions from __host__ __device__ functions, followed by a few errors:

/home/nora/Code/cule/third_party/agency/agency/cuda/execution/execution_policy/grid_execution_policy.hpp:35:100:   required from here
/home/nora/Code/cule/third_party/agency/agency/detail/operator_traits.hpp:92:88: error: no match for ‘operator*’ (operand types are ‘agency::point<unsigned int, 2>’ and ‘unsigned int’)
   92 | struct has_operator_multiplies
      |                                                                                        ^                      
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:278:1: note: candidate: ‘template<class ArithmeticTuple, class> Derived agency::detail::arithmetic_tuple_facade<Derived>::operator*(const ArithmeticTuple&) const [with ArithmeticTuple = ArithmeticTuple; <template-parameter-2-2> = <template-parameter-1-2>; Derived = agency::point<unsigned int, 2>]’
  278 |   Derived operator*(const ArithmeticTuple& rhs) const
      | ^ ~~~~~~
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:278:1: note:   template argument deduction/substitution failed:
/home/nora/Code/cule/third_party/agency/agency/detail/tuple/arithmetic_tuple_facade.hpp:158:63: error: incomplete type ‘std::tuple_size<unsigned int>’ used in nested name specifier
  158 |              class = typename std::enable_if<
      |                                                               ^                                           
/home/nora/Code/cule/third_party/agency/agency/coordinate/point.hpp:197:1: note: candidate: ‘template<class T1, class T2, long unsigned int Rank> typename std::enable_if<(std::is_arithmetic<_Tp>::value && agency::detail::has_operator_multiplies<T1, T2>::value), agency::point<T, Rank> >::type agency::operator*(T1, const agency::point<T, Rank>&)’
  197 |   operator*(T1 val, const point<T2,Rank>& p)
      | ^ ~~~~~~
/home/nora/Code/cule/third_party/agency/agency/coordinate/point.hpp:197:1: note:   template argument deduction/substitution failed:
/home/nora/Code/cule/third_party/agency/agency/detail/operator_traits.hpp:92:88: note:   mismatched types ‘const agency::point<T, Rank>’ and ‘unsigned int’
   92 | struct has_operator_multiplies
      |                                                                                        ^

GCC indicates that this invalid template instantiation is required from torchcule/backend.cu:44:21, although the chain of dependencies linking that line of code to the final error is way too long and complex for me to understand. I've attached the entire stderr & stdout output from the compiler to this post. Any help toward solving this issue would be greatly appreciated.

Attachment: compile-errors.txt

Feb 02 '22 04:02 norabelrose

@sdalton1, @ifrosio do you have a solution to this issue? And, more generally, is there a way to run CuLE with the latest PyTorch?

Mar 24 '22 18:03 ViktorM

Hi there, I got it to work on my laptop (GTX 1650 Ti, CUDA 11.3, PyTorch 1.11.0) by changing the following lines in setup.py:

codes = [arch[-2:] for arch in gpus]
arch_gencode = ['-arch=sm_' + codes[0]] + ['-gencode=arch=compute_{0},code=sm_{0}'.format(code) for code in codes]

You might also want to run it with python setup.py install --fastbuild to reduce the build time.
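To see what those two lines produce, here is a minimal sketch that runs them in isolation. The wrapper name `build_arch_flags` and the input `['sm_70']` are assumptions for illustration only; the thread does not show what `gpus` actually contains in setup.py, so this assumes a list of strings ending in a two-digit compute capability.

```python
def build_arch_flags(gpus):
    """Hypothetical wrapper around the two setup.py lines quoted above.

    Assumes each entry in `gpus` ends with a two-digit compute
    capability, e.g. 'sm_70' (an assumption, not taken from setup.py).
    """
    # Keep only the last two characters of each entry, e.g. 'sm_70' -> '70'.
    codes = [arch[-2:] for arch in gpus]
    # One -arch flag for the first GPU, plus one -gencode flag per GPU.
    arch_gencode = ['-arch=sm_' + codes[0]] + [
        '-gencode=arch=compute_{0},code=sm_{0}'.format(code) for code in codes
    ]
    return arch_gencode


print(' '.join(build_arch_flags(['sm_70'])))
# prints: -arch=sm_70 -gencode=arch=compute_70,code=sm_70
```

The printed flags have the same shape as the `-arch=sm_70 -gencode=arch=compute_70,code=sm_70` portion of the nvcc command posted further down in this thread. Note that the `arch[-2:]` slice only works for two-digit compute capabilities.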

Jun 27 '22 05:06 Rohan138

@ifrosio, @sdalton1 any updates on this issue, or on how to build and run CuLE on Ampere GPUs?

Aug 13 '22 02:08 ViktorM

I got more errors:

/usr/local/cuda/bin/nvcc -I/home/denys/Documents/git/ml/cule -I/home/denys/Documents/git/ml/cule/third_party/agency -I/home/denys/Documents/git/ml/cule/third_party/pybind11/include -I/usr/local/cuda/include -I/home/denys/anaconda3/envs/rlgpu/include/python3.7m -c torchcule/backend.cu -o build/temp.linux-x86_64-cpython-37/torchcule/backend.o -arch=sm_70 -gencode=arch=compute_70,code=sm_70 -O3 -Xptxas=-v -Xcompiler=-Wall,-Wextra,-fPIC -allow-unsupported-compiler -ccbin=gcc
/usr/include/stdio.h(189): error: attribute "__malloc__" does not take arguments

/usr/include/stdio.h(201): error: attribute "__malloc__" does not take arguments

/usr/include/stdio.h(223): error: attribute "__malloc__" does not take arguments

Aug 13 '22 03:08 Denys88

I can't reproduce this error on my machine. I am compiling on Ubuntu 20.04.4 with torch 1.12.0, gcc 9.4.0, and the cule main branch. I also tried recompiling with an older version of the CUDA toolkit, 11.3, from the Dockerfile, and that worked on my machine as well. If anyone has a Dockerfile that reproduces the failure with the affected software versions, that would help a lot.

Aug 13 '22 09:08 sdalton1

Thanks @sdalton1. I just installed the latest Ubuntu (22.04, I think). It looks like the problem is related to the wrong gcc version. I will try to solve it using this link: https://linuxconfig.org/how-to-switch-between-multiple-gcc-and-g-compiler-versions-on-ubuntu-20-04-lts-focal-fossa
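If several gcc versions end up installed side by side, a small check before building can help pick one. The following is only a sketch: the function names, the candidate binary names (`gcc-9`, `gcc-10`, `gcc-11`), and the `max_major=9` threshold are assumptions. The threshold is taken solely from the reports in this thread (gcc 9.4.0 built successfully, gcc 11.1.0 did not) and is not an official compatibility limit.

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Return the major version from text such as '9.4.0' or '11'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError("unrecognized version: %r" % version_text)
    return int(match.group(1))


def pick_host_compiler(versions, max_major):
    """Pick the newest compiler whose major version is <= max_major.

    `versions` maps a binary name to its version string.
    Returns the chosen name, or None if nothing qualifies.
    """
    usable = [(gcc_major(v), name) for name, v in versions.items()
              if gcc_major(v) <= max_major]
    return max(usable)[1] if usable else None


def installed_versions(names=("gcc", "gcc-9", "gcc-10", "gcc-11")):
    """Query `-dumpversion` for each candidate binary found on PATH.

    The candidate names are assumptions; adjust them for your system.
    """
    found = {}
    for name in names:
        if shutil.which(name):
            out = subprocess.run([name, "-dumpversion"],
                                 capture_output=True, text=True, check=True)
            found[name] = out.stdout.strip()
    return found


# Example with the two versions mentioned in this thread.
print(pick_host_compiler({"gcc": "11.1.0", "gcc-9": "9.4.0"}, max_major=9))
# prints: gcc-9
```

On a real machine, `pick_host_compiler(installed_versions(), max_major=9)` would return the name of a suitable installed compiler, or `None` if none is found. The nvcc command posted above passes `-ccbin=gcc`, so the chosen binary would need to be what that flag resolves to, for example via the update-alternatives approach in the linked article.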

Aug 13 '22 16:08 Denys88