pybind11
[BUG]: Build Error with CUDA 11.4-11.7 & Operators
Required prerequisites
- [X] Make sure you've read the documentation. Your issue may be addressed there.
- [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- [X] Consider asking first in the Gitter chat room or in a Discussion.
Problem description
I am compiling pybind11 v2.10.0-38-g424ac4fe on Perlmutter at NERSC.
I use the following software modules:
module load cmake/3.22.0
module load PrgEnv-gnu
module load cudatoolkit/11.7
module load cray-python/3.9.7.1
# compiler environment hints
export CRAY_ACCEL_TARGET=nvidia80
export CC=cc #$(which gcc)
export CXX=CC #$(which g++)
export FC=ftn # $(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=${CXX}
export CFLAGS="${CFLAGS} -O3 -ffast-math"
export CXXFLAGS="${CXXFLAGS} -O3 -ffast-math"
export FCFLAGS="${FCFLAGS} -O3 -ffast-math"
$ CC --version
g++ (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cc --version
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
Reproducible example code
cmake -S . -B build -DPYBIND11_CUDA_TESTS=ON -DPYBIND11_WERROR=ON -DDOWNLOAD_CATCH=ON
-- The CXX compiler identification is GNU 11.2.0
-- Cray Programming Environment 2.7.16 CXX
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cray/pe/craype/2.7.16/bin/CC - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.11.0 dev1
-- CMake 3.22.0
-- Found PythonInterp: /usr/bin/python3.6 (found suitable version "3.6.15", minimum required is "3.6")
-- Found PythonLibs: /usr/lib64/libpython3.6m.so
-- PYTHON 3.6.15
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- pybind11::lto enabled
-- pybind11::thin_lto enabled
-- Setting tests build type to MinSizeRel as none was specified
-- The CUDA compiler identification is NVIDIA 11.7.64
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Skipping test_constants_and_functions due to incompatible exception specifications
-- Building tests WITHOUT Eigen, use -DDOWNLOAD_EIGEN=ON on CMake 3.11+ to download
-- Found Boost: /usr/include (found suitable version "1.66.0", minimum required is "1.56")
CMake Warning at tools/pybind11Common.cmake:227 (message):
Missing: pytest 3.1
Try: /usr/bin/python3.6 -m pip install pytest
Call Stack (most recent call first):
tests/CMakeLists.txt:476 (pybind11_find_import)
-- Configuring done
-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:
DOWNLOAD_CATCH
-- Build files have been written to: /global/homes/a/ahuebl/src/pybind11/build
cmake --build build
[ 2%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/pybind11_tests.cpp.o
[ 4%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_async.cpp.o
[ 6%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_buffers.cpp.o
[ 8%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_builtin_casters.cpp.o
[ 10%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_call_policies.cpp.o
[ 13%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_callbacks.cpp.o
[ 15%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_chrono.cpp.o
[ 17%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_class.cpp.o
[ 19%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_const_name.cpp.o
[ 21%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_copy_move.cpp.o
[ 23%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_custom_type_casters.cpp.o
[ 26%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_custom_type_setup.cpp.o
[ 28%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_docstring_options.cpp.o
[ 30%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_enum.cpp.o
[ 32%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_eval.cpp.o
[ 34%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_exceptions.cpp.o
[ 36%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_factory_constructors.cpp.o
[ 39%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_gil_scoped.cpp.o
[ 41%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_iostream.cpp.o
[ 43%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_kwargs_and_defaults.cpp.o
[ 45%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_local_bindings.cpp.o
[ 47%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_methods_and_attributes.cpp.o
[ 50%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_modules.cpp.o
[ 52%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_multiple_inheritance.cpp.o
[ 54%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_array.cpp.o
[ 56%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_dtypes.cpp.o
[ 58%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_numpy_vectorize.cpp.o
[ 60%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_opaque_types.cpp.o
[ 63%] Building CUDA object tests/CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp: In function 'void test_submodule_operators(pybind11::module_&)':
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: error: no matching function for call to 'pybind11::class_<Vector2>::def(pybind11::detail::op_<pybind11::detail::op_add, pybind11::detail::op_l, pybind11::detail::self_t, pybind11::detail::self_t>)'
157 | py::class_<Vector2>(m, "Vector2")
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1557:1: note: candidate: 'template<class Func, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const char*, Func&&, const Extra& ...) [with Func = Func; Extra = {Extra ...}; type_ = Vector2; options = {}]'
1557 | class_ &def(const char *name_, Func &&f, const Extra &...extra) {
| ^
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1557:1: note: template argument deduction/substitution failed:
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: note: candidate expects at least 2 arguments, 1 provided
157 | py::class_<Vector2>(m, "Vector2")
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1582:1: note: candidate: 'template<pybind11::detail::op_id id, pybind11::detail::op_type ot, class L, class R, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const pybind11::detail::op_<(pybind11::detail::op_id)(id), (pybind11::detail::op_type)(ot), L, R>&, const Extra& ...) [with pybind11::detail::op_id id = id; pybind11::detail::op_type ot = ot; L = L; R = R; Extra = {Extra ...}; type_ = Vector2; options = {}]'
1582 | class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
| ^
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1582:1: note: template argument deduction/substitution failed:
/global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp:157:78: note: couldn't deduce template parameter 'id'
157 | py::class_<Vector2>(m, "Vector2")
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
/global/homes/a/ahuebl/src/pybind11/include/pybind11/pybind11.h:1594:1: note: candidate: 'template<class ... Args, class ... Extra> pybind11::class_<type_, options>& pybind11::class_<type_, options>::def(const pybind11::detail::initimpl::constructor<Args ...>&, const Extra& ...) [with Args = {Args ...}; Extra = {Extra ...}; type_ = Vector2; options = {}]'
1594 | class_ &def(const detail::initimpl::constructor<Args...> &init, const Extra &...extra) {
| ^
...
More details
Failing compile line:
/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/opt/cray/pe/craype/2.7.16/bin/CC -DPYBIND11_TEST_BOOST -Dpybind11_tests_EXPORTS -I/global/homes/a/ahuebl/src/pybind11/include -isystem=/usr/include/python3.6m -O1 -DNDEBUG --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIC -Xcompiler=-fvisibility=hidden -Werror all-warnings -std=c++17 -MD -MT tests/CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o -MF CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o.d -x cu -c /global/homes/a/ahuebl/src/pybind11/tests/test_operator_overloading.cpp -o CMakeFiles/pybind11_tests.dir/test_operator_overloading.cpp.o
Pre-processed file from -E: test_operator_overloading.cpp.txt
Cross-References
NERSC ticket: INC0191398
Vanilla Nvidia Linux Docker with CTK 11.7.1:
$ docker run -it nvidia/cuda:11.7.1-devel-ubuntu20.04
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ apt update && apt install -y git cmake python3 python3-dev python3-setuptools
$ git clone https://github.com/pybind/pybind11.git
$ cmake -S pybind11 -B build -DPYBIND11_CUDA_TESTS=ON -DPYBIND11_WERROR=ON
$ cmake --build build
-> same issue.
Looks like it's not HPE-specific but a general Nvidia NVCC issue.
We have a CI run for CUDA, right? Maybe we could add 11.7 and show it breaking in CI?
Ahh, yes: https://github.com/pybind/pybind11/pull/3968
I repeated the compilation with the following Docker containers, but with -DPYBIND11_WERROR=OFF.
Legend: :heavy_check_mark: ok - :x: fail
- :heavy_check_mark: nvidia/cuda:11.2.2-devel-ubuntu20.04
- :heavy_check_mark: nvidia/cuda:11.3.1-devel-ubuntu20.04
- :x: nvidia/cuda:11.4.3-devel-ubuntu20.04
  - -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
  - type deduction errors for self & operators in pybind11_cross_module_tests.cpp (see PR description)
  - type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
- :x: nvidia/cuda:11.5.1-devel-ubuntu20.04
  - -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
  - type deduction errors in pybind11_cross_module_tests.cpp (see PR description)
  - type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
- :x: nvidia/cuda:11.6.1-devel-ubuntu20.04
  - -fpermissive errors in pybind11/stl_bind.h for pybind11_cross_module_tests.cpp
  - type deduction errors for self & operators in pybind11_cross_module_tests.cpp (see PR description)
  - type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
- :x: nvidia/cuda:11.7.1-devel-ubuntu20.04
  - type deduction errors for self & operators in test_operator_overloading.cpp (see PR description)
  - ...
  - type deduction errors for self & operators in
- :x: CUDA 11.8.0
Ugh, I tried again today and still cannot find a simple work-around.
Happy to report we made great progress on this with the help of Nvidia developers :tada:
- :heavy_check_mark: fix landed in their development branches, just after the 11.8.0 CUDA Toolkit (CTK) release
- :heavy_check_mark: Nvidia found a work-around that we can use in the meantime for CTK 11.4-11.8, e.g., as patch in package managers
- :crossed_fingers: due to its popularity, from SciPy to RAPIDS AI projects, we are trying to get pybind11 into the internal Nvidia compiler regression suite for nvcc
Issue Description from Nvidia
NVCC parses the input and regenerates host-side C++ to send to the host compiler. There's a bug in this host C++ generation: the def function (and def_cast) gets unnecessary casts added in the declaration of the op parameter. That is, the code sent to gcc is broken due to the extra casts ( ... ) inserted for the first two template arguments:
template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
class_ &def(const detail::op_<( detail::op_id )id, (detail::op_type)ot, L, R> &op, const Extra &...extra)
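To make the description above concrete, here is a minimal sketch that compiles with a conforming C++17 host compiler. The types op_id, op_type, op_ and self_t below are hypothetical, simplified stand-ins (not the real pybind11 definitions), and def_as_written / def_as_regenerated are illustrative free functions. Detection traits show that the signature as written lets the compiler deduce id and ot, while the regenerated signature with the casts does not: in C++, a non-type template argument in which a subexpression refers to a template parameter is a non-deduced context, which is consistent with the "couldn't deduce template parameter 'id'" error in the build log.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-ins for pybind11::detail::op_id, op_type and
// op_<...>; they only mimic the template signature, not the real definitions.
enum op_id : int { op_add, op_sub };
enum op_type : int { op_l, op_r, op_u };
struct self_t {};

template <op_id id, op_type ot, typename L, typename R>
struct op_ {};

// Parameter shape of class_::def as written in pybind11.h (simplified to a
// free function): id and ot appear as plain template parameters, so they are
// deduced from the argument type.
template <op_id id, op_type ot, typename L, typename R>
constexpr int def_as_written(const op_<id, ot, L, R> &) {
    return id;
}

// Parameter shape as regenerated by the affected nvcc versions: each cast
// makes id and ot part of a larger expression, which is a non-deduced context.
template <op_id id, op_type ot, typename L, typename R>
constexpr int def_as_regenerated(const op_<(op_id) id, (op_type) ot, L, R> &) {
    return id;
}

// Detection traits: true only if the call compiles via template argument deduction.
template <typename T, typename = void>
struct deduces_as_written : std::false_type {};
template <typename T>
struct deduces_as_written<T, std::void_t<decltype(def_as_written(std::declval<const T &>()))>>
    : std::true_type {};

template <typename T, typename = void>
struct deduces_as_regenerated : std::false_type {};
template <typename T>
struct deduces_as_regenerated<T,
                              std::void_t<decltype(def_as_regenerated(std::declval<const T &>()))>>
    : std::true_type {};

using add_op = op_<op_add, op_l, self_t, self_t>;
static_assert(deduces_as_written<add_op>::value, "id and ot are deduced");
static_assert(!deduces_as_regenerated<add_op>::value, "id cannot be deduced through the cast");
```

Supplying the template arguments explicitly, e.g. def_as_regenerated<op_add, op_l, self_t, self_t>(add_op{}), still compiles; only deduction is affected.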
Work-Around for CTK 11.4-11.8
Replace the logic in https://github.com/pybind/pybind11/blob/v2.10.0/include/pybind11/pybind11.h#L1581-L1591 with a more general pattern, such as:
template <typename T, typename... Extra>
class_ &def(const T &op, const Extra &...extra)
For example:
diff --git a/include/pybind11/pybind11.h b/include/pybind11/pybind11.h
index c889dc41..43f4abc3 100644
--- a/include/pybind11/pybind11.h
+++ b/include/pybind11/pybind11.h
@@ -1578,14 +1578,14 @@ public:
         return *this;
     }
 
-    template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
-    class_ &def(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
+    template <typename T, typename... Extra>
+    class_ &def(const T &op, const Extra &...extra) {
         op.execute(*this, extra...);
         return *this;
     }
 
-    template <detail::op_id id, detail::op_type ot, typename L, typename R, typename... Extra>
-    class_ &def_cast(const detail::op_<id, ot, L, R> &op, const Extra &...extra) {
+    template <typename T, typename... Extra>
+    class_ &def_cast(const T &op, const Extra &...extra) {
         op.execute_cast(*this, extra...);
         return *this;
     }
This unbreaks the test suite for me :tada:, and all runtime tests pass as well.
Due to the broad pattern, this is probably not suitable for mainline, @henryiii @Skylion007?
But I think it is good enough to patch in package managers. Should we add an enable_if or so to make the template matching a bit safer? (I think compile time does not matter for targeted patches in package managers, as long as they unbreak the affected CTK compiler versions with narrow #ifdefs.)
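One possible answer to the enable_if question, as a minimal sketch under stated assumptions: all types below are hypothetical stand-ins, and the constraint (a has_execute trait that checks whether op.execute(cl) is a valid expression) is an illustrative choice, not the approach taken in pybind11. It keeps the generic def(const T &, ...) signature from the work-around, but removes it from overload resolution unless T looks like an operator, so a call such as def("name", f) still reaches the named overload. The trait deliberately avoids spelling out op_<id, ot, L, R> in a deduced position; that any such pattern could trip the same nvcc bug is an assumption, not something reported in this thread.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-ins; not the real pybind11 definitions.
enum op_id : int { op_add, op_sub };
enum op_type : int { op_l, op_r, op_u };
struct self_t {};

template <op_id id, op_type ot, typename L, typename R>
struct op_ {
    template <typename Class, typename... Extra>
    void execute(Class &cl, const Extra &...) const {
        cl.log += (id == op_add) ? "op_add;" : "op_sub;";
    }
};

// Assumed constraint: accept T only if `op.execute(cl)` is a valid expression.
template <typename T, typename Class, typename = void>
struct has_execute : std::false_type {};
template <typename T, typename Class>
struct has_execute<T, Class,
                   std::void_t<decltype(std::declval<const T &>().execute(std::declval<Class &>()))>>
    : std::true_type {};

struct class_ {
    std::string log;

    // Named overload, mirroring def(const char *, Func &&, const Extra &...).
    template <typename Func, typename... Extra>
    class_ &def(const char *name, Func &&, const Extra &...) {
        log += std::string("named:") + name + ";";
        return *this;
    }

    // Generic work-around overload, constrained with enable_if so that it only
    // participates in overload resolution for operator-like arguments.
    template <typename T, typename... Extra,
              std::enable_if_t<has_execute<T, class_>::value, int> = 0>
    class_ &def(const T &op, const Extra &...extra) {
        op.execute(*this, extra...);
        return *this;
    }
};

// Hypothetical usage helper that exercises both overloads.
std::string demo() {
    class_ c;
    c.def("norm", [] { return 0.0; });
    c.def(op_<op_add, op_l, self_t, self_t>{});
    c.def(op_<op_sub, op_l, self_t, self_t>{});
    return c.log;
}
```

Here demo() returns "named:norm;op_add;op_sub;". A call such as def(42) has no viable overload at all, so it fails at the call site instead of deep inside op.execute.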
I think it's fine to patch it for a restricted range of compilers. nvcc 11.4 - 11.8.0? I'd like to avoid package managers patching pybind11 if possible.
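Restricting the work-around to a compiler range could look like the following minimal sketch, offered as an illustration rather than pybind11's actual patch. It assumes the range nvcc 11.4 to 11.8 from this discussion and relies on the nvcc-predefined macros __NVCC__, __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__. The macro EXAMPLE_NVCC_OP_WORKAROUND, the helper uses_nvcc_workaround() and the stand-in types are hypothetical. Every other compiler keeps the original, more precise signature.

```cpp
#include <cassert>
#include <string>

// Hypothetical guard: enable the work-around only for nvcc 11.4 through 11.8.
// Clang in CUDA mode does not define __NVCC__, so it keeps the original overload.
#if defined(__NVCC__) && defined(__CUDACC_VER_MAJOR__) && __CUDACC_VER_MAJOR__ == 11 \
    && __CUDACC_VER_MINOR__ >= 4 && __CUDACC_VER_MINOR__ <= 8
#    define EXAMPLE_NVCC_OP_WORKAROUND 1
#else
#    define EXAMPLE_NVCC_OP_WORKAROUND 0
#endif

// Hypothetical, simplified stand-ins; not the real pybind11 definitions.
enum op_id : int { op_add, op_sub };
enum op_type : int { op_l, op_r, op_u };
struct self_t {};

template <op_id id, op_type ot, typename L, typename R>
struct op_ {
    template <typename Class>
    void execute(Class &cl) const {
        cl.log += (id == op_add) ? "add" : "sub";
    }
};

struct class_ {
    std::string log;
#if EXAMPLE_NVCC_OP_WORKAROUND
    // Affected nvcc: generic signature, no op_id/op_type deduction needed.
    template <typename T>
    class_ &def(const T &op) {
        op.execute(*this);
        return *this;
    }
#else
    // All other compilers: the original signature that deduces id and ot.
    template <op_id id, op_type ot, typename L, typename R>
    class_ &def(const op_<id, ot, L, R> &op) {
        op.execute(*this);
        return *this;
    }
#endif
};

// Hypothetical helper reporting which branch was compiled.
constexpr bool uses_nvcc_workaround() { return EXAMPLE_NVCC_OP_WORKAROUND != 0; }
```

With g++ or clang as the compiler, the #else branch is compiled and uses_nvcc_workaround() returns false; either way, a call like def(op_<op_add, op_l, self_t, self_t>{}) appends "add" to log.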
Is this something that might land in 11.8.1 or is it 11.9+ only?
Ok, sounds good. Proposed in #4220
Is this something that might land in 11.8.1 or is it 11.9+ only?
I don't know, since these are internal roadmap details. I assume the fix will be in all CUDA Toolkit releases following 11.8.0.