OpenSubdiv
Building with high parallelism and CUDA support results in sporadic build failures
Building with 48 threads, 19 of 50 sequential builds failed (a 38% failure rate). I am building via the nixpkgs derivation, but I don’t see any reason why the failure would be specific to that build environment. Building without CUDA produced no failures in 50 runs.
My guess is that there’s an implicit dependency somewhere. I spent a little time trying to find it but did not succeed (I’m not very proficient with CMake).
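For anyone who wants to reproduce the measurement, a minimal sketch of a harness that runs a build command repeatedly and counts the failures might look like the following. The function names are made up for illustration, and the build command itself is left to the caller, since it depends on how the clean build is triggered in a given environment.

```python
import subprocess
import sys


def count_failures(build_cmd, runs):
    """Run build_cmd `runs` times, one after another; return how many exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            build_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


def failure_rate(failures, runs):
    """Failure rate as a percentage of all runs."""
    return 100.0 * failures / runs


# Example (hypothetical command, must perform a full clean build each time):
#   failed = count_failures(["./clean-build.sh"], 50)
#   print(f"{failed}/50 failed ({failure_rate(failed, 50):.0f}%)")
```

Each run has to start from a clean build directory; otherwise a partially completed earlier run can mask or change the failure.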
I have seen at least two different failures:
CMake Error at /nix/store/0dv0ylafnx7cdajyv9ahbpqrniblixq1-cmake-3.26.4/share/cmake-3.26/Modules/FindCUDA/make2cmake.cmake:48 (file):
file failed to open for reading (No such file or directory):
/build/source/build/opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/osd_static_gpu_generated_cudaKernel.cu.o.NVCC-depend
CMake Error at osd_static_gpu_generated_cudaKernel.cu.o.Release.cmake:236 (message):
Error generating
/build/source/build/opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/./osd_static_gpu_generated_cudaKernel.cu.o
make[2]: *** [opensubdiv/CMakeFiles/osd_dynamic_gpu.dir/build.make:77: opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/osd_static_gpu_generated_cudaKernel.cu.o] Error 1
and
Error copying file (if different) from "/build/source/build/opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/osd_static_gpu_generated_cudaKernel.cu.o.depend.tmp" to "/build/source/build/opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/osd_static_gpu_generated_cudaKernel.cu.o.depend".
CMake Error at osd_static_gpu_generated_cudaKernel.cu.o.Release.cmake:246 (message):
Error generating
/build/source/build/opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/./osd_static_gpu_generated_cudaKernel.cu.o
make[2]: *** [opensubdiv/CMakeFiles/osd_dynamic_gpu.dir/build.make:77: opensubdiv/CMakeFiles/osd_static_gpu.dir/osd/osd_static_gpu_generated_cudaKernel.cu.o] Error 1
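One thing that stands out in both excerpts is that the rule for the osd_static_gpu object is being run from osd_dynamic_gpu.dir/build.make. That would be consistent with two targets generating the same file at the same time, though this is only a guess and not a confirmed diagnosis. The sketch below illustrates that class of failure deterministically. It assumes each job writes a temporary dependency file, copies it to its final name, and then removes the temporary file; the step and file names are simplified stand-ins, not the actual FindCUDA script.

```python
import os
import shutil
import tempfile


def generate_steps(workdir):
    """Return the three steps one job performs, as separate callables.

    Both jobs use the same paths, which is the assumed source of the race.
    """
    tmp = os.path.join(workdir, "kernel.cu.o.depend.tmp")
    final = os.path.join(workdir, "kernel.cu.o.depend")

    def write_tmp():
        with open(tmp, "w") as f:
            f.write("deps\n")

    def copy_to_final():
        shutil.copyfile(tmp, final)

    def remove_tmp():
        os.remove(tmp)

    return [write_tmp, copy_to_final, remove_tmp]


def run_interleaved(order):
    """Run the steps of jobs "A" and "B" in the given order.

    `order` is a string such as "AAABBB"; each character advances that job
    by one step. Returns the FileNotFoundError if a step fails, else None.
    """
    with tempfile.TemporaryDirectory() as workdir:
        jobs = {
            "A": iter(generate_steps(workdir)),
            "B": iter(generate_steps(workdir)),
        }
        try:
            for name in order:
                next(jobs[name])()
        except FileNotFoundError as exc:
            return exc
    return None


# Serialized: job A finishes before job B starts, so nothing goes wrong.
print(run_interleaved("AAABBB"))
# Interleaved: A removes the shared temporary file before B copies it.
print(type(run_interleaved("ABAAB")).__name__)
```

With the serialized order both jobs succeed. With the interleaved order, job B's copy fails because job A has already removed the shared temporary file, which resembles the "Error copying file (if different)" message above.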
Filed as internal issue #OSD-426
Interesting. We haven't seen that before. Can you tell us more about your system configuration: OS, Compiler, GPU, Driver version, CUDA version?
Hi! Thanks for the reply.
- OS is NixOS @ https://github.com/NixOS/nixpkgs/commit/9ca785644d067445a4aa749902b29ccef61f7476 (Linux Kernel 6.1)
- Opensubdiv src @ v3.5.0
- GCC 12.3.0 (note that -DCUDA_HOST_COMPILER is different), CMake 3.26.4
- CUDA toolkit 11.8.0
- CPU is AMD 3960X (24-core, 48-threads), 192 GB RAM
- GPU is 3080 Ti with driver 535.86.05 (however I think this should not matter, as I don’t believe the GPU is used during build)
Log output of configure stage + build flags
Note that I have manually wrapped the cmake flags to make them easier to read.
@nix { "action": "setPhase", "phase": "configurePhase" }
configuring
fixing cmake files...
cmake flags:
-DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF
-DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF
-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON
-DCMAKE_BUILD_TYPE=Release
-DBUILD_TESTING=OFF
-DCMAKE_INSTALL_LOCALEDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/share/locale
-DCMAKE_INSTALL_LIBEXECDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/libexec
-DCMAKE_INSTALL_LIBDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/lib
-DCMAKE_INSTALL_DOCDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/share/doc/OpenSubdiv
-DCMAKE_INSTALL_INFODIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/share/info
-DCMAKE_INSTALL_MANDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/share/man
-DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/1np3p9y42nv1m06ywspgqj20r5p41xla-opensubdiv-3.5.0-dev/include
-DCMAKE_INSTALL_INCLUDEDIR=/nix/store/1np3p9y42nv1m06ywspgqj20r5p41xla-opensubdiv-3.5.0-dev/include
-DCMAKE_INSTALL_SBINDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/sbin
-DCMAKE_INSTALL_BINDIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/bin
-DCMAKE_INSTALL_NAME_DIR=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0/lib
-DCMAKE_POLICY_DEFAULT_CMP0025=NEW
-DCMAKE_OSX_SYSROOT=
-DCMAKE_FIND_FRAMEWORK=LAST
-DCMAKE_STRIP=/nix/store/x7n44lfys59k5ajj9w1fkxw5391cnn5v-gcc-wrapper-12.3.0/bin/strip
-DCMAKE_RANLIB=/nix/store/x7n44lfys59k5ajj9w1fkxw5391cnn5v-gcc-wrapper-12.3.0/bin/ranlib
-DCMAKE_AR=/nix/store/x7n44lfys59k5ajj9w1fkxw5391cnn5v-gcc-wrapper-12.3.0/bin/ar
-DCMAKE_C_COMPILER=gcc
-DCMAKE_CXX_COMPILER=g++
-DCMAKE_INSTALL_PREFIX=/nix/store/aw2139d316dfcan625spblpib2449b33-opensubdiv-3.5.0
-DNO_TUTORIALS=1
-DNO_REGRESSION=1
-DNO_EXAMPLES=1
-DNO_METAL=1
-DGLEW_INCLUDE_DIR=/nix/store/55n26bd7l2jdxj8fkh688nrv290d3hp8-glew-2.2.0-dev/include
-DGLEW_LIBRARY=/nix/store/55n26bd7l2jdxj8fkh688nrv290d3hp8-glew-2.2.0-dev/lib
-DOSD_CUDA_NVCC_FLAGS=--gpu-architecture=compute_37
-DCUDA_HOST_COMPILER=/nix/store/m3lj9k2f39yplgr81pv9j1p13p3mq0pz-gcc-wrapper-11.4.0/bin/cc
-DNO_OPENCL=1
-DCUDA_TOOLKIT_ROOT_DIR=/nix/store/vxw61j9ff7d5jdq2cwy1bh4q5j82jvy5-cudatoolkit-11.8.0
-DCUDA_HOST_COMPILER=/nix/store/m3lj9k2f39yplgr81pv9j1p13p3mq0pz-gcc-wrapper-11.4.0/bin
-DCMAKE_CUDA_HOST_COMPILER=/nix/store/m3lj9k2f39yplgr81pv9j1p13p3mq0pz-gcc-wrapper-11.4.0/bin
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /nix/store/x7n44lfys59k5ajj9w1fkxw5391cnn5v-gcc-wrapper-12.3.0/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /nix/store/x7n44lfys59k5ajj9w1fkxw5391cnn5v-gcc-wrapper-12.3.0/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Compiling OpenSubdiv version v3_5_0
-- Using cmake version 3.26.4
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Could NOT find TBB (missing: TBB_INCLUDE_DIR TBB_LIBRARIES) (Required is at least version "4.0")
-- Found OpenGL: /nix/store/xibw0p5bj2z3a566mannk3vflb9f5fph-libGL-1.6.0/lib/libOpenGL.so
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found CUDA: /nix/store/vxw61j9ff7d5jdq2cwy1bh4q5j82jvy5-cudatoolkit-11.8.0 (found suitable version "11.8", minimum required is "4.0")
-- Found X11: /nix/store/gz38plw089ri9k2lh7gzhh58ydhb3rv1-xorgproto-2023.2/include
-- Looking for XOpenDisplay in /nix/store/igp21718s3sa932z7baqnhlc72v0zl0z-libX11-1.8.6/lib/libX11.so;/nix/store/4s3wrg560496dx3qx8gnvvjqz4hc9222-libXext-1.3.5/lib/libXext.so
-- Looking for XOpenDisplay in /nix/store/igp21718s3sa932z7baqnhlc72v0zl0z-libX11-1.8.6/lib/libX11.so;/nix/store/4s3wrg560496dx3qx8gnvvjqz4hc9222-libXext-1.3.5/lib/libXext.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Could NOT find GLFW (missing: GLFW_INCLUDE_DIR GLFW_LIBRARIES) (Required is at least version "3.0.0")
-- Could NOT find PTex (missing: PTEX_INCLUDE_DIR PTEX_LIBRARY) (Required is at least version "2.0")
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR) (Required is at least version "1.2")
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) (Required is at least version "1.8.4")
-- Could NOT find Docutils (missing: RST2HTML_EXECUTABLE DOCUTILS_VERSION) (Required is at least version "0.9")
-- Found Python: /nix/store/9c03r86hcdn43dm3hsgjirifvyzfkhwh-python3-3.10.12/bin/python3.10 (found version "3.10.12") found components: Interpreter
CMake Warning at CMakeLists.txt:430 (message):
TBB was not found : support for TBB parallel compute kernels will be
disabled in Osd. If your compiler supports TBB directives, please refer to
the FindTBB.cmake shared module in your cmake installation.
CMake Warning at CMakeLists.txt:619 (message):
Ptex was not found : the OpenSubdiv Ptex example will not be available. If
you do have Ptex installed and see this message, please add your Ptex path
to FindPTex.cmake in /build/source/cmake or set it through the
PTEX_LOCATION cmake command line argument or environment variable.
CMake Warning at documentation/CMakeLists.txt:52 (message):
Doxyen was not found : support for Doxygen automated API documentation is
disabled.
-- Configuring done (3.6s)
-- Generating done (0.0s)
CMake Warning:
Manually-specified variables were not used by the project:
BUILD_TESTING
CMAKE_EXPORT_NO_PACKAGE_REGISTRY
CMAKE_POLICY_DEFAULT_CMP0025
GLEW_LIBRARY
-- Build files have been written to: /build/source/build
cmake: enabled parallel building
cmake: enabled parallel installing
@nix { "action": "setPhase", "phase": "buildPhase" }
building
build flags: -j48 SHELL=/nix/store/a7f7xfp9wyghf44yv6l6fv9dfw492hd3-bash-5.2-p15/bin/bash
(Remainder of logs omitted)
Thanks for the additional information!
I just hit this failure when building nixpkgs. The build succeeded on retry. Just making it known that the workaround is not a silver bullet.