FLAMEGPU2 Test suite compilation issues with multiple cuda architectures


Open ptheywood opened this issue 3 years ago • 2 comments

When building the C++ test suite with many CUDA architectures enabled, significant amounts of host resources can be consumed, potentially causing compilation failures (e.g. CI errors during release CI).

This was demonstrated by a Windows CUDA 11.4 build with CUDA_ARCH="35 52 60 70 80", which ran out of memory when device-linking the test suite:

     1>C:/Users/RUNNER~1/AppData/Local/Temp/tmpxft_00000814_00000000-8_tests.device-link.fatbin.c(9423890): fatal error C1060: compiler is out of heap space [D:\a\FLAMEGPU2\FLAMEGPU2\build\tests\tests.vcxproj]

Additionally, a segmentation fault was encountered during test suite compilation on Linux CI, which is likely related.

For now, this has (hopefully) been addressed on CI by building fewer CUDA architectures for the test suite, and by reducing the number of threads nvcc uses via the --threads parameter.
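To illustrate why the cost grows with the architecture list, here is a minimal Python sketch that expands a CUDA_ARCH-style string into nvcc flags. It is not FLAMEGPU2's actual build logic, and the helper name `nvcc_flags` is hypothetical. Each architecture contributes its own `-gencode` entry, and therefore another device code image that the device-link step has to process, while `--threads` caps how many compilation steps nvcc runs in parallel (fewer threads should lower peak host memory at the cost of build time).

```python
from typing import List


def nvcc_flags(cuda_arch: str, threads: int) -> List[str]:
    """Expand a space-separated SM list such as "35 52 60 70 80" into nvcc flags.

    Hypothetical helper for illustration only.
    """
    flags: List[str] = []
    for arch in cuda_arch.split():
        # One -gencode pair per architecture: each adds a separate SASS image
        # to the fatbinary that the device-link step must handle.
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    # Limit how many compilation steps nvcc runs in parallel
    # (0 would let nvcc use one thread per CPU).
    flags += ["--threads", str(threads)]
    return flags


# Full release-style list vs. a reduced list for CI (values are illustrative).
print(" ".join(nvcc_flags("35 52 60 70 80", threads=2)))
print(" ".join(nvcc_flags("52 80", threads=1)))
```

The reduced list produces two device images instead of five, which is the lever the CI workaround pulls.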

Longer term, we have several options:

Reduce the amount of code being device-linked, so that the device-link step does not generate a fatbin.c source file of over 9 million lines
Split the C++ test suite into multiple binaries, which could then be orchestrated by a new tests binary or through ctest (see #267 for more information, and #285 for a very stale attempt at adding ctest)
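A minimal sketch of the "new tests binary" option, assuming the suite has already been split into separate executables. The function names are hypothetical, and ctest would provide the same orchestration without custom code. Each binary runs as its own process and the exit codes are aggregated, so a crash in one suite does not hide the results of the others.

```python
import subprocess
import sys
from typing import Dict, List


def run_test_binaries(binaries: Dict[str, List[str]]) -> Dict[str, int]:
    """Run each test binary in turn and collect its exit code.

    `binaries` maps a suite name to the command used to launch it
    (hypothetical names; a real split would use whatever the build produces).
    """
    results: Dict[str, int] = {}
    for name, cmd in binaries.items():
        # Separate processes: a segfault in one suite does not stop the rest.
        completed = subprocess.run(cmd)
        results[name] = completed.returncode
        status = "PASS" if completed.returncode == 0 else "FAIL"
        print(f"{name}: {status}", file=sys.stderr)
    return results


def all_passed(results: Dict[str, int]) -> bool:
    """True only if every suite exited with status 0."""
    return all(code == 0 for code in results.values())
```

For example, `run_test_binaries({"tests_a": ["./tests_a"], "tests_b": ["./tests_b"]})` would run two hypothetical split binaries in turn, and `all_passed` reduces the results to a single pass/fail for CI.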

Aug 18 '21 16:08 ptheywood

@Robadob has encountered heap-size-related compilation errors when building a Debug configuration of the test suite under Windows. Splitting the C++ test suite and orchestrating it via ctest would be one way to address that challenge (and would also address this issue).

Mar 29 '22 12:03 ptheywood

Visual Studio provides both x86 and x64 linkers. By default it uses x86 (and is supposed to switch to x64 if the heap is exceeded). There are a lot of circa-2013 Stack Overflow posts explaining how to force the x64 one, and this CMake variable corresponds to that method. There's no recent discussion, so it's possible it no longer has an impact in newer versions of Visual Studio.

https://cmake.org/cmake/help/latest/variable/CMAKE_VS_PLATFORM_TOOLSET_HOST_ARCHITECTURE.html

It's also quite possible this won't make a difference, as the linker is invoked via nvcc, in which case it would require changes on NVIDIA's part.
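A sketch of how the x64-hosted toolchain could be requested at configure time, written as a Python helper that only builds the command line (the helper and its parameters are hypothetical). According to the CMake documentation, the Visual Studio generators (VS 2013 and later) select the host toolchain through the toolset specification `-T host=x64`, and CMake then reports the choice in `CMAKE_VS_PLATFORM_TOOLSET_HOST_ARCHITECTURE`. As noted above, this may still have no effect if nvcc drives the host tools itself.

```python
from typing import Dict, List, Optional


def cmake_configure_cmd(source: str, build: str, generator: str,
                        cache_vars: Optional[Dict[str, str]] = None,
                        host_x64: bool = True) -> List[str]:
    """Build a CMake configure command for a Visual Studio generator (hypothetical helper)."""
    # -A x64 selects the 64-bit *target* platform.
    cmd = ["cmake", "-S", source, "-B", build, "-G", generator, "-A", "x64"]
    if host_x64:
        # Request the 64-bit *hosted* compiler/linker via the toolset spec;
        # CMake exposes this choice as CMAKE_VS_PLATFORM_TOOLSET_HOST_ARCHITECTURE.
        cmd += ["-T", "host=x64"]
    for key, value in (cache_vars or {}).items():
        cmd.append(f"-D{key}={value}")
    return cmd


print(" ".join(cmake_configure_cmd(".", "build", "Visual Studio 16 2019")))
```

The resulting list can be passed directly to `subprocess.run`.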

Dec 13 '22 11:12 Robadob
