FLAMEGPU2
Test suite compilation issues with multiple cuda architectures
When building the C++ test suite with many CUDA architectures enabled, significant amounts of host resources can be consumed, potentially causing compilation failures (e.g. CI errors during release CI).
This was demonstrated by a Windows CUDA 11.4 build with CUDA_ARCH="35 52 60 70 80" triggering an out-of-memory issue when device-linking the test suite.
1>C:/Users/RUNNER~1/AppData/Local/Temp/tmpxft_00000814_00000000-8_tests.device-link.fatbin.c(9423890): fatal error C1060: compiler is out of heap space [D:\a\FLAMEGPU2\FLAMEGPU2\build\tests\tests.vcxproj]
Additionally, with Linux CI a Segmentation fault was encountered during test suite compilation, which is likely related.
For now on CI this has (hopefully) been addressed by building fewer CUDA architectures when building the test suite, and by reducing the number of threads used by nvcc when using the --threads parameter.
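A minimal sketch of that kind of mitigation, assuming a project that uses CMake's native CUDA_ARCHITECTURES target property (CMake 3.18+) rather than FLAMEGPU2's own CUDA_ARCH handling. The target name `tests` is taken from tests.vcxproj in the log above; the architecture and thread count are illustrative only.

```cmake
# Hypothetical fragment: build the test suite for one architecture only,
# and cap the number of worker threads nvcc uses per invocation.
# Assumes CMake >= 3.18 (CUDA_ARCHITECTURES) and CUDA >= 11.2 (--threads).
set_target_properties(tests PROPERTIES CUDA_ARCHITECTURES "60")

# Apply the flag only to CUDA sources so the host C++ compiler never sees it.
target_compile_options(tests PRIVATE
  $<$<COMPILE_LANGUAGE:CUDA>:--threads=2>
)
```

Fewer architectures means fewer device images embedded in the fatbin, and fewer nvcc threads means fewer concurrent compilation stages, both of which should lower peak host memory use.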
Longer term we have several options:
- Reduce the amount of code being device linked somehow, so fatbin linking is not generating 9 million LOC
- Split the C++ test suite into multiple binaries, which can then be orchestrated by a new tests binary, or through ctest (see #267 for more information, and #285 for a very stale attempt at adding ctest)
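The second option could look roughly like the following. This is a sketch only: the suite names, source paths, and the `flamegpu` library target are placeholders, not FLAMEGPU2's actual layout.

```cmake
# Hypothetical split of one large test binary into several smaller ones,
# each registered with CTest. All names below are placeholders.
enable_testing()

foreach(suite IN ITEMS model simulation io)
  # One executable per suite, so each is device-linked on its own.
  add_executable(tests_${suite} test_cases/${suite}/main.cu)
  set_target_properties(tests_${suite} PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
  )
  target_link_libraries(tests_${suite} PRIVATE flamegpu)

  # Register the binary so `ctest` can orchestrate it.
  add_test(NAME ${suite} COMMAND tests_${suite})
endforeach()
```

Running `ctest --output-on-failure` from the build directory would then execute every registered suite. Because each binary is device-linked separately, each generated fatbin link file should be considerably smaller than the single one produced today.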
@Robadob has encountered heap-size-related compilation errors when building a debug configuration of the test suite under Windows. Splitting the C++ test suite and orchestrating it via ctest would be one way to address that (and would also address this issue).
Visual Studio provides both x86 and x64 linkers. By default it uses x86 (and is supposed to switch to x64 if the heap is exceeded). There are a lot of circa-2013 Stack Overflow posts explaining how to force the x64 linker, and this CMake variable corresponds to that method. There's no recent discussion, so it's possible it no longer has an impact in newer versions of Visual Studio.
https://cmake.org/cmake/help/latest/variable/CMAKE_VS_PLATFORM_TOOLSET_HOST_ARCHITECTURE.html
It's also quite possible this won't make a difference, as the linker is invoked via nvcc. In that case it would require changes on NVIDIA's part.
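For anyone wanting to try this, the variable linked above is populated by CMake from the generator toolset rather than set directly. The host toolchain is requested at configure time with `-T host=x64` (for example `cmake -A x64 -T host=x64 ..`). A small fragment to confirm what was selected, with the message text being my own wording:

```cmake
# Report which host toolchain the Visual Studio generator selected.
# An empty value means no host= preference was passed via -T.
if(CMAKE_GENERATOR MATCHES "Visual Studio")
  message(STATUS
    "VS host toolset architecture: '${CMAKE_VS_PLATFORM_TOOLSET_HOST_ARCHITECTURE}'")
endif()
```

Whether nvcc's internal host compiler and linker invocations honour this preference is untested here.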