Cannot build debug version with CUDA enabled
I want to build a build_type=Debug version of AMReX with CUDA enabled. I am using Spack, and building this configuration:
amrex @develop +cuda cuda_arch=75 ~fortran +hdf5 +openmp +particles +shared build_type=Debug
When building this configuration, the file AMReX_MLNodeLaplacian.cpp takes a very long time to compile. I aborted after more than 2 hours, as there didn't seem to be any progress: none of the temporary files produced by the compiler had been updated in these 2 hours. The running process is:
eschnet+ 8135 100 0.8 857256 828672 pts/10 R+ 10:21 1:16 cicc --c++14 --gnu_version=100200 --diag_error this_addr_capture_ext_lambda --diag_error h_hd_illegal_call --diag_error h_hd_illegal_call_generic --orig_src_file_name /tmp/eschnetter/spack-stage/spack-stage-amrex-develop-tb6carl647xveqxh3pwhqcdan55vta74/spack-src/Src/LinearSolvers/MLMG/AMReX_MLNodeLaplacian.cpp --allow_managed --extended-lambda --relaxed_constexpr --device-c --diag_suppress=esa_on_defaulted_function_ignored --display_error_number -arch compute_75 -show-src -m64 --no-version-ident -ftz=1 -prec_div=0 -prec_sqrt=0 -fmad=1 -fast-math --gen_div_approx_ftz --include_file_name tmpxft_00001fb6_00000000-3_AMReX_MLNodeLaplacian.fatbin.c -generate-line-info -maxreg 255 -tused --gen_module_id_file --module_id_file_name /tmp/tmpxft_00001fb6_00000000-4_AMReX_MLNodeLaplacian.module_id --gen_c_file_name /tmp/tmpxft_00001fb6_00000000-6_AMReX_MLNodeLaplacian.cudafe1.c --stub_file_name /tmp/tmpxft_00001fb6_00000000-6_AMReX_MLNodeLaplacian.cudafe1.stub.c --gen_device_file_name /tmp/tmpxft_00001fb6_00000000-6_AMReX_MLNodeLaplacian.cudafe1.gpu /tmp/tmpxft_00001fb6_00000000-7_AMReX_MLNodeLaplacian.cpp1.ii -o /tmp/tmpxft_00001fb6_00000000-6_AMReX_MLNodeLaplacian.ptx
No other sub-processes (cudafe, g++, cc1plus, etc.) besides cicc are running at this time.
The same problem happens with older versions of AMReX, e.g. 21.03 and 21.02 (with HDF5 disabled).
I am using GCC 10.2 and tried both CUDA 11.2.1 and 11.2.2.
An optimized build (without the build_type=Debug option) finishes much more quickly; I don't think any file takes more than a few minutes to build.
Using recent versions of CUDA I am able to compile this file in 1-2 minutes. I wonder if you are building on a platform where memory or CPU resources are limited in a way that causes a problem.
@WeiqunZhang There are a lot of ParallelFors in this file, which cumulatively contribute to its long build time. Are there obvious ways to split up this file so that it builds better in parallel and is less likely to run into the issue reported here?
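For context, each amrex::ParallelFor call site instantiates its own device lambda, i.e. one more GPU kernel for cicc to compile. A minimal sketch of the pattern (illustrative only, not code taken from AMReX_MLNodeLaplacian.cpp):

```c++
#include <AMReX.H>
#include <AMReX_Gpu.H>
#include <AMReX_FArrayBox.H>

int main (int argc, char* argv[])
{
    amrex::Initialize(argc, argv);
    {
        amrex::Box bx(amrex::IntVect(0), amrex::IntVect(63));
        amrex::FArrayBox fab(bx, 1);                        // device/managed memory in a GPU build
        amrex::Array4<amrex::Real> const& a = fab.array();

        // Each ParallelFor like this instantiates its own device lambda,
        // i.e. one more GPU kernel that cicc has to compile.
        amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
        {
            a(i,j,k) = amrex::Real(i + j + k);
        });
        amrex::Gpu::streamSynchronize();
    }
    amrex::Finalize();
}
```

A single translation unit with many such launches ends up in one cicc invocation, which is the process that appears to be stuck in the debug build.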
Sure, we can split the file.
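Purely as an illustration of the technique (hypothetical file and function names, not the layout AMReX actually adopted), splitting means keeping a shared header of declarations and moving disjoint groups of ParallelFor-heavy definitions into separate translation units, so each cicc invocation sees fewer kernels and the pieces can compile concurrently:

```c++
// mlnodelap_kernels.H  (hypothetical header shared by the split files)
#pragma once
#include <AMReX_FArrayBox.H>

void build_stencil_part (amrex::FArrayBox& fab);
void smooth_part        (amrex::FArrayBox& fab);

// mlnodelap_sten.cpp  (hypothetical: stencil-related kernels only)
#include <AMReX_Gpu.H>
#include "mlnodelap_kernels.H"

void build_stencil_part (amrex::FArrayBox& fab)
{
    amrex::Array4<amrex::Real> const& a = fab.array();
    amrex::ParallelFor(fab.box(), [=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
    {
        a(i,j,k) = amrex::Real(1.0);
    });
}

// mlnodelap_smooth.cpp  (hypothetical: smoother kernels only, its own translation unit)
#include <AMReX_Gpu.H>
#include "mlnodelap_kernels.H"

void smooth_part (amrex::FArrayBox& fab)
{
    amrex::Array4<amrex::Real> const& a = fab.array();
    amrex::ParallelFor(fab.box(), [=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
    {
        a(i,j,k) *= amrex::Real(0.5);
    });
}
```

With a layout like this, a parallel build (make -jN or ninja) can overlap the device compilation of the pieces instead of waiting on one large translation unit.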
@eschnett Could you let us know the exact command line (e.g., nvcc .....) that is used to compile the file?
cc @mic84 because of the Spack connection :)
FWIW, I am seeing link times of ~30 min to 1 hr on Summit when building WarpX with NVCC in debug.
I think the MLMG sources were split up since this issue was posted, weren't they? Can we cross-link a PR just to close the loop? :) Is it #1966? :)