
double free or corruption (!prev) or corrupted double-linked list at the end of SCF calculation

Open kogareru1z opened this issue 10 months ago • 10 comments

Describe the bug

While I was using ABACUS as the first-principles calculation software for DPGEN, the run was interrupted. On checking the ABACUS log, I found the error named in the title, double free or corruption (!prev), on the last line, after the calculation itself had completed. If I run the calculation on CPU instead, I get a corrupted double-linked list error.

Expected behavior

The calculation should end normally instead of being interrupted by this error after completion.

To Reproduce

error.zip

Environment

WSL Ubuntu 22.04, ABACUS 3.60, Intel oneAPI 2021

Additional Context

No response

Task list for Issue attackers (only for developers)

  • [ ] Verify the issue is not a duplicate.
  • [ ] Describe the bug.
  • [ ] Steps to reproduce.
  • [ ] Expected behavior.
  • [ ] Error message.
  • [ ] Environment details.
  • [ ] Additional context.
  • [ ] Assign a priority level (low, medium, high, urgent).
  • [ ] Assign the issue to a team member.
  • [ ] Label the issue with relevant tags.
  • [ ] Identify possible related issues.
  • [ ] Create a unit test or automated test to reproduce the bug (if applicable).
  • [ ] Fix the bug.
  • [ ] Test the fix.
  • [ ] Update documentation (if necessary).
  • [ ] Close the issue and inform the reporter (if applicable).

kogareru1z, Apr 24 '24 14:04

Hi @kogareru1z, please make sure the MPI you use to run ABACUS comes from the same package as the MPI that ABACUS was built with.

caic99, Apr 24 '24 16:04

How can I ensure that the MPI used to run ABACUS is the same as the package used to build ABACUS? I installed ABACUS using conda.

kogareru1z, Apr 24 '24 23:04

Run the command which mpirun and check whether the path it prints is located under your conda environment.
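That check could be scripted roughly as follows. This is a minimal sketch, assuming a conda environment has been activated so that CONDA_PREFIX is set; the helper names is_under_prefix and check_in_conda are hypothetical and not part of ABACUS or conda.

```shell
# Succeed if the path in $1 lies under the directory given in $2.
is_under_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Report whether a command resolves to a path inside the active conda
# environment. CONDA_PREFIX is set by `conda activate`.
check_in_conda() {
    cmd_path=$(command -v "$1") || { echo "$1: not found on PATH"; return 1; }
    if [ -n "${CONDA_PREFIX:-}" ] && is_under_prefix "$cmd_path" "$CONDA_PREFIX"; then
        echo "$1: $cmd_path (inside conda environment)"
    else
        echo "$1: $cmd_path (outside conda environment)"
        return 1
    fi
}

# A mismatch is only reported here, not treated as fatal.
for cmd in mpirun abacus; do
    check_in_conda "$cmd" || true
done
```

If mpirun is reported as outside the environment while abacus is inside it, the two most likely come from different MPI packages.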

caic99, Apr 25 '24 02:04

I tried to compile ABACUS with the command CXX=icpx cmake -B build -DUSE_CUDA=1, but cmake reported an error. (Screenshot attached: 1714015662724)

kogareru1z, Apr 25 '24 03:04

@kogareru1z Please attach the full command line and its output as text.
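One way to capture both the command line and its complete output as text is to redirect stdout and stderr into a file. The sketch below is illustrative only; run_and_log and the log file name are hypothetical and not part of ABACUS or CMake.

```shell
# Hypothetical helper: run a command and save the command line plus its
# combined stdout/stderr to a text file that can be attached to an issue.
run_and_log() {
    logfile=$1
    shift
    status=0
    printf '$ %s\n' "$*" > "$logfile"        # record the command line itself
    "$@" >> "$logfile" 2>&1 || status=$?     # append stdout and stderr
    cat "$logfile"                           # also show the result on screen
    return "$status"
}
```

For the configure step in this thread, that would be run_and_log cmake_output.txt env CXX=icpx cmake -B build -DUSE_CUDA=1, run from the source directory.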

caic99, Apr 25 '24 03:04

Command: CXX=icpx cmake -B build -DUSE_CUDA=1

Attached logs: CMakeError.log, CMakeOutput.log

kogareru1z, Apr 25 '24 04:04

In CMakeError.log:

/usr/local/cuda-12.3/bin/../targets/x86_64-linux/include/crt/host_config.h:124:2: error: -- unsupported Intel ICX compiler! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

Please try running export NVCC_APPEND_FLAGS='-allow-unsupported-compiler' before configuring. However, I would suggest using the GCC toolchain for compatibility with ELPA.
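Put together, the workaround might look like the sketch below. It assumes icpx, nvcc and cmake are on PATH and that it is run from the ABACUS source directory; the function name configure_with_icpx is hypothetical, while the cmake command is the one quoted earlier in this thread.

```shell
# Ask nvcc to skip its host-compiler version check. nvcc appends the
# contents of NVCC_APPEND_FLAGS to every command line it is given.
export NVCC_APPEND_FLAGS='-allow-unsupported-compiler'

# Hypothetical wrapper around the configure command from this thread.
# Defined but not called here, so the variable can be inspected first.
configure_with_icpx() {
    CXX=icpx cmake -B build -DUSE_CUDA=1
}
```

Calling configure_with_icpx in the same shell then re-runs the configure step with the flag in effect. As the error message itself warns, an unsupported host compiler may still fail later in the build or at run time.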

caic99, Apr 25 '24 04:04

When I compile with CXX=gcc cmake -B build -DUSE_CUDA=1, an error occurs in the final linking process after executing make -j8:

[ 96%] Linking CXX executable abacus
[ 98%] Built target diag_cusolver
[100%] Built target operator_ks_lcao
/usr/bin/ld: warning: libmpi.so.40, needed by /usr/lib/x86_64-linux-gnu/libelpa.so, may conflict with libmpi.so.12
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_next_4'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_fini'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_init_4u'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_omp_task_alloc'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_next_8'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_barrier'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_omp_task'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_init_4'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_serialized_parallel'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_fixed4_add'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_fini_8'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_init_4u'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_fixed8_add'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_next_4u'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_float4_add'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_float8_max'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_reduce'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_critical_with_hint'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_critical'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_init_8'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_float8_add'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_master'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_reduce'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_flush'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_next_8u'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_init_8u'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_fork_call'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_for_static_init_4'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_ordered'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_push_num_threads'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_init_8u'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_cmplx8_add'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_global_thread_num'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_critical'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_omp_task_with_deps'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_reduce_nowait'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_float4_max'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_master'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_single'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_dispatch_init_8'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_atomic_cmplx4_add'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_ok_to_fork'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_single'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_ordered'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_end_reduce_nowait'
/usr/bin/ld: /opt/intel/oneapi/mkl/2024.1/lib/libmkl_intel_thread.so: undefined reference to `__kmpc_serialized_parallel'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/abacus.dir/build.make:1091: abacus] Error 1
make[1]: *** [CMakeFiles/Makefile2:847: CMakeFiles/abacus.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

kogareru1z, Apr 25 '24 05:04

@kogareru1z Please follow the docs and install the requirements (including ELPA) with apt. There is no need to use the Intel toolkit (including MKL) to build ABACUS, since the apt packages already satisfy the requirements.
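For context, the __kmpc_* symbols in the linker log are entry points of the Intel/LLVM OpenMP runtime, which GCC's own OpenMP runtime (libgomp) does not provide, and the libmpi.so.40 versus libmpi.so.12 warning suggests that two different MPI installations are being mixed. A quick way to see whether the current shell still carries oneAPI settings before reconfiguring is sketched below. It assumes the build picks up MKL through the MKLROOT variable that oneAPI's setvars.sh exports; report_intel_env is a hypothetical helper name.

```shell
# Print each oneAPI-related variable that is set in the current shell.
# MKLROOT and I_MPI_ROOT are exported by oneAPI's setvars.sh.
report_intel_env() {
    found=0
    for var in MKLROOT I_MPI_ROOT; do
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no oneAPI variables set"
    fi
    return 0
}

report_intel_env
```

If either variable is printed, opening a fresh shell without sourcing setvars.sh and configuring into a clean build directory is one way to keep the GCC build from finding the Intel libraries.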

caic99, Apr 25 '24 06:04

@kogareru1z, can we close this issue now?

WHUweiqingzhou, May 06 '24 06:05

This issue is being closed because there has been no update. Feel free to reopen it if you want to discuss it further.

WHUweiqingzhou, Jun 18 '24 03:06