
Problem compiling abacus-develop-LTSv3.10.0 with the toolchain's OpenMPI v4

Open summitmoon opened this issue 8 months ago • 5 comments

Describe the bug

configure.log

pcc025:rank16.abacus: Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
pcc025:rank22.abacus: Failed to modify UD QP to INIT on mlx5_0: Operation not permitted

Open MPI failed an OFI Libfabric library call (fi_endpoint). This is highly unusual; your job may behave unpredictably (and/or abort) after this.

  Local host: pcc025
  Location:   mtl_ofi_component.c:515
  Error:      Invalid argument (22)

Could anyone help me solve these issues?

Thanks. Best regards, Zhao

Expected behavior

No response

To Reproduce

No response

Environment

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

  • [ ] Verify the issue is not a duplicate.
  • [ ] Describe the bug.
  • [ ] Steps to reproduce.
  • [ ] Expected behavior.
  • [x] Error message.
  • [ ] Environment details.
  • [ ] Additional context.
  • [ ] Assign a priority level (low, medium, high, urgent).
  • [ ] Assign the issue to a team member.
  • [ ] Label the issue with relevant tags.
  • [ ] Identify possible related issues.
  • [ ] Create a unit test or automated test to reproduce the bug (if applicable).
  • [ ] Fix the bug.
  • [ ] Test the fix.
  • [ ] Update documentation (if necessary).
  • [ ] Close the issue and inform the reporter (if applicable).

summitmoon · May 06 '25 18:05

Hi @summitmoon, you can try toolchain 202502, which has been merged into the develop branch; you can simply replace the toolchain directory in the LTS repo with it. To help us debug, please tell us about your server environment, such as:

  • gcc version
  • OS version
  • toolchain usage

Thanks!
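Replacing the toolchain directory, as suggested above, comes down to swapping the LTS repo's `toolchain/` for the one from a develop-branch checkout that contains toolchain 202502. A minimal sketch, assuming both repositories are already cloned locally; the function name `swap_toolchain` and its two path arguments are hypothetical, and the old directory is kept as a backup rather than deleted:

```shell
# Hypothetical helper: copy toolchain/ from a develop-branch checkout
# (with toolchain 202502) into an LTS checkout.
# Usage: swap_toolchain <develop_checkout> <lts_checkout>
swap_toolchain() {
    local develop_dir="$1"
    local lts_dir="$2"

    # The develop checkout must contain toolchain/, and the LTS checkout must exist.
    if [ ! -d "$develop_dir/toolchain" ]; then
        echo "error: $develop_dir/toolchain not found" >&2
        return 1
    fi
    if [ ! -d "$lts_dir" ]; then
        echo "error: $lts_dir not found" >&2
        return 1
    fi

    # Keep the old LTS toolchain as toolchain.bak instead of deleting it.
    if [ -d "$lts_dir/toolchain" ]; then
        rm -rf "$lts_dir/toolchain.bak"
        mv "$lts_dir/toolchain" "$lts_dir/toolchain.bak"
    fi

    # Copy the newer toolchain into place, preserving permissions.
    cp -a "$develop_dir/toolchain" "$lts_dir/toolchain"
}
```

For example, `swap_toolchain ~/abacus-develop ~/abacus-lts` (hypothetical paths), after which the install script is run from `~/abacus-lts/toolchain` as before.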

QuantumMisaka · May 08 '25 03:05

The GCC version is 12.3.0. The toolchain script is toolchain_gnu.sh, which runs:

    ./install_abacus_toolchain.sh \
    --with-gcc=system \
    --with-intel=no \
    --with-openblas=install \
    --with-openmpi=system \
    --with-cmake=install \
    --with-scalapack=install \
    --with-libxc=install \
    --with-fftw=install \
    --with-elpa=install \
    --with-cereal=install \
    --with-rapidjson=install \
    --with-libtorch=no \
    --with-libnpy=no \
    --with-libri=no \
    --with-libcomm=no \
    --with-4th-openmpi=no \

The OpenMPI version is 4.1.7, and the OS is NAME="Rocky Linux" VERSION="8.10 (Green Obsidian)".
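The details reported above (GCC version, OpenMPI version, OS release) can be gathered in one pass as plain text, which is easier to paste into an issue. A minimal sketch; the function name `report_env` is hypothetical, and the `ompi_info` line applies only to an Open MPI installation, where it lists the MTL components built in (the original error comes from the OFI MTL, `mtl_ofi_component.c`):

```shell
# Hypothetical helper: print the environment details requested in this
# thread. Missing tools are reported instead of aborting.
report_env() {
    echo "== gcc =="
    if command -v gcc >/dev/null 2>&1; then
        gcc --version | head -n 1
    else
        echo "gcc: not found"
    fi

    echo "== mpi =="
    if command -v mpirun >/dev/null 2>&1; then
        # With --with-openmpi=system, the toolchain uses whichever mpirun is on PATH.
        echo "mpirun path: $(command -v mpirun)"
        mpirun --version 2>&1 | head -n 1
    else
        echo "mpirun: not found"
    fi
    # Open MPI only: show the MTL components (e.g. ofi) compiled into this build.
    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info | grep "MCA mtl" || echo "no MTL components listed"
    fi

    echo "== os =="
    if [ -r /etc/os-release ]; then
        grep -E '^(NAME|VERSION)=' /etc/os-release
    else
        uname -sr
    fi
    return 0
}
```

On the reporter's machine, the `== os ==` section would print the same `NAME="Rocky Linux"` and `VERSION="8.10 (Green Obsidian)"` lines quoted above.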

summitmoon · May 09 '25 01:05

@summitmoon There may be a problem with your system OpenMPI; please try using --with-openmpi=install.
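Switching from the system OpenMPI to one built by the toolchain changes a single option in the install command. A minimal sketch, assuming the options sit one per line in toolchain_gnu.sh as in the command quoted earlier; `use_bundled_openmpi` is a hypothetical helper, and it keeps the unmodified script as `<script>.orig`:

```shell
# Hypothetical helper: change --with-openmpi=system to --with-openmpi=install
# in a toolchain wrapper script, keeping the original as <script>.orig.
use_bundled_openmpi() {
    local script="${1:-toolchain_gnu.sh}"
    if [ ! -f "$script" ]; then
        echo "error: $script not found" >&2
        return 1
    fi
    cp "$script" "$script.orig"
    # Only the plain OpenMPI option matches; --with-4th-openmpi is left alone.
    # Redirecting into the existing file keeps its permissions (e.g. the exec bit).
    sed 's/--with-openmpi=system/--with-openmpi=install/' "$script.orig" > "$script"
}
```

After running `use_bundled_openmpi toolchain_gnu.sh`, rerun the script so that OpenMPI is built before ELPA and the other MPI-dependent libraries.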

QuantumMisaka · May 09 '25 15:05

I get the same problem when ELPA is being installed with --with-openmpi=install.

summitmoon · May 09 '25 20:05

Can you take a screenshot of the ELPA error message? Which environments did you load during installation? @summitmoon
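As a text alternative to a screenshot, the failing ELPA step and the loaded environment can be captured directly. A minimal sketch; `capture_install_log` and the default log name `install.log` are hypothetical, and the `module list` step only applies if the cluster uses Environment Modules or Lmod:

```shell
# Hypothetical helper: run a toolchain install script, save its full output
# to a log, then print the first lines mentioning "error" and the loaded modules.
# Returns the install script's own exit code.
capture_install_log() {
    local script="$1"
    local log="${2:-install.log}"
    local status

    bash "$script" 2>&1 | tee "$log"
    # PIPESTATUS[0] is the install script's exit code, not tee's.
    status=${PIPESTATUS[0]}

    echo "== first error lines in $log =="
    grep -n -i -m 20 "error" "$log" || echo "no lines containing 'error'"

    echo "== loaded modules =="
    if command -v module >/dev/null 2>&1; then
        module list 2>&1
    else
        echo "module command not available"
    fi
    return "$status"
}
```

For example, `capture_install_log ./toolchain_gnu.sh elpa_install.log`; the log file can then be attached to the issue in the same way as configure.log.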

tang070205 · Jun 20 '25 10:06