
Deadlock when using MPI with Python packages and DFTK

simonganne01 opened this issue 9 months ago · 5 comments

Hello everyone,

I'm having problems parallelizing DFTK. I run the script attached at the end of this message. When I run it multithreaded vs. single-threaded, this is the output:

julia -tauto --project=. run_DFTK.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 22
│   n_DFTK = 22
│   n_fft = 1
└   n_blas = 22
Multithreading enabled
┌ Warning: Negative ρcore detected: -2.5519188027919356e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -157.3406674143                   -0.25    6.2    32.0s
  2   -157.8496486906       -0.29       -0.98    6.1    25.4s
  3   -157.8596014620       -2.00       -1.40    8.2    34.4s
  4   -157.8603026113       -3.15       -2.15    2.0    10.5s
  5   -157.8603796767       -4.11       -2.81    3.9    16.6s
  6   -157.8603823541       -5.57       -3.01    3.3    16.4s
  7   -157.8603833636       -6.00       -3.21    1.6    6.82s
  8   -157.8603838541       -6.31       -3.48    2.1    11.7s
  9   -157.8603838612       -8.15       -3.53    1.9    10.3s
 10   -157.8603838773       -7.79       -3.69    1.1    7.30s
 11   -157.8603838863       -8.05       -3.93    1.8    10.5s
 12   -157.8603838916       -8.28       -4.34    2.0    11.8s
 13   -157.8603838933       -8.76       -5.20    2.9    14.0s
 14   -157.8603838934      -10.04       -5.46    4.2    19.2s
 15   -157.8603838934      -11.38       -5.90    1.6    8.68s
 16   -157.8603838934      -13.07       -6.53    2.8    11.2s
 17   -157.8603838934      -12.55       -6.92    4.0    17.8s
 18   -157.8603838934      -12.70       -7.53    2.2    11.6s
 19   -157.8603838934   +  -12.94       -7.98    3.6    17.3s
 20   -157.8603838934      -12.77       -8.31    3.0    14.4s

for multithreaded and

julia --project=. run_DFTK.jl
┌ Info: Threading setup: 
│   Threads.nthreads() = 1
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
Multithreading enabled
┌ Warning: Negative ρcore detected: -2.5519188008275448e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
n     Energy            log10(ΔE)   log10(Δρ)   Diag   Δtime
---   ---------------   ---------   ---------   ----   ------
  1   -157.3405698695                   -0.25    6.3    7.14s
  2   -157.8497105405       -0.29       -0.98    5.8    7.89s
  3   -157.8596028516       -2.00       -1.40    8.1    6.89s
  4   -157.8603060979       -3.15       -2.16    2.0    3.75s
  5   -157.8603800028       -4.13       -2.80    4.0    4.67s
  6   -157.8603822929       -5.64       -3.00    3.5    4.79s
  7   -157.8603833368       -5.98       -3.20    1.4    3.34s
  8   -157.8603838472       -6.29       -3.46    2.1    3.36s
  9   -157.8603838637       -7.78       -3.53    2.3    3.54s
 10   -157.8603838781       -7.84       -3.69    1.1    2.84s
 11   -157.8603838878       -8.01       -3.91    1.6    3.25s
 12   -157.8603838924       -8.34       -4.55    2.2    3.54s
 13   -157.8603838933       -9.04       -5.16    3.9    4.63s
 14   -157.8603838934      -10.22       -5.40    3.4    4.63s
 15   -157.8603838934      -10.94       -5.99    2.2    3.93s
 16   -157.8603838934   +  -12.22       -6.18    3.8    4.42s
 17   -157.8603838934   +  -12.43       -6.75    1.2    3.23s
 18   -157.8603838934   +    -Inf       -7.11    3.4    4.16s
 19   -157.8603838934      -13.25       -7.90    2.9    3.86s
 20   -157.8603838934   +  -12.43       -8.11    4.3    5.02s

for single-threaded. As you can see, the single-threaded run is faster. Any idea what is wrong?

For MPI I disable the multithreading and run with mpiexecjl, but startup is very slow. I get this output and after that it seems to be stuck:

/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. run_DFTK.jl
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"

run_DFTK.jl:

# using MPI
using DFTK
using Unitful
using UnitfulAtomic
using AtomsIO        # Enables only Julia-based parsers
using AtomsIOPython  # Enable python-based parsers as well
using PseudoPotentialData

disable_threading()
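# Suppress stdout/stderr on all but the MPI master rank to avoid interleaved printing: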
DFTK.mpi_master() || (redirect_stdout(); redirect_stderr())


# setup_threading()
# println("Multithreading enabled")

system = load_system("POSCAR")


family_upf = PseudoFamily("dojo.nc.sr.lda.v0_4_1.standard.upf");

pseudopotentials = load_psp(family_upf, system)

temperature = 0.01
smearing = DFTK.Smearing.FermiDirac()
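# (a plain number for the temperature is interpreted in Hartree atomic units)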


# 2. Select model and basis
# Had to change something with respect to the tutorial
model = model_LDA(system, temperature=temperature, smearing=smearing, pseudopotentials=pseudopotentials)
kgrid = [7, 7, 7]     # k-point grid (Regular Monkhorst-Pack grid)
Ecut = 1000.0u"eV"              # kinetic energy cutoff
# n_alg = AdaptiveBands(model, n_bands_converge=20)
basis = PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid, use_symmetries_for_kpoint_reduction=true)


# 3. Run the SCF procedure to obtain the ground state
scfres = self_consistent_field(basis, tol=1e-8)
# scfres = DFTK.unfold_bz(scfres)
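# e.g. scfres.energies.total gives the converged total energy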

Thanks in advance for the help!

POSCAR.txt

simonganne01 commented Feb 24 '25

For threading that's just the way it is: for small problems threading incurs an overhead. Sometimes fewer threads are faster than more.

Regarding your MPI issue: that looks like a precompilation problem, which can cause MPI-based executions to deadlock. @Technici4n can probably comment on whether these symptoms match his experience.

@Technici4n we should probably make a note about this in the parallelisation docs.

mfherbst commented Feb 24 '25

For precompilation, you can make sure that all packages are precompiled using --compiled-modules=strict. Other reasonable options might be no or existing.
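Concretely, one way to do this (a sketch, using the project environment and mpiexecjl path from above) is to precompile once in a single process before launching under MPI, so the ranks never race to precompile:

julia --project=. -e 'using Pkg; Pkg.precompile()'
/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl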

For multithreading you can try fewer threads, for example -t2 or -t4, and see if that improves performance.
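For example, with the invocation from above:

julia -t4 --project=. run_DFTK.jl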

Technici4n commented Feb 24 '25

Ok, let's see if this helps to avoid the deadlock for @simonganne01. In any case this should be documented better.

mfherbst commented Feb 24 '25

Like this?

/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl

Because now I get this:

sganne@UG-FSRPF54:~/VSC/docker/DFTK$ /home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└   lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: Threading setup: 
│   Threads.nthreads() = 1
│   n_DFTK = 1
│   n_fft = 1
└   n_blas = 1
┌ Warning: Negative ρcore detected: -2.5519188019003375e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
┌ Error: The MPI process failed
│   proc = Process(setenv(`/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/bin/mpiexec -np 4 julia --compiled-modules=strict run_DFTK.jl`, [long environment variable dump elided]), ProcessExited(1))
└ @ Main none:7

simonganne01 commented Feb 25 '25

Yes, exactly. --compiled-modules=strict means that importing a package that is not precompiled will throw an error. I think it's probably what you want for optimal performance, but of course you have to remember to precompile. :)

Now it seems you are getting into the fun of trying to figure out what might be wrong with MPI. Without additional logging output it will be difficult. 😅 I would recommend adding print statements at different stages in your code to understand what is failing (possibly even around the imports).
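A minimal sketch of what such rank-tagged prints could look like (assuming MPI.jl is in the environment; the guard avoids re-initializing MPI if DFTK already did, and note that run_DFTK.jl redirects output on non-master ranks, so that line should be commented out while debugging):

using MPI
MPI.Initialized() || MPI.Init()   # initialize MPI only if nobody has yet
rank = MPI.Comm_rank(MPI.COMM_WORLD)

println("rank $rank: imports done"); flush(stdout)
# ... repeat after each heavy step in run_DFTK.jl, e.g.
# println("rank $rank: system loaded"); flush(stdout)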

Technici4n commented Feb 25 '25