DFTK.jl
Deadlock when using MPI with Python packages and DFTK
Hello everyone,
I'm having problems parallelizing DFTK. I run the script attached to this message. When I compare multithreaded vs. single-threaded execution, the output is the following:
julia -tauto --project=. run_DFTK.jl
┌ Info: Threading setup:
│ Threads.nthreads() = 22
│ n_DFTK = 22
│ n_fft = 1
└ n_blas = 22
Multithreading enabled
┌ Warning: Negative ρcore detected: -2.5519188027919356e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
n Energy log10(ΔE) log10(Δρ) Diag Δtime
--- --------------- --------- --------- ---- ------
1 -157.3406674143 -0.25 6.2 32.0s
2 -157.8496486906 -0.29 -0.98 6.1 25.4s
3 -157.8596014620 -2.00 -1.40 8.2 34.4s
4 -157.8603026113 -3.15 -2.15 2.0 10.5s
5 -157.8603796767 -4.11 -2.81 3.9 16.6s
6 -157.8603823541 -5.57 -3.01 3.3 16.4s
7 -157.8603833636 -6.00 -3.21 1.6 6.82s
8 -157.8603838541 -6.31 -3.48 2.1 11.7s
9 -157.8603838612 -8.15 -3.53 1.9 10.3s
10 -157.8603838773 -7.79 -3.69 1.1 7.30s
11 -157.8603838863 -8.05 -3.93 1.8 10.5s
12 -157.8603838916 -8.28 -4.34 2.0 11.8s
13 -157.8603838933 -8.76 -5.20 2.9 14.0s
14 -157.8603838934 -10.04 -5.46 4.2 19.2s
15 -157.8603838934 -11.38 -5.90 1.6 8.68s
16 -157.8603838934 -13.07 -6.53 2.8 11.2s
17 -157.8603838934 -12.55 -6.92 4.0 17.8s
18 -157.8603838934 -12.70 -7.53 2.2 11.6s
19 -157.8603838934 + -12.94 -7.98 3.6 17.3s
20 -157.8603838934 -12.77 -8.31 3.0 14.4s
for multithreaded and
julia --project=. run_DFTK.jl
┌ Info: Threading setup:
│ Threads.nthreads() = 1
│ n_DFTK = 1
│ n_fft = 1
└ n_blas = 1
Multithreading enabled
┌ Warning: Negative ρcore detected: -2.5519188008275448e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
n Energy log10(ΔE) log10(Δρ) Diag Δtime
--- --------------- --------- --------- ---- ------
1 -157.3405698695 -0.25 6.3 7.14s
2 -157.8497105405 -0.29 -0.98 5.8 7.89s
3 -157.8596028516 -2.00 -1.40 8.1 6.89s
4 -157.8603060979 -3.15 -2.16 2.0 3.75s
5 -157.8603800028 -4.13 -2.80 4.0 4.67s
6 -157.8603822929 -5.64 -3.00 3.5 4.79s
7 -157.8603833368 -5.98 -3.20 1.4 3.34s
8 -157.8603838472 -6.29 -3.46 2.1 3.36s
9 -157.8603838637 -7.78 -3.53 2.3 3.54s
10 -157.8603838781 -7.84 -3.69 1.1 2.84s
11 -157.8603838878 -8.01 -3.91 1.6 3.25s
12 -157.8603838924 -8.34 -4.55 2.2 3.54s
13 -157.8603838933 -9.04 -5.16 3.9 4.63s
14 -157.8603838934 -10.22 -5.40 3.4 4.63s
15 -157.8603838934 -10.94 -5.99 2.2 3.93s
16 -157.8603838934 + -12.22 -6.18 3.8 4.42s
17 -157.8603838934 + -12.43 -6.75 1.2 3.23s
18 -157.8603838934 + -Inf -7.11 3.4 4.16s
19 -157.8603838934 -13.25 -7.90 2.9 3.86s
20 -157.8603838934 + -12.43 -8.11 4.3 5.02s
for single-threaded. As you can see, the single-threaded run is faster. Any idea what is wrong?
For MPI I disable multithreading and run with mpiexecjl, but startup is very slow. I get this output, and after that it seems to be stuck:
/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. run_DFTK.jl
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
run_DFTK.jl:
# using MPI
using DFTK
using Unitful
using UnitfulAtomic
using AtomsIO # Enables only Julia-based parsers
using AtomsIOPython # Enable python-based parsers as well
using PseudoPotentialData
disable_threading()
DFTK.mpi_master() || (redirect_stdout(); redirect_stderr())
# setup_threading()
# println("Multithreading enabled")
system = load_system("POSCAR")
family_upf = PseudoFamily("dojo.nc.sr.lda.v0_4_1.standard.upf");
pseudopotentials = load_psp(family_upf, system)
temperature = 0.01
smearing = DFTK.Smearing.FermiDirac()
# 2. Select model and basis
# Had to change something with respect to tutorial
model = model_LDA(system, temperature=temperature, smearing=smearing, pseudopotentials=pseudopotentials)
kgrid = [7, 7, 7] # k-point grid (Regular Monkhorst-Pack grid)
Ecut = 1000.0u"eV" # kinetic energy cutoff
# n_alg = AdaptiveBands(model, n_bands_converge=20)
basis = PlaneWaveBasis(model; Ecut=Ecut, kgrid=kgrid, use_symmetries_for_kpoint_reduction=true)
# 3. Run the SCF procedure to obtain the ground state
scfres = self_consistent_field(basis, tol=1e-8)
# scfres = DFTK.unfold_bz(scfres)
Thanks in advance for the help!
For threading that's just the way it is: for small problems threading incurs an overhead, so sometimes fewer threads are faster than more.
Regarding your MPI issue, that looks like a precompilation issue, which can cause MPI-based executions to deadlock. @Technici4n can probably comment whether these symptoms match his experience.
@Technici4n we should probably make a note about this in the parallelisation docs.
For precompilation, you can make sure that all packages are already precompiled by running with --compiled-modules=strict. Other reasonable options might be no or existing.
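One possible workflow sketch (assuming the project Manifest is up to date): precompile everything once in a serial Julia session before launching under MPI, so that no two ranks race to write the same precompilation caches or fight over the CondaPkg lock file, then launch with strict mode so any missing cache errors out instead of triggering precompilation inside the MPI run.

```shell
# 1. Precompile all project packages once, serially. This also gives
#    CondaPkg (pulled in via the Python-based parsers) a chance to
#    resolve its conda environment without other ranks waiting on its lock.
julia --project=. -e 'using Pkg; Pkg.precompile()'

# 2. Launch under MPI; strict mode throws instead of re-precompiling:
mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl
```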
For multithreading you can try fewer threads, for example -t2 or -t4, and see if that improves performance.
Ok, let's see if this helps to avoid the deadlock for @simonganne01. In any case this should be documented better.
like this?
/home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl
because now I get this:
sganne@UG-FSRPF54:~/VSC/docker/DFTK$ /home/sganne/.julia/bin/mpiexecjl -np 4 julia --project=. --compiled-modules=strict run_DFTK.jl
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: CondaPkg: Waiting for lock to be freed. You may delete this file if no other process is resolving.
└ lock_file = "/home/sganne/.julia/environments/v1.11/.CondaPkg/lock"
┌ Info: Threading setup:
│ Threads.nthreads() = 1
│ n_DFTK = 1
│ n_fft = 1
└ n_blas = 1
┌ Warning: Negative ρcore detected: -2.5519188019003375e-8
└ @ DFTK ~/.julia/packages/DFTK/fpkCG/src/terms/xc.jl:39
┌ Error: The MPI process failed
│ proc = Process(setenv(`/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/bin/mpiexec -np 4 julia --compiled-modules=strict run_DFTK.jl`,["OPENBLAS_MAIN_FREE=1", "PATH=/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/bin:/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/lib/mpich/bin:/home/sganne/.vscode-server/extensions/ms-python.python-2025.0.0-linux-x64/python_files/deactivate/bash:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/home/sganne/VSC/quantum/bin:/opt/bin/:/opt/bin/:/home/sganne/.local/bin:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/home/sganne/.vscode-server/extensions/ms-python.python-2025.0.0-linux-x64/python_files/deactivate/bash:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/home/sganne/VSC/quantum/bin:/home/sganne/.vscode-server/bin/e54c774e0add60467559eb0d1e229c6452cf8447/bin/remote-cli:/home/sganne/.local/bin:/opt/bin/:/opt/bin/:/home/sganne/.local/bin:/home/sganne/VSC/Software/lua-5.4.7/src/lua:/home/sganne/VSC/Software:/home/sganne/VSC/PyFoldHub:/home/sganne/VSC:/home/sganne/VSC/Software/lua-5.4.7/src:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8/bin:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.8/libnvvp:/mnt/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/bin:/mnt/c/Program Files/NVIDIA GPU Computing 
Toolkit/CUDA/v12.6/libnvvp:/mnt/c/WINDOWS/system32:/mnt/c/WINDOWS:/mnt/c/WINDOWS/System32/Wbem:/mnt/c/WINDOWS/System32/WindowsPowerShell/v1.0/:/mnt/c/WINDOWS/System32/OpenSSH/:/mnt/c/ProgramData/chocolatey/bin:/mnt/c/Program Files/dotnet/:/mnt/c/Program Files/MATLAB/R2024b/bin:/mnt/c/Program Files (x86)/NVIDIA Corporation/PhysX/Common:/mnt/c/Program Files/Git/cmd:/Docker/host/bin:/mnt/c/Program Files/NVIDIA Corporation/Nsight Compute 2025.1.0/:/mnt/c/Program Files/PuTTY/:/mnt/c/Users/sganne/AppData/Local/Programs/Python/Launcher/:/mnt/c/Users/sganne/AppData/Local/Microsoft/WindowsApps:/mnt/c/Users/sganne/AppData/Local/Programs/Microsoft VS Code/bin:/mnt/c/Users/sganne/AppData/Local/Programs/MiKTeX/miktex/bin/x64/:/mnt/c/Users/sganne/AppData/Local/GitHubDesktop/bin:/mnt/c/Users/sganne/AppData/Local/Microsoft/WinGet/Packages/simonmichael.hledger_Microsoft.Winget.Source_8wekyb3d8bbwe:/mnt/c/Users/sganne/AppData/Local/Programs/Ollama:/mnt/c/Users/sganne/.cache/lm-studio/bin:/snap/bin:/usr/local/go/bin:/home/sganne/go/bin:/usr/local/go/bin:/home/sganne/go/bin:/usr/local/go/bin:/home/sganne/go/bin:/usr/local/go/bin:/home/sganne/go/bin", "ESPRESSO_TMPDIR=/tmp", "WSLENV=VSCODE_WSL_EXT_LOCATION/up", "WAYLAND_DISPLAY=wayland-0", "MPITRAMPOLINE_MPIEXEC=/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/lib/mpich/bin/mpiexec", "NAME=UG-FSRPF54", "LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:/home/sganne/.julia/artifacts/b0621f278163a1b9973d9fc70ac0eec360e8af1e/lib:/home/sganne/packages/julias/julia-1.11/bin/../lib/julia:/home/sganne/packages/julias/julia-1.11/bin/../lib", "DEBUGPY_ADAPTER_ENDPOINTS=/home/sganne/.vscode-server/extensions/ms-python.debugpy-2025.0.1-linux-x64/.noConfigDebugAdapterEndpoints/endpoint-9a04f020623d20df.txt", "GIT_ASKPASS=/home/sganne/.vscode-server/bin/e54c774e0add60467559eb0d1e229c6452cf8447/extensions/git/dist/askpass.sh" … "WSL_INTEROP=/run/WSL/1310_interop", "PS1=\\[\e]633;A\a\\](quantum) \\[\\e]0;\\u@\\h: 
\\w\\a\\]\${debian_chroot:+(\$debian_chroot)}\\[\\033[01;32m\\]\\u@\\h\\[\\033[00m\\]:\\[\\033[01;34m\\]\\w\\[\\033[00m\\]\\\$ \\[\e]633;B\a\\]", "ESPRESSO_PSEUDO=/home/sganne/QE-2019/pseudo", "ALF_DIR=/home/sganne/VSC/Software/ALF", "HOME=/home/sganne", "TERM=xterm-256color", "LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:", "COLORTERM=truecolor", "VIRTUAL_ENV=/home/sganne/VSC/quantum", "HOSTTYPE=x86_64"]), ProcessExited(1))
└ @ Main none:7
Yes, exactly. --compiled-modules=strict means that importing a package that is not precompiled will throw an error. I think it's probably what you want for optimal performance, but of course you have to remember to precompile first. :)
Now it seems you are getting into the fun of trying to figure out what might be wrong with MPI. Without additional logging output it will be difficult. 😅 I would recommend adding print statements at different stages in your code to understand what is failing (possibly even around the imports).
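A minimal sketch of such staged logging, assuming MPI.jl is available in the project environment (the stage markers here are just illustrative, not part of any DFTK API):

```julia
# Print a per-rank marker after each stage and flush immediately, so the
# last marker that appears tells you which stage the hung rank reached.
using MPI
MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
println("rank $rank: MPI initialised"); flush(stdout)

using DFTK
println("rank $rank: DFTK loaded"); flush(stdout)

using AtomsIOPython   # Python-based parsers; this pulls in CondaPkg
println("rank $rank: AtomsIOPython loaded"); flush(stdout)

system = load_system("POSCAR")
println("rank $rank: system loaded"); flush(stdout)
```

If all ranks print the MPI marker but hang before the AtomsIOPython one, that would point at CondaPkg resolution rather than DFTK itself.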