Crash with large systems
Describe the bug Systems larger than approximately 831-833 atoms always crash. This does not seem to depend on what the systems are (a few different types were tried, from one long linear molecule to many small ones with different atoms; all behave the same), and also does not depend on the coordinates (molecules near each other in different orientations, or very far apart). It also does not seem related to the OpenMP stack size.
To Reproduce Using the provided water278.xyz file: https://gist.github.com/aizvorski/641a987e7dfa89eba4ce241c68409768#file-water278-xyz
$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G time -v /home/ubuntu/bin/xtb-6.5.1/bin/xtb water278.xyz --gfn 2 --chrg "0"
...
* xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
...................................................
: SETUP :
:.................................................:
: # basis functions 1668 :
: # atomic orbitals 1668 :
: # shells 1112 :
: # electrons 2224 :
: max. iterations 250 :
: Hamiltonian GFN2-xTB :
: restarted? false :
: GBSA solvation false :
: PC potential false :
: electronic temp. 300.0000000 K :
: accuracy 1.0000000 :
: -> integral cutoff 0.2500000E+02 :
: -> integral neglect 0.1000000E-07 :
: -> SCF convergence 0.1000000E-05 Eh :
: -> wf. convergence 0.1000000E-03 e :
: Broyden damping 0.4000000 :
...................................................
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
xtb 000000000305452D Unknown Unknown Unknown
xtb 0000000003271BC0 Unknown Unknown Unknown
xtb 000000000099DF21 xtb_disp_coordina 396 coordinationnumber.f90
xtb 00000000031D4B83 Unknown Unknown Unknown
xtb 0000000003186C16 Unknown Unknown Unknown
xtb 0000000003155085 Unknown Unknown Unknown
xtb 000000000099DCA0 xtb_disp_coordina 396 coordinationnumber.f90
xtb 000000000099B2C8 xtb_disp_coordina 340 coordinationnumber.f90
xtb 00000000008E7399 xtb_scf_mp_scf_.A 519 scf_module.F90
xtb 00000000006125A3 xtb_xtb_calculato 257 calculator.f90
xtb 000000000041800F xtb_prog_main_mp_ 580 main.F90
xtb 000000000042512B MAIN__ 55 primary.f90
xtb 00000000004020EE Unknown Unknown Unknown
xtb 0000000003273060 Unknown Unknown Unknown
xtb 0000000000401FD7 Unknown Unknown Unknown
Command exited with non-zero status 174
Command being timed: "/home/ubuntu/bin/xtb-6.5.1/bin/xtb water278.xyz --gfn 2 --chrg 0"
User time (seconds): 0.15
System time (seconds): 0.03
Percent of CPU this job got: 97%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.19
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 108560
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 28220
Voluntary context switches: 1
Involuntary context switches: 449
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 174
For comparison, an input file water277.xyz with one fewer water molecule succeeds: https://gist.github.com/aizvorski/7b4215388491126090ba83b6ae4ab341#file-water277-xyz
$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G time -v /home/ubuntu/bin/xtb-6.5.1/bin/xtb water277.xyz --gfn 2 --chrg "0"
...
* xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
...................................................
: SETUP :
:.................................................:
: # basis functions 1662 :
: # atomic orbitals 1662 :
: # shells 1108 :
: # electrons 2216 :
: max. iterations 250 :
: Hamiltonian GFN2-xTB :
: restarted? true :
: GBSA solvation false :
: PC potential false :
: electronic temp. 300.0000000 K :
: accuracy 1.0000000 :
: -> integral cutoff 0.2500000E+02 :
: -> integral neglect 0.1000000E-07 :
: -> SCF convergence 0.1000000E-05 Eh :
: -> wf. convergence 0.1000000E-03 e :
: Broyden damping 0.4000000 :
...................................................
iter E dE RMSdq gap omega full diag
1 -1415.6386943 -0.141564E+04 0.204E-07 8.73 0.0 T
2 -1415.6386943 0.886757E-11 0.119E-07 8.73 29040.2 T
3 -1415.6386943 -0.106866E-10 0.207E-08 8.73 100000.0 T
*** convergence criteria satisfied after 3 iterations ***
# Occupation Energy/Eh Energy/eV
-------------------------------------------------------------
1 2.0000 -0.7271272 -19.7861
... ... ... ...
1102 2.0000 -0.3682050 -10.0194
1103 2.0000 -0.3664023 -9.9703
1104 2.0000 -0.3625255 -9.8648
1105 2.0000 -0.3584824 -9.7548
1106 2.0000 -0.3570151 -9.7149
1107 2.0000 -0.3556497 -9.6777
1108 2.0000 -0.3359206 -9.1409 (HOMO)
1109 -0.0151621 -0.4126 (LUMO)
1110 -0.0061251 -0.1667
1111 0.0011029 0.0300
1112 0.0020212 0.0550
1113 0.0029399 0.0800
... ... ...
1662 0.4675880 12.7237
-------------------------------------------------------------
HL-Gap 0.3207585 Eh 8.7283 eV
Fermi-level -0.1755413 Eh -4.7767 eV
SCC (total) 0 d, 0 h, 0 min, 17.350 sec
SCC setup ... 0 min, 0.037 sec ( 0.211%)
Dispersion ... 0 min, 0.080 sec ( 0.462%)
classical contributions ... 0 min, 0.011 sec ( 0.063%)
integral evaluation ... 0 min, 0.634 sec ( 3.651%)
iterations ... 0 min, 11.684 sec ( 67.342%)
molecular gradient ... 0 min, 4.016 sec ( 23.145%)
printout ... 0 min, 0.889 sec ( 5.125%)
:::::::::::::::::::::::::::::::::::::::::::::::::::::
:: SUMMARY ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::
:: total energy -1405.892124104588 Eh ::
:: gradient norm 0.203225946340 Eh/a0 ::
:: HOMO-LUMO gap 8.728283439762 eV ::
::.................................................::
:: SCC energy -1415.638694336316 Eh ::
:: -> isotropic ES 8.569566870483 Eh ::
:: -> anisotropic ES -0.289563022977 Eh ::
:: -> anisotropic XC -0.213130853940 Eh ::
:: -> dispersion -0.253146647874 Eh ::
:: repulsion energy 9.734657198491 Eh ::
:: add. restraining 0.000000000000 Eh ::
:: total charge -0.000000000003 e ::
:::::::::::::::::::::::::::::::::::::::::::::::::::::
...
-------------------------------------------------
| TOTAL ENERGY -1405.892124104588 Eh |
| GRADIENT NORM 0.203225946340 Eh/α |
| HOMO-LUMO GAP 8.728283439762 eV |
-------------------------------------------------
------------------------------------------------------------------------
* finished run on 2022/10/02 at 00:43:41.395
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 18.069 sec
* cpu-time: 0 d, 0 h, 0 min, 18.065 sec
* ratio c/w: 1.000 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 17.377 sec
* cpu-time: 0 d, 0 h, 0 min, 17.376 sec
* ratio c/w: 1.000 speedup
normal termination of xtb
Command being timed: "/home/ubuntu/bin/xtb-6.5.1/bin/xtb water277.xyz --gfn 2 --chrg 0"
User time (seconds): 17.69
System time (seconds): 0.37
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:18.07
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 594568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 228439
Voluntary context switches: 1
Involuntary context switches: 483
Swaps: 0
File system inputs: 0
File system outputs: 368
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
This does not appear to be due to out-of-memory, or to a too-low setting of OMP_STACKSIZE. The machine this was tested on has >200 GB of memory. The actual memory used when the crash happens (reported by time -v) is just a little over 100 MB.
Setting the stack size deliberately very low with the largest input system which succeeds, water277.xyz:
- OMP_STACKSIZE=1M OMP_NUM_THREADS=1 succeeds; the stack size seems not to matter when there is only one thread
- OMP_STACKSIZE=50M OMP_NUM_THREADS=2 succeeds
- OMP_STACKSIZE=20M OMP_NUM_THREADS=2 fails; the exact failure seems non-deterministic: either SIGSEGV in xtb_coulomb_klopm during the iterations, or "Command terminated by signal 11" after the iterations finish
GDB backtrace:
$ OMP_NUM_THREADS=1 OMP_MAX_ACTIVE_LEVELS=1 OMP_STACKSIZE=200G gdb /home/ubuntu/bin/xtb-6.5.0/bin/xtb
(gdb) run water278.xyz --gfn 1 --chrg "0"
Program received signal SIGSEGV, Segmentation fault.
0x000000000099cf41 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
(gdb) bt
#0 0x000000000099cf41 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
#1 0x00000000031d3f83 in __kmp_invoke_microtask ()
#2 0x0000000003186016 in __kmp_fork_call ()
#3 0x0000000003154485 in __kmpc_fork_call ()
#4 0x000000000099ccc0 in xtb_disp_coordinationnumber_mp_ncoordlatp_.A ()
#5 0x000000000099a438 in xtb_disp_coordinationnumber_mp_getcoordinationnumberlp_ ()
#6 0x00000000008e6429 in xtb_scf_mp_scf_.A ()
#7 0x0000000000611d33 in xtb_xtb_calculator_mp_singlepoint_.A ()
#8 0x00000000004177f3 in xtb_prog_main_mp_xtbmain_.A ()
#9 0x000000000042492b in MAIN__ ()
Expected behaviour No crash.
Additional context
Using xtb 6.5.1 binary downloaded from https://github.com/grimme-lab/xtb/releases/download/v6.5.1/xtb-6.5.1-linux-x86_64.tar.xz
xtb --version gives version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
OS: Ubuntu 18.04.4 LTS Hardware: AMD EPYC 7B13 CPU, 224GB RAM (also tested on Ubuntu 20.04 LTS, Intel i7-10510U, 48GB RAM: same behavior) (also tested on xtb-6.5.0 and 6.4.1: same)
Update: the exact number of atoms which causes the crash is 834. The number of orbitals doesn't seem to matter; it really is the number of atoms.
Works: 833 helium atoms https://gist.github.com/aizvorski/a6616970339d8447a98989b4d0455db8#file-helium833-xyz
Crashes: 834 helium atoms https://gist.github.com/aizvorski/b7b65913c1a52379937afc76b38c3450#file-helium834-xyz
This works fine for me, once I set 'ulimit -s unlimited' and 'export OMP_STACKSIZE=4G': xtb he834.xyz --namespace test
* xtb version 6.5.1 (579679a) compiled by 'ehlert@majestix' on 2022-07-11
...
------------------------------------------------------------------------
* finished run on 2022/10/04 at 09:11:23.662
------------------------------------------------------------------------
total:
* wall-time: 0 d, 0 h, 0 min, 12.147 sec
* cpu-time: 0 d, 0 h, 0 min, 58.211 sec
* ratio c/w: 4.792 speedup
SCF:
* wall-time: 0 d, 0 h, 0 min, 11.557 sec
* cpu-time: 0 d, 0 h, 0 min, 55.357 sec
* ratio c/w: 4.790 speedup
normal termination of xtb
@haneug I can confirm this; the process stack limit in ulimit -s was the limiting factor. ulimit -s unlimited works.
I think it's fair to say any SIGSEGV crash is a bug, since it is impossible to distinguish it from other bugs such as an out-of-bounds pointer, and there is no indication to the user of what is necessary to make the calculation succeed.
Since this is likely to be a thing a lot of folks run into, I'm going to suggest one of two things:
- Figure out how much process stack and OMP stack would be needed, and if there isn't enough, exit (without crashing) with an appropriate error message describing how much memory would be needed to complete the calculation, or
- Default to really high limits (process stack set to unlimited or equal to hard limit, OMP stack set to system memory/number of threads)
While educating the users on this setting seems error prone, there are not really many alternatives, or better put, not many universal solutions. A simple band-aid solution could be a shell wrapper around xtb which sets those values by default.
Back to the problem. So far I found a solution for macOS (using -Wl,-stacksize,0x1000000) and Windows (using /STACK:16777216).
On Linux we have the possibility to use the system calls getrlimit(2) / setrlimit(2) to retrieve the current stack limit and warn the user if it is not sufficient (note that system call here does not refer to Fortran's call system but to a function provided by the Linux kernel). I don't know whether setrlimit(2) is sufficient to increase the stack size at runtime; this sounds like something a process should not be allowed to do without elevated permissions, but it may be worth a try.
The OpenMP stack size issue is more severe; so far I found no good way to detect a too-small stack. However, I believe this is a problem that can be solved on the algorithm side. For example, I could restructure most OpenMP regions in s-dftd3 to not put large arrays on the OpenMP stack, which almost completely eliminates stack overflow issues on both the system and the OpenMP stack. That might be a way for xtb as well. The implementation, however, gets somewhat more verbose about memory allocations.
Regarding stack usage, there are many insightful discussions on the use of stack vs. heap arrays on the Fortran Discourse:
- https://fortran-lang.discourse.group/t/openmp-question-private-vs-shared-work-arrays-for-reduction/746
- https://fortran-lang.discourse.group/t/automatic-vs-allocatable-arrays-for-function-results/1741
- https://fortran-lang.discourse.group/t/why-stack-is-faster-than-heap-and-what-exactly-is-stack/2130
- https://fortran-lang.discourse.group/t/frecursive-vs-fmax-stack-var-size-vs-unlimit-s/2970
- https://fortran-lang.discourse.group/t/automatic-arrays-and-intrinsic-array-operations-to-use-or-not-to-use/4070
That issue actually comes up a lot, not only in xtb. The only surefire method so far seems to be to avoid putting any large arrays on any stack and instead do the heap allocation explicitly.
@awvwgk Thanks, that's a good collection of links! I don't know too much about Fortran specifically, but perhaps using some compiler feature to avoid large arrays on the stack (without having to modify the code) might work.
What compiler are release xtb binaries compiled with now?
It looks like gfortran doesn't yet have any way of doing this, but Intel ifx -heap-arrays [size] (docs) and NVIDIA/PGI nvfortran -Mnostack_arrays (docs) might do the job.
(Bonus: ifx and nvfortran can both compile OpenMP code to run on GPU :)
@awvwgk About OMP_STACKSIZE: the compiler options to reduce stack use may also apply to OpenMP code, but if not, maybe we could default to OMP_STACKSIZE=physical memory/number of threads? That's only if OMP_STACKSIZE environment variable is unset of course; if it is set, then use the value and maybe warn if it is low.
It looks like gfortran doesn't yet have any way of doing this, but Intel ifx -heap-arrays [size] (docs) and NVIDIA/PGI nvfortran -Mnostack_arrays (docs) might do the job.
Those apply to automatic arrays. Since we don't use automatic arrays in xtb, the option to put them on the heap will not change the program behavior. Maybe providing a custom allocator in the OpenMP directive might do the trick.
(Bonus: ifx and nvfortran can both compile OpenMP code to run on GPU :)
I'm really looking forward to seeing the first LLVM-based Fortran compiler working for a code base using moderately new Fortran features (F2003+).
I found how to fix this bug for Windows.
- Install MSVC
- Use Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\Hostx64\x64\editbin.exe to patch xtb.exe: editbin.exe /STACK:64000000 xtb.exe