Out of memory with DLPNO-CCSD(T)
I'm trying to test out the implementation of DLPNO-CCSD(T) in the main branch (revision 6053f9a checked out today). I find that no matter how much memory I try to give it, I still get an error that there wasn't enough.
Here's my input file, which just computes the energy of a single water molecule.
memory 32 GB
molecule h2o {
O
H 1 0.96
H 1 0.96 2 104.5
}
set basis cc-pVDZ
energy('dlpno-ccsd(t)')
It fails with the message, "Fatal Error: Too little memory given for DLPNO-CCSD Algorithm!"
Looking through the output I find its estimates of the required memory.
==> DLPNO-CCSD Memory Requirements <==
*** Common Quantities ***
(q | i j) [AUX, LMO] : 0.000 [GB]
(q | i a) [AUX, LMO, PAO] : 0.000 [GB]
(q | a b) [AUX, PAO] : 0.000 [GB]
(k_{ij}, l_{ij})-like : 844425.371 [GB]
(k_{ij}, c_{ij})-like : 144321447940.430 [GB]
(a_{ij}, b_{ij})-like : 48.727 [GB]
(a_{ij}, c_{kj})-like : 844425.347 [GB]
(i a_{ij} | b_{ij}, c_{ij}) : 0.001 [GB]
(Q_{ij} | k_{ij} i) : 48.727 [GB]
(Q_{ij} | a_{ij} i) : 48.727 [GB]
(Q_{ij} | m_{ij} a_{ij}) : 0.001 [GB]
(Q_{ij} | a_{ij} b_{ij}) : 0.003 [GB]
Some of those numbers look suspiciously large for a single water molecule. :)
Looks like integer under/overflow on first glance
@andyj10224, can you take a look? Do defaults need to be set differently?
@peastman, sorry about that. Perhaps try working from https://github.com/psi4/psi4/blob/master/tests/dlpnocc-1/input.dat, which is also cc-pVDZ water. That runs in CI, so it needs less than 4 GB of memory. In particular, set freeze_core true is useful both for the method and for reducing resource use.
freeze_core doesn't help, but this line in that file does:
set dlpno_toggle_memory false
If I add that line, it runs without problem.
Here is a bigger molecule with 41 atoms. This one segfaults.
memory 32 GB
molecule mol {
0 1
C -6.490999517882142 2.537862684063124 -0.2323102727995172
C -6.328573442396725 1.0587890760537342 -0.5217635401567717
C -5.075936167154976 0.6061573086614495 0.2350438796872525
C -4.8126633715875 -0.855500983294261 0.022371668031886085
N -3.662814391362749 -1.3722394084528953 0.6919258810947014
C -2.3317025039904817 -1.008380205827574 0.3368257968229602
S -2.0928918146868747 0.05761004239136857 -0.9006178435072472
N -1.184069382531426 -1.5242034114566212 1.0056586198187671
N 0.1301397890615767 -1.1444494802483816 0.6298804433873023
C 1.0334156775993977 -2.059394067441253 0.3947429578112244
C 2.3979452885925436 -1.7192216664803903 0.005733845845878158
C 3.362163643014409 -2.694365164834639 -0.24484274153182137
C 4.6460523688931445 -2.3440664470958397 -0.6086692579058807
C 4.955294312738983 -1.0027505374123884 -0.7191672280909117
C 4.0404835194357105 -0.008533656047057721 -0.48264863774515454
O 4.278507737083588 1.340756056679458 -0.5735827354768532
C 5.448939325538495 2.0043760963604695 -0.9014069180985177
C 6.571384631981316 1.7070091411274624 0.060412348859605385
C 2.748601901677971 -0.39190320919379934 -0.11616578471961819
O 1.7613363013430023 0.5717983766243413 0.14061113978185472
H -6.784818302912347 3.054028629984417 -1.177587086255804
H -5.546612707317883 2.981106576785527 0.14935118892052024
H -7.235420448687445 2.6917139957798732 0.5890051496248345
H -7.203878515652723 0.4593411348145616 -0.24061867739794673
H -6.062032389992824 0.9578860340497859 -1.5943941606126875
H -4.248953516145324 1.2355161502200693 -0.14795370698963758
H -5.327584241816008 0.8015737490824821 1.3083678055164396
H -5.711654823428578 -1.3980230080522968 0.40075011721575277
H -4.773077391252315 -1.0785676082193396 -1.0639539127489945
H -3.755100507678116 -2.062042774437259 1.4975466605061887
H -1.3312377202654841 -2.2069110420225564 1.8012501991371148
H 0.7470422223506362 -3.0971265263866923 0.49539548002989314
H 3.0642107824783267 -3.728318512038958 -0.14336534028044853
H 5.378363270068862 -3.1115213894781513 -0.7987604430064659
H 5.950156173187689 -0.7449726551054592 -1.0031316116672393
H 5.222328098077193 3.1080846481568773 -0.8043940639696501
H 5.806840582857794 1.8078396284984524 -1.9351745579309316
H 6.21140068671906 1.2934961778876564 1.026324447356718
H 7.064232458787285 2.6813209600804404 0.3381581122594744
H 7.346910932505775 1.0825383856344617 -0.43367759670604006
H 1.7942714527493095 1.5136869005900124 -0.20996397628098845
}
set basis cc-pVDZ
set dlpno_toggle_memory false
energy('dlpno-ccsd(t)')
@peastman Can you try setting symmetry c1 in the molecular input string (like in the test)? I'd also like to know what operating system you are running on.
Looks like integer under/overflow on first glance
I agree; the code appears to be using ints, and when you multiply two ints the product might be too big to fit into an int.
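To make the suspected failure mode concrete, here is a minimal Python sketch of what happens when the exact product of two dimensions is stored in a signed 32-bit int. The helper wrap_int32 and the dimensions are hypothetical and are not taken from the Psi4 source. Strictly speaking, signed overflow in C++ is undefined behavior; two's-complement wraparound is only the result typically seen in practice, and that is what the sketch models.

```python
def wrap_int32(value: int) -> int:
    """Return what a signed 32-bit int would hold after two's-complement wraparound."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers.
    return value - 0x100000000 if value >= 0x80000000 else value


# Hypothetical dimensions; each one fits comfortably in a 32-bit int.
n_rows = 50_000
n_cols = 50_000

exact = n_rows * n_cols      # Python ints are arbitrary precision: 2_500_000_000
wrapped = wrap_int32(exact)  # value a 32-bit int would end up with: -1_794_967_296

print(f"exact product : {exact}")
print(f"32-bit result : {wrapped}")
```

In C++, the usual way to avoid this is to do the multiplication in a 64-bit type such as size_t or int64_t, converting the operands before multiplying rather than after.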
Here is a bigger molecule with 41 atoms. This one segfaults.
This appears to be an issue with non-frozen-core DLPNO-CCSD(T)! The segfault disappears if you run it with
set freeze_core true
I will look into it!
I think I discovered the cause of the bogus memory estimates: when computing the memory requirements, I forgot to initialize my counters to zero before incrementing them.
https://github.com/psi4/psi4/blob/master/psi4/src/psi4/dlpno/ccsd.cc#L269
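As a rough illustration of why an uninitialized counter produces estimates like the ones in the output, here is a small Python sketch of the accumulation pattern. It is not the Psi4 code: estimate_gb, the block shapes, and the garbage starting value are all hypothetical. Python has no uninitialized locals, so the indeterminate starting value that a C++ local could hold is passed in explicitly through the start argument.

```python
BYTES_PER_DOUBLE = 8


def estimate_gb(block_shapes, start=0):
    """Sum the element counts of (rows, cols) blocks and convert the total to GB.

    `start` stands in for the counter's initial value. It should be zero;
    a nonzero value mimics a C++ counter that was never initialized.
    """
    n_elements = start
    for rows, cols in block_shapes:
        n_elements += rows * cols
    return n_elements * BYTES_PER_DOUBLE / 1e9


# Hypothetical per-pair block sizes for a very small system.
shapes = [(24, 19)] * 15

correct = estimate_gb(shapes)                           # counter starts at zero
garbage = estimate_gb(shapes, start=10**14)             # counter starts at leftover memory contents

print(f"initialized counter   : {correct:.6f} GB")
print(f"uninitialized counter : {garbage:.3f} GB")
```

The real contribution of the blocks is tiny in both cases; the reported total is dominated entirely by whatever value the counter happened to start with, which is why giving the job more memory never helps.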
I also figured out the reason for the segfault! I opened pull request #3338 and will close this issue when it is merged.
Both problems are now fixed for me. Thank you!
I've run into another problem, but I'll open a separate issue for that.