Out of memory with DLPNO-CCSD(T)

Open peastman opened this issue 2 weeks ago • 9 comments

I'm trying to test out the implementation of DLPNO-CCSD(T) in the main branch (revision 6053f9a checked out today). I find that no matter how much memory I try to give it, I still get an error that there wasn't enough.

Here's my input file, which just computes the energy of a single water molecule.

memory 32 GB

molecule h2o {
  O 
  H 1 0.96
  H 1 0.96 2 104.5
}

set basis cc-pVDZ
energy('dlpno-ccsd(t)')

It fails with the message, "Fatal Error: Too little memory given for DLPNO-CCSD Algorithm!"

Looking through the output I find its estimates of the required memory.

  ==> DLPNO-CCSD Memory Requirements <== 

    *** Common Quantities ***
    (q | i j) [AUX, LMO]          :    0.000 [GB]
    (q | i a) [AUX, LMO, PAO]     :    0.000 [GB]
    (q | a b) [AUX, PAO]          :    0.000 [GB]
    (k_{ij}, l_{ij})-like         : 844425.371 [GB]
    (k_{ij}, c_{ij})-like         : 144321447940.430 [GB]
    (a_{ij}, b_{ij})-like         :   48.727 [GB]
    (a_{ij}, c_{kj})-like         : 844425.347 [GB]
    (i a_{ij} | b_{ij}, c_{ij})   :    0.001 [GB]
    (Q_{ij} | k_{ij} i)           :   48.727 [GB]
    (Q_{ij} | a_{ij} i)           :   48.727 [GB]
    (Q_{ij} | m_{ij} a_{ij})      :    0.001 [GB]
    (Q_{ij} | a_{ij} b_{ij})      :    0.003 [GB]

Some of those numbers look suspiciously large for a single water molecule. :)
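A quick way to spot entries like these is to scan the memory report for estimates above the memory you actually allotted. The following is only a sketch: the helper name `suspicious_entries` and the 32 GB threshold are illustrative, and the line format is assumed to match the excerpt above.

```python
import re

# Matches report lines such as "(k_{ij}, l_{ij})-like   : 844425.371 [GB]".
_LINE = re.compile(r"^\s*(?P<label>.+?)\s*:\s*(?P<gb>\d+(?:\.\d+)?)\s*\[GB\]\s*$")


def suspicious_entries(report: str, limit_gb: float):
    """Return (label, size_gb) pairs whose estimate exceeds limit_gb."""
    flagged = []
    for line in report.splitlines():
        match = _LINE.match(line)
        if match and float(match.group("gb")) > limit_gb:
            flagged.append((match.group("label"), float(match.group("gb"))))
    return flagged


# A few lines copied from the report above.
report = """\
    (q | a b) [AUX, PAO]          :    0.000 [GB]
    (k_{ij}, l_{ij})-like         : 844425.371 [GB]
    (a_{ij}, b_{ij})-like         :   48.727 [GB]
"""

for label, gb in suspicious_entries(report, limit_gb=32.0):
    print(f"{label}: {gb} GB")
```

For a single water molecule in cc-pVDZ, any entry flagged against a 32 GB limit is a strong hint that the estimate itself is wrong rather than that the job is too large.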

peastman avatar Dec 12 '25 22:12 peastman

Looks like integer under/overflow on first glance

TiborGY avatar Dec 12 '25 22:12 TiborGY

@andyj10224, can you take a look? Do defaults need to be set differently?

@peastman, sorry about that. Perhaps try working from https://github.com/psi4/psi4/blob/master/tests/dlpnocc-1/input.dat, which is also cc-pVDZ water. That test runs in CI, so it needs less than 4 GB of memory. In particular, set freeze_core true is useful both for the method and for resource usage.

loriab avatar Dec 12 '25 23:12 loriab

freeze_core doesn't help, but this line in that file does:

set dlpno_toggle_memory false

If I add that line, it runs without problem.

peastman avatar Dec 12 '25 23:12 peastman

Here is a bigger molecule with 41 atoms. This one segfaults.

memory 32 GB

molecule mol {
0 1
C    -6.490999517882142    2.537862684063124  -0.2323102727995172
C    -6.328573442396725    1.0587890760537342  -0.5217635401567717
C    -5.075936167154976    0.6061573086614495  0.2350438796872525
C    -4.8126633715875    -0.855500983294261  0.022371668031886085
N    -3.662814391362749    -1.3722394084528953  0.6919258810947014
C    -2.3317025039904817    -1.008380205827574  0.3368257968229602
S    -2.0928918146868747    0.05761004239136857  -0.9006178435072472
N    -1.184069382531426    -1.5242034114566212  1.0056586198187671
N    0.1301397890615767    -1.1444494802483816  0.6298804433873023
C    1.0334156775993977    -2.059394067441253  0.3947429578112244
C    2.3979452885925436    -1.7192216664803903  0.005733845845878158
C    3.362163643014409    -2.694365164834639  -0.24484274153182137
C    4.6460523688931445    -2.3440664470958397  -0.6086692579058807
C    4.955294312738983    -1.0027505374123884  -0.7191672280909117
C    4.0404835194357105    -0.008533656047057721  -0.48264863774515454
O    4.278507737083588    1.340756056679458  -0.5735827354768532
C    5.448939325538495    2.0043760963604695  -0.9014069180985177
C    6.571384631981316    1.7070091411274624  0.060412348859605385
C    2.748601901677971    -0.39190320919379934  -0.11616578471961819
O    1.7613363013430023    0.5717983766243413  0.14061113978185472
H    -6.784818302912347    3.054028629984417  -1.177587086255804
H    -5.546612707317883    2.981106576785527  0.14935118892052024
H    -7.235420448687445    2.6917139957798732  0.5890051496248345
H    -7.203878515652723    0.4593411348145616  -0.24061867739794673
H    -6.062032389992824    0.9578860340497859  -1.5943941606126875
H    -4.248953516145324    1.2355161502200693  -0.14795370698963758
H    -5.327584241816008    0.8015737490824821  1.3083678055164396
H    -5.711654823428578    -1.3980230080522968  0.40075011721575277
H    -4.773077391252315    -1.0785676082193396  -1.0639539127489945
H    -3.755100507678116    -2.062042774437259  1.4975466605061887
H    -1.3312377202654841    -2.2069110420225564  1.8012501991371148
H    0.7470422223506362    -3.0971265263866923  0.49539548002989314
H    3.0642107824783267    -3.728318512038958  -0.14336534028044853
H    5.378363270068862    -3.1115213894781513  -0.7987604430064659
H    5.950156173187689    -0.7449726551054592  -1.0031316116672393
H    5.222328098077193    3.1080846481568773  -0.8043940639696501
H    5.806840582857794    1.8078396284984524  -1.9351745579309316
H    6.21140068671906    1.2934961778876564  1.026324447356718
H    7.064232458787285    2.6813209600804404  0.3381581122594744
H    7.346910932505775    1.0825383856344617  -0.43367759670604006
H    1.7942714527493095    1.5136869005900124  -0.20996397628098845
}

set basis cc-pVDZ
set dlpno_toggle_memory false
energy('dlpno-ccsd(t)')

peastman avatar Dec 13 '25 00:12 peastman

@peastman Can you try setting symmetry c1 in the molecular input string (like in the test)? I'd also like to know what operating system you are running on.

andyj10224 avatar Dec 13 '25 09:12 andyj10224

Looks like integer under/overflow on first glance

I agree; the code appears to be using ints, and when you multiply two ints the product might be too big to fit into an int.
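To illustrate the kind of failure being suggested here (not the actual Psi4 code), the sketch below emulates a signed 32-bit multiplication in Python with `ctypes`. The helper `mul_int32` and the two dimensions are made up for the example. In C++, signed overflow is formally undefined behavior, and wraparound is just the commonly observed outcome. Note also that a later comment in this thread attributes the inflated estimates to uninitialized counters instead.

```python
import ctypes


def mul_int32(a: int, b: int) -> int:
    """Multiply two integers and wrap the product to a signed 32-bit value."""
    return ctypes.c_int32(a * b).value


# Hypothetical dimensions, chosen only so that the product exceeds 2**31 - 1.
rows = 50_000
cols = 50_000

exact = rows * cols              # Python ints are arbitrary precision
wrapped = mul_int32(rows, cols)  # what a 32-bit int would typically hold

print(exact)    # 2500000000
print(wrapped)  # -1794967296
```

Doing the arithmetic in a 64-bit type such as `size_t` or `int64_t` avoids this for any realistic element count.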

susilehtola avatar Dec 13 '25 11:12 susilehtola

This appears to be an issue with non-frozen core DLPNO-CCSD(T)! The segfault disappears if you run it with set freeze_core true. I will look into it!
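For anyone on the affected revision, the two workarounds reported in this thread (freeze_core true and dlpno_toggle_memory false) can be combined. Below is a minimal sketch using Psi4's Python API rather than an input file. It assumes a Psi4 build that includes DLPNO-CCSD(T) and is importable as `psi4`. The names `OPTIONS`, `WATER`, and `run_dlpno_water` are illustrative, and the water geometry stands in for whatever molecule you are running.

```python
# Options reported in this thread as workarounds on the affected revision.
OPTIONS = {
    "basis": "cc-pVDZ",
    "freeze_core": True,           # avoids the segfault reported above
    "dlpno_toggle_memory": False,  # avoids the "Too little memory" error
}

# Water Z-matrix from the original report, with symmetry c1 as suggested
# earlier in the thread.
WATER = """
O
H 1 0.96
H 1 0.96 2 104.5
symmetry c1
"""


def run_dlpno_water() -> float:
    """Run the calculation; requires a Psi4 build with DLPNO-CCSD(T)."""
    import psi4  # imported lazily so OPTIONS can be inspected without Psi4

    psi4.set_memory("32 GB")
    psi4.geometry(WATER)
    psi4.set_options(OPTIONS)
    return psi4.energy("dlpno-ccsd(t)")
```

Calling `run_dlpno_water()` returns the total energy in hartree. Once the fixes discussed in this thread are in your build, neither option should be needed as a workaround.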

andyj10224 avatar Dec 13 '25 14:12 andyj10224

I think I discovered the cause of the inflated memory estimates: when computing the memory requirements, I forgot to initialize my counters to zero before incrementing them.

https://github.com/psi4/psi4/blob/master/psi4/src/psi4/dlpno/ccsd.cc#L269

andyj10224 avatar Dec 13 '25 15:12 andyj10224

I also figured out the reason for the segfault! I opened pull request #3338 and will close this issue when it is merged.

andyj10224 avatar Dec 13 '25 15:12 andyj10224

Both problems are now fixed for me. Thank you!

I've run into another problem, but I'll open a separate issue for that.

peastman avatar Dec 14 '25 17:12 peastman