qmcpack icon indicating copy to clipboard operation
qmcpack copied to clipboard

Cusp Correction & nstates Issues in QMCPACK

Open NastaMauger opened this issue 4 months ago • 10 comments

I am trying to export CASSCF wavefunctions generated using QP2. Because the converter treats all MOs as the number of states, after the conversion is done, I manually modify the HDF5 file so that the MultiDet/nstate group corresponds to the number of MOs in my CAS space (which is 30 in this case).

As a result, my converted file changes from:

    <determinantset type="MolecularOrbital" name="LCAOBSet" source="ion0" transform="yes" cuspCorrection="yes" href="QP2_CAS_4_4.h5">
      <sposet basisset="LCAOBSet" name="spo-up" size="296" cuspInfo="../spo-up.cuspInfo.xml">
        <occupation mode="ground"/>
        <coefficient size="296" spindataset="0"/>
      </sposet>
      <sposet basisset="LCAOBSet" name="spo-dn" size="296" cuspInfo="../spo-dn.cuspInfo.xml">
        <occupation mode="ground"/>
        <coefficient size="296" spindataset="0"/>
      </sposet>
      <multideterminant optimize="no" spo_up="spo-up" spo_dn="spo-dn">
        <detlist size="36" type="DETS" nca="0" ncb="0" nea="28" neb="28" nstates="296" cutoff="1e-20" ext_level="0" href="QP2_CAS_4_4.h5"/>
      </multideterminant>
    </determinantset>

to:


<wavefunction name="psi0" target="e">
  <determinantset type="MolecularOrbital" name="LCAOBSet" source="ion0" transform="yes" cuspCorrection="yes" href="QP2_CAS_4_4.h5">
    <sposet basisset="LCAOBSet" name="spo-up" size="30" cuspInfo="spo-up.cuspInfo.xml">
      <occupation mode="ground"/>
      <coefficient size="296" spindataset="0"/>
    </sposet>
    <sposet basisset="LCAOBSet" name="spo-dn" size="30" cuspInfo="spo-dn.cuspInfo.xml">
      <occupation mode="ground"/>
      <coefficient size="296" spindataset="0"/>
    </sposet>
    <multideterminant optimize="no" spo_up="spo-up" spo_dn="spo-dn">
      <detlist size="36" type="DETS" nca="0" ncb="0" nea="28" neb="28" nstates="30" cutoff="1e-20" ext_level="0" href="QP2_CAS_4_4.h5"/>

Please note that the key change is in the nstates and size attribute. For reference, I have also included the QMCPACK output produced by the converter using GAMESS output, and it matches my modified XML file from QP2.

However, applying the cusp corrections leads to incorrect results: -379.338340 whereas the QP2 reference energy is -379.410416553535.

I also noticed that during the conversion it prints:

Done reading CIs!!
QMCGaussianParserBase::dump

which might explain why, although I set up the correct number of states, it still treats the wavefunction as a CI.

Is there a way I can tweak the converter so that QP2 CASSCF wavefunctions are correctly converted?

QP2_CAS_4_4_modified.h5.txt

QP2_CAS_4_4_unmodified.h5.txt

qmcpack_from_gamess.wfj.xml.txt

NastaMauger avatar Aug 12 '25 00:08 NastaMauger

Clearly before doing any further work on #5582 or tackling the various issues with use of multideterminant wavefunctions we need to first add one or more proper meaningful+sensitive tests for each of the GAMESS, QP, PySCF routes. This should be a small enough system that we can run it quickly in continuous integration. The wavefunction must have strong contributions from many determinants, and the core electrons should be removed with ECPs. The latter avoids issues with cusp correction. The quantum chemistry codes should each get exactly the same result, as should a QMCPACK VMC run using the multideterminant wavefunction, no Jastrow. I don't see any point in changing the converters until this is all sorted out since we need to clearly communicate the situation to all users and avoid accidentally making the situation worse. The system does not need to be at the ground state geometry. @NastaMauger Do you have anything suitable? e.g. ECP water?

prckent avatar Aug 13 '25 14:08 prckent

I guess I was a little too excited to add the degenerate features to QMCPACK. I can write and test WATER + ECP with CASSCF to ensure that GAMESS, QP2, and PySCF all work with the current convert4qmc (no Jastrow, only the VMC step).

I’m unsure how one would handle such ECP setups with QP2. I need to check.

NastaMauger avatar Aug 19 '25 01:08 NastaMauger

The request is initially just for inputs that generate the same energy with multiple different codes. Based on #5589 we'll also eventually need a spin polarized version of a small system.

prckent avatar Aug 19 '25 14:08 prckent

What about benzene? I’m not sure if it would help https://github.com/QMCPACK/qmcpack/issues/5589, but in terms of degeneracy and system size it seems like an ideal choice. The molecular shape also makes it convenient for testing and implementing new features (such as symmetry), since it helps avoid complications regardless of the user’s background.

My current challenge is generating the correct ECP input for benzene, as I’ve never run an ECP calculation before. I tried using the Pseudopotential Library and so far my input, using ccECP.cc-pvDZ.games looks as follow:

 $CONTRL SCFTYP=RHF     RUNTYP=ENERGY  MULT=1 
         ICHARG=0       COORD=ZMTMPC   ISPHER=1 
         EXETYP=RUN     UNITS=ANGS     ECP=READ 
         $END
 $SYSTEM MWORDS=40 $END
 $SCF    DIRSCF=.TRUE. $END
 $GUESS  GUESS=HUCKEL $END
 $BASIS  GBASIS=CCD  $END 
 $DATA
BENZENE
DnH 2
   
C 0.0000000 0 0.0000000 0 0.0000000 0 0 0 0 
s 9 1.00
1 13.073594    0.0051583
2 6.541187     0.0603424
3 4.573411    -0.1978471
4 1.637494    -0.0810340
5 0.819297     0.2321726
6 0.409924     0.2914643
7 0.231300     0.4336405
8 0.102619     0.2131940
9 0.051344     0.0049848
s 1 1.00
1 0.127852     1.000000
p 9 1.00
1 9.934169     0.0209076
2 3.886955     0.0572698
3 1.871016     0.1122682
4 0.935757     0.2130082
5 0.468003     0.2835815
6 0.239473     0.3011207
7 0.117063     0.2016934
8 0.058547     0.0453575
9 0.029281     0.0029775
p 1 1.00
1 0.149161     1.000000
d 1 1.00
1 0.561160     1.000000

H 1.0942075 1 0.0000000 0 0.0000000 0 1 0 0 
S  8
  1      23.843185  0.00411490
  2      10.212443  0.01046440
  3       4.374164  0.02801110
  4       1.873529  0.07588620
  5       0.802465  0.18210620
  6       0.343709  0.34852140
  7       0.147217  0.37823130
  8       0.063055  0.11642410
S  1
  1       0.139013  1.00000000
P  1
  1       0.740212  1.00000000

C 1.4079550 1 120.00000 1 0.0000000 0 1 2 0 
H 1.0942075 1 120.00000 1 0.0000000 1 3 1 2 
C 1.4079550 1 120.00000 1 180.00000 1 3 1 2 
H 1.0942075 1 120.00000 1 180.00000 1 5 3 1 
C 1.4079550 1 120.00000 1 0.0000000 1 5 3 1 
H 1.0942075 1 120.00000 1 180.00000 1 7 5 3 
C 1.4079550 1 120.00000 1 0.0000000 1 7 5 3 
H 1.0942075 1 120.00000 1 180.00000 1 9 7 5 
C 1.4079550 1 120.00000 1 180.00000 1 1 2 3 
H 1.0942075 1 120.00000 1 0.0000000 1 11 1 2 
 $END

NastaMauger avatar Aug 20 '25 02:08 NastaMauger

By the way, at the moment, CASSCF calculations do not work with QP2 and PySCF because there is no proper way to define the CAS space with the converter. The converter automatically includes all molecular orbitals—since it is mainly intended for sCI calculations—making it impossible to restrict the calculation correctly.

Even if you manually edit the generated XML files, issues remain. For example, with a CAS(4,4) using orbitals 27–30, you would set nstates = 30 and nea = neb = 28. However, the coefficients also need to be written with the correct number of digits, which is currently not the case.

One possible solution would be to add a -cas flag to convert4qmc, which specifies the number of electrons and orbitals. For example: convert4qmc -orbitals cas_4_4.h5 -multidet cas_4_4_multidet.h5 -addCusp -production -cas 4,4 This would require adjusting the PySCF and QP2 converters so that the size, nstates, nea, and neb entries are correctly registered and the ci vector are save with the correct number of digits with respect to the CAS space definition

Doing this should resolve this open issue, and, unless there is a problem with the registered MOs, it should also address this related issue: https://github.com/QMCPACK/qmcpack/issues/5563

NastaMauger avatar Aug 20 '25 02:08 NastaMauger

Benzene is likely too big to run quickly to a useful accuracy; we have to be able to detect errors in handling the CI coefficients which presumably will need a small error bar. <=10 electrons is preferred. Lets start with water.

We don't have many people running molecules on the core funded QMCPACK team, and no one is focused on multideterminant calculations. Hence the need for a systematic approach to make sure this problem is definitively addressed and we are protected from (say) a future change in any of the quantum chemistry codes or some python update breaking the converters. First step is to get the same energy from different quantum chemistry codes for a small test system. ECPs need to be used to avoid the perturbation caused by the cusp corrections. Once we have seen that we can drive chemistry codes consistently, we'll look to see what changes in converters are needed to get the correct result using wavefunctions from each of the quantum chemistry codes in QMCPACK. We can handle the testing and automation once there are inputs for us to run. We'll likely need help decoding handling of CI coefficients for some of the codes since they all appear to be doing their own thing.

prckent avatar Aug 20 '25 13:08 prckent

What do you mean by “run quickly to a useful accuracy”? Benzene with cc-pVDZ, RHF, and CAS(4,4) takes less than 10 seconds to run on my personal laptop with each software. With this CAS space, you already get degenerate determinants and energies that are consistent across all programs

NastaMauger avatar Aug 20 '25 13:08 NastaMauger

We need to be able to run the QMC quickly to a useful accuracy. The QMC is the bottleneck.

prckent avatar Aug 20 '25 13:08 prckent

The issue with the water example already in QMCPACK is that your CAS space included all electrons (NMCC=0), so it’s essentially equivalent to a CI calculation. Because of that, such an example won’t be able to trigger the bug that arises when the CAS space does not include all the MOs but rather a user define one. That’s why it worked for many years—since, as you said, no one really runs such calculations.

However, in practice, one would usually want to perform a CAS on a well-defined active space and then use it for a subsequent QMC job, rather than running a huge sCI.

So my question is more about what you actually want to do. If you just want to test with water as it was done previously, then it works. But it might be worth adding to the documentation that CASSCF won’t work with PySCF or QP2, since both converters are designed for sCI. In fact the documentaiton for PySCF is missleading

NastaMauger avatar Aug 20 '25 13:08 NastaMauger

We need one or more inputs that trigger this bug. i.e. a reproducer. You pick. Needs to fit the specifications described above (small, quick to run in QMC, ECP, significantly non-zero excited state CI coefficients).

If there are problems that end up not being solved, yes, we'll update the docs. The converters should also be changed to abort if they can detect these situations.

prckent avatar Aug 20 '25 14:08 prckent