
Problem exporting CASSCF wavefunctions produced by PySCF

Open NastaMauger opened this issue 5 months ago • 9 comments

Describe the bug: it is impossible to export CASSCF wavefunctions produced by PySCF

I am trying to export a CAS(2,2) wavefunction using PySCF. However, when I run the following command: convert4qmc -orbitals CAS_2_2.h5 -production -addCusp -multidet CAS_2_2.h5

the code fails with the following error:

QMCPACK ERROR HDF5 read failure in hdf_archive::read NbDet  
Fatal Error. Aborting at Unhandled Exception

Inspecting the HDF5 file, it appears that the call to:

from PyscfToQmcpack import savetoqmcpack  
savetoqmcpack(mol, casscf, title=title)

(using my casscf object) does not register the multiple configurations that should be present in the wavefunction.

Is this expected behavior? Judging by this PR, it should not be an issue. Also, the same setup works without any issues when the -multidet keyword is omitted from the convert4qmc command.

I am using:

  • PySCF: 2.9.0 with Python 3.11.11

  • QMCPACK version 4.1.9 built on Jun 10 2025

    Git branch: develop Last git commit: deb1df30348488ad05509040f555ba25cb4d0dcc-dirty Last git commit date: Tue May 27 20:22:31 2025 -0400 Last git commit subject: Merge pull request #5506 from kayahans/bug/multidet

  • My H5 file: CAS_2_2.h5.txt

  • My PySCF script: python.py.txt

  • I also want to add that I had no problem exporting multideterminant wavefunctions generated with QP2.

NastaMauger avatar Jun 30 '25 21:06 NastaMauger

Thanks for the report and also noting that single determinant calculations are OK, as are QP2 calculations.

prckent avatar Jul 01 '25 13:07 prckent

There is still an issue with the CASSCF wavefunctions generated by PySCF. As shown in my PR, I haven’t made any modifications to how the HDF5 file is written, so any problem related to the file itself is not caused by my changes.

In my case, I need to use specific MOs in my CAS space, but in each case I checked my MOs and they are consistent. Here is the issue I am currently encountering:

I am only at the cusp-correction step, and I am using several wavefunctions, both with and without symmetry enabled in PySCF. I ran qmca checks to ensure the wavefunctions were not corrupted during the process before going further with QMCPACK.

                        Symmetry On              Symmetry Off             PySCF Reference
HF                      -379.338 ± 0.01948       -379.341854 ± 0.019030   -379.354808
CAS(2,2)             [missing values]             281.02 ± 0.15400        -379.36063915
rotate MO + CAS(2,2)    280.477 ± 0.16722         279.55 ± 0.19542        -379.3683463

As you can see, the discrepancy is not related to symmetry (and therefore not to my PR), but appears to stem from the CASSCF wavefunction itself. The way the wavefunction is saved, and subsequently read by QMCPACK, seems to result in incorrect energies.
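The comparison in the table above can be expressed as a small check of how many error bars separate each QMC energy from its PySCF reference. This is only a sketch: the function names and the 3-sigma threshold are assumptions chosen for illustration, and a variational QMC energy is not expected to match the reference exactly. The numbers are copied from the table.

```python
def deviation_in_sigma(qmc_energy, qmc_error, reference):
    """Return |E_qmc - E_ref| in units of the QMC statistical error bar."""
    return abs(qmc_energy - reference) / qmc_error


def is_consistent(qmc_energy, qmc_error, reference, n_sigma=3.0):
    """True if the QMC energy lies within n_sigma error bars of the reference.

    The default n_sigma=3.0 is an arbitrary illustrative threshold.
    """
    return deviation_in_sigma(qmc_energy, qmc_error, reference) <= n_sigma


# (QMC energy, QMC error bar, PySCF reference), taken from the table above.
rows = {
    "HF, symmetry on": (-379.338, 0.01948, -379.354808),
    "CAS(2,2), symmetry off": (281.02, 0.15400, -379.36063915),
}

for label, (energy, error, reference) in rows.items():
    sigma = deviation_in_sigma(energy, error, reference)
    status = "consistent" if is_consistent(energy, error, reference) else "INCONSISTENT"
    print(f"{label}: {sigma:.1f} sigma from reference -> {status}")
```

A check like this makes it obvious that the CAS(2,2) rows are not a statistical fluctuation: even the sign of the energy is wrong.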

Upon inspection, the HDF5 file contains coefficients that are consistent with the PySCF output. However, I am unable to verify the configuration section, as it is stored in a bitfield format.
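For anyone who wants to check the configuration section by hand, here is a minimal sketch of how a determinant stored as a bitfield could be decoded into a list of occupied orbitals. The layout is an assumption, not something confirmed from the QMCPACK HDF5 format: it supposes 64-bit words in which the least significant bit of the first word corresponds to orbital 0, and a set bit means the orbital is occupied. The function name `occupied_orbitals` is hypothetical.

```python
def occupied_orbitals(words, n_orbitals, bits_per_word=64):
    """Decode a determinant bitfield into a sorted list of occupied orbital indices.

    ASSUMED layout: bit b of word w represents orbital w * bits_per_word + b.
    """
    mask = (1 << bits_per_word) - 1
    occupied = []
    for word_index, word in enumerate(words):
        # Signed integers (as some readers return them) are reinterpreted as unsigned.
        word &= mask
        for bit in range(bits_per_word):
            orbital = word_index * bits_per_word + bit
            if orbital >= n_orbitals:
                break
            if (word >> bit) & 1:
                occupied.append(orbital)
    return occupied


# Hypothetical 8-orbital example: reference determinant and a single excitation.
print(occupied_orbitals([0b0111], 8))
print(occupied_orbitals([0b1011], 8))
```

If the decoded occupations do not match the configurations PySCF reports for the same CI coefficients, the bit ordering or the orbital offset assumed here is probably different from the one actually used in the file.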

For reference, I am using the latest version of QMCPACK (updated a few hours ago), and all tests were run with the same number of blocks, steps, walkers, substeps, timestep, and warmup steps.

My Python script is: rotate.txt

NastaMauger avatar Jul 02 '25 03:07 NastaMauger

@anbenali are PySCF multideterminant wavefunctions supported by QMCPACK?

jtkrogel avatar Jul 02 '25 13:07 jtkrogel

I have never tried them and can absolutely not vouch for them! I know that @amandadumi worked on them though.

anbenali avatar Jul 02 '25 22:07 anbenali

The issue isn't with the CI coefficients themselves, but rather with the MOs.

The initial PR was tested on LiH, and the test appears to pass because the wavefunction is already well captured by the mean-field MOs. However, as soon as the system becomes more complex, the test fails — even though the CI coefficients are correctly exported.

In fact, I find it a bit odd to use a CASCI test on such a minimal system. It would have been more meaningful to test on something like water, which is still small but more representative, while preserving the significance of the test.

To isolate the issue, I ran the same CASCI calculation but forced it to use Hartree–Fock orbitals. In that case, QMCPACK works as expected, even for larger systems. The failure only occurs when post-HF (CASSCF/CASCI) orbitals are used.

So I believe the core problem lies not in the CI coefficients, but in how the post-SCF MOs are registered or exported in the HDF5 file. I haven’t fully tracked it down yet, but that’s where the inconsistency seems to originate.

EDIT: After testing with GAMESS, the issue is, as I suspected, related to the MOs. For some reason, the MOs exported for post-HF wavefunctions are not consistent with those produced by RHF: there are differences in the number of centers.

NastaMauger avatar Jul 03 '25 00:07 NastaMauger

@NastaMauger thanks for pursuing this. This feature was only just added and clearly has had limited use. The initial test was just enough to check the mechanics of this feature, but unfortunately not enough to catch this error. We can work to add a more effective test, e.g. a water monomer or any small system where the non-ground state determinants have significant occupancy. It is very helpful to hear that the QP2 route is OK -- this tells us that the issue must lie somewhere in the plumbing of the conversion and output process and not inside QMCPACK.
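One way to judge whether a candidate test system has "significant occupancy" outside the ground-state determinant is to look at the normalized weight of everything except the leading determinant. The sketch below assumes the CI coefficients are available as a plain list; the function name and both coefficient lists are hypothetical and purely illustrative, not taken from LiH, water, or any actual calculation.

```python
def non_reference_weight(ci_coefficients):
    """Fraction of the wavefunction norm carried by all but the leading determinant."""
    norm = sum(c * c for c in ci_coefficients)
    if norm == 0.0:
        raise ValueError("CI vector has zero norm")
    leading = max(c * c for c in ci_coefficients)
    return 1.0 - leading / norm


# HYPOTHETICAL coefficients, for illustration only.
nearly_single_reference = [0.999, 0.03, -0.02]
clearly_multireference = [0.90, -0.40, 0.15, 0.10]

for name, coeffs in [
    ("nearly single-reference", nearly_single_reference),
    ("clearly multireference", clearly_multireference),
]:
    print(f"{name}: non-reference weight = {non_reference_weight(coeffs):.4f}")
```

A test built on a system of the first kind can pass even if the excited determinants or their orbitals are exported incorrectly, because they barely contribute to the energy.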

prckent avatar Jul 08 '25 01:07 prckent

@prckent

The issue is 100% related to the MOs. If someone wants to solve this problem, here is what should be done:

  • Transform hcore and ERI to the MO basis.
  • Perform a Cholesky decomposition on the ERI tensor.
  • Ensure that the number of core orbitals (ncore) is correctly taken into account.
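The last point can be sketched as simple bookkeeping. The example below assumes a closed-shell core, with the active orbitals placed directly after the core orbitals in the MO ordering, and handles one spin channel at a time. The helper names are hypothetical and are not part of PyscfToQmcpack or convert4qmc.

```python
def count_core_orbitals(n_electrons, n_active_electrons):
    """Number of doubly occupied core orbitals for a CAS(n_active_electrons, ncas) space."""
    n_core_electrons = n_electrons - n_active_electrons
    if n_core_electrons < 0 or n_core_electrons % 2 != 0:
        raise ValueError("core must hold a non-negative, even number of electrons")
    return n_core_electrons // 2


def active_to_full(active_occupied, ncore):
    """Map occupied active-space indices (one spin channel) to full MO indices.

    ASSUMPTION: MOs are ordered as [core | active | virtual], so active
    orbital i is full-space orbital ncore + i.
    """
    return list(range(ncore)) + [ncore + i for i in active_occupied]


# Illustration: a 10-electron molecule (e.g. water) with CAS(2,2).
ncore = count_core_orbitals(n_electrons=10, n_active_electrons=2)
print(ncore)
print(active_to_full([0], ncore))  # active-space ground configuration
print(active_to_full([1], ncore))  # electron promoted within the active space
```

If ncore is dropped during export, every determinant is shifted toward the bottom of the orbital list, and the resulting wavefunction no longer corresponds to the CASSCF state even though the CI coefficients themselves are unchanged.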

To achieve this, there are two possible solutions:

  1. Have a consistent way to register and save the data files. This can be done using TREXIO, which works with many software packages (not only GAMESS and PySCF).
  2. Alternatively, note that the AFQMC example already saves CASSCF wavefunctions. However, one might want to run CASSCF calculations without performing AFQMC. Therefore, another approach would be to merge this script with the PyscfToQmcpack converter. This would also allow some code cleanup alongside a streamlined workflow.

Please note that I have contributed several PRs to TREXIO, including one that correctly registers MCSCF wavefunctions in the H5 format, so I am not completely neutral on this topic.

NastaMauger avatar Jul 08 '25 18:07 NastaMauger

  1. Q. Do you know if TREXIO would support converting the output of PySCF multireference formats to the format used by QP? Since the QP route works per your testing, this could be the beginnings of a solution.
  2. Why do you suggest that the MOs need some kind of transformation? It would be unfortunate if PySCF did not make the correct MOs available without transformation, but perhaps that is the case.

We'll add this bug to the changelog as a known issue.

prckent avatar Jul 08 '25 20:07 prckent

@prckent

  1. This is done by the same developer/team, so absolutely yes! And honestly, I think this is the best solution in the long run for seamless usage across more software. My current MCSCF PR is awaiting PySCF approval, but otherwise it should work. However, it means users need to have TREXIO installed.
  2. This is something I’ve noticed as well. Note that the AFQMC examples also perform this transformation.

NastaMauger avatar Jul 08 '25 20:07 NastaMauger