
dump walkers periodically for post-processing

Paul-St-Young opened this issue 1 year ago • 9 comments

Proposed changes

Repurpose unused input "record_configs" to set frequency of dumping walkers for post-processing.

Example input for a simple harmonic oscillator: tb38_msho.zip

A normal checkpoint file {prefix}.config.h5 is overwritten at every checkpoint and contains

block                    Dataset {SCALAR}
number_of_walkers        Dataset {SCALAR}
walker_partition         Dataset {2}
walker_weights           Dataset {65}
walkers                  Dataset {65, 1, 3}

The walkers dataset contains the snapshot at checkpoint.
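
As an illustration of post-processing this layout, here is a minimal sketch using h5py. The file name, sizes, and values are made up for the demo (an in-memory file is used so the snippet is self-contained); the dataset names match the listing above.

```python
import h5py
import numpy as np

# Build a tiny in-memory stand-in for {prefix}.config.h5; the dataset
# names follow the checkpoint layout above, but the sizes are invented.
f = h5py.File('demo.config.h5', 'w', driver='core', backing_store=False)
f['block'] = 100                           # scalar: block at checkpoint
f['number_of_walkers'] = 65                # scalar
f['walker_partition'] = np.array([0, 65])  # per-rank walker boundaries
f['walker_weights'] = np.ones(65)
f['walkers'] = np.zeros((65, 1, 3))        # (walker, particle, xyz)

# Post-processing: read the snapshot back.
walkers = f['walkers'][:]
weights = f['walker_weights'][:]
assert walkers.shape[0] == int(f['number_of_walkers'][()])
```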

With "record_configs" > 0, each addition to the config.h5 file is preserved. Instead of a single walkers dataset, we then need to identify the block at which each set of walkers was dumped:

block                    Dataset {SCALAR}
number_of_walkers        Dataset {SCALAR}
walker_partition         Dataset {2}
walker_partition32       Dataset {2}
walker_partition64       Dataset {2}
walker_weights           Dataset {64}
walker_weights32         Dataset {63}
walker_weights64         Dataset {64}
walkers                  Dataset {64, 1, 3}
walkers32                Dataset {63, 1, 3}
walkers64                Dataset {64, 1, 3}

The number 32 in the name walkers32 identifies that these walkers were dumped at block 32.
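
A post-processing script could recover the dump blocks from the dataset names with a simple pattern match; a sketch (the name list is hard-coded here to mirror the listing above):

```python
import re

# Dataset names as they appear in config.h5 when record_configs > 0.
names = ['block', 'number_of_walkers',
         'walker_partition', 'walker_partition32', 'walker_partition64',
         'walker_weights', 'walker_weights32', 'walker_weights64',
         'walkers', 'walkers32', 'walkers64']

# Map each dump block number to its walkers dataset; the bare 'walkers'
# dataset (no suffix) is the latest checkpoint snapshot.
dumps = {}
for name in names:
    m = re.fullmatch(r'walkers(\d+)', name)
    if m:
        dumps[int(m.group(1))] = name

print(sorted(dumps))  # the blocks at which walkers were dumped
```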

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Intel workstation

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with the current state of 'develop'
  • No. Code added or changed in the PR has been clang-formatted
  • Yes/No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Yes/No. Documentation has been added (if appropriate)

Paul-St-Young avatar Mar 13 '24 22:03 Paul-St-Young

Hi Paul. Thanks for adding/restoring this ghost feature. As we have discussed, this functionality is useful for a variety of purposes, and it could clearly be evolved further.

For now, two questions:

  1. How much testing have you done that the energies and weights are correct and that, when averaged, they match what is in the scalar files?
  2. Does this work with vmc_batched?
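
(A hedged sketch of the check in question 1: the weight-normalized block average of per-walker energies should match the corresponding column of the scalar files. The arrays below are made-up numbers; the energies themselves are not in config.h5, so this assumes they are obtained separately.)

```python
import numpy as np

# Hypothetical per-walker local energies and weights for one block.
energies = np.array([1.2, 1.5, 1.1, 1.4])
weights  = np.array([0.9, 1.1, 1.0, 1.0])

# Weight-normalized block average; this is the quantity that should
# agree with the scalar-file entry for the same block.
e_block = np.dot(weights, energies) / weights.sum()
print(e_block)
```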

prckent avatar Mar 15 '24 13:03 prckent

The asan failure was found to be caused by the CI OS image. It has been fixed.

ye-luo avatar Mar 26 '24 19:03 ye-luo

I also noticed that the added change only affects the legacy drivers. Out of curiosity, what is missing for you to adopt the batched drivers?

ye-luo avatar Mar 26 '24 19:03 ye-luo

With regard to the batched drivers, I want to remove WalkerConfiguration as an object and heavily refactor, eventually eliminating the "Walker-->Particle" objects. The Walker object, and some of its data that is stuffed into ParticleSet, is full of values that should be contiguous vectors at crowd or even rank scope, but are instead stored as an AOS with respect to walkers and mapped into a byte buffer. As it stands, all this "dumping walkers" machinery is predicated on that layout, pointer arithmetic and all. The less dependency there is on this, the better. I think we would be better off writing an "estimator" that writes the per-walker values out. This PR exposes legacy walker implementation details that we don't want to carry forward either; to the extent that some still exist, it is because refactoring ParticleSet has been blocked for several years now.

If you want to dump walkers, it should be a crowd- or population-level operation so it can be synchronized properly and done efficiently. If you want to use parallel HDF5, the appropriate level would be MCPopulation. Ideally it would be done from a separate I/O thread while real work is being done by the main crowd threads.
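
(For a population-level collective write, each rank needs its offset into the shared global walkers dataset; that offset is an exclusive prefix sum of the per-rank walker counts. A sketch with invented counts, showing only the offset arithmetic rather than the MPI/HDF5 calls themselves:)

```python
import numpy as np

# Hypothetical walker counts gathered at the MCPopulation level
# for a 4-rank run.
counts = np.array([17, 16, 16, 15])

# Exclusive prefix sum: rank r writes its walkers into rows
# [offsets[r], offsets[r] + counts[r]) of the shared dataset.
offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
total_walkers = int(counts.sum())
print(offsets, total_walkers)
```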

PDoakORNL avatar Apr 04 '24 18:04 PDoakORNL

@prckent apologies for the delayed response. To answer your questions:

How much testing have you done that the energies and weights are correct and that, when averaged, they match what is in the scalar files?

I did not test the energies and weights. The same dump function used for a normal checkpoint is called, so I just assumed it would be correct.

Does this work with vmc_batched?

No, I did not add this feature to the batched drivers.

The current implementation breaks down in an MPI run. I think this is due to problems with parallel HDF5 file writing. @ye-luo I added some more details of this feature in the PR description. Does this clear up the meaning of identify_block? I will also add some comments in the source code.

@ye-luo @PDoakORNL I would love some help on getting this feature to work. It's not clear to me exactly what needs to be done to make parallel HDF5 writing correct.

Paul-St-Young avatar Apr 11 '24 18:04 Paul-St-Young

Hi Paul - Is #5019 working for you? If so, we can close this PR.

prckent avatar Jun 28 '24 18:06 prckent

@prckent please keep this PR alive. It is needed for different uses like replay.

ye-luo avatar Jun 28 '24 18:06 ye-luo

@prckent yes, #5019 does what I need. It outputs a bit too much information, but it seems straight-forward to modify so I can go from there. I don't mind if you decide to close this PR.

@ye-luo would it be easier to add a "replay" flag to the new WalkerLog class? I think it just needs to output walker coordinates and weights at every step without all the other debugging contents.

Paul-St-Young avatar Jun 30 '24 16:06 Paul-St-Young

WalkerLog has a very different design logic and set of use cases. For example, it writes one file per MPI rank instead of using scalable I/O. Its data are not organized by step, and thus extra work is required to recover a true replay.

ye-luo avatar Jun 30 '24 16:06 ye-luo

I think this is no longer needed and can be closed.

jtkrogel avatar Nov 13 '25 20:11 jtkrogel

I think a more refined version can be implemented when properly implementing a replay feature or correlated sampling.

ye-luo avatar Nov 14 '25 00:11 ye-luo