
Optimizer Compatibility with Batched Drivers and Parameter Filtration

Open LeonOtis opened this issue 3 years ago • 5 comments

Proposed changes

These changes make several optimizers usable with the batched drivers: the adaptive three-shift version of the linear method (LM), accelerated descent, and hybrid combinations of the two. This PR also adds the option to filter the parameters optimized by the LM based on the amount of noise in the parameter gradients. In recent work (https://arxiv.org/abs/2111.07221), we found that allowing the LM to optimize only a filtered subset of the parameters, while leaving the rest to accelerated descent, is an effective choice. This option is available for both the batched and legacy versions of the drivers.
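The core idea of the parameter filtration is to compare each parameter gradient against its statistical uncertainty and hand only the well-resolved parameters to the LM. The sketch below is purely illustrative: the function name, the threshold value, and the ratio test are assumptions for exposition, not the actual QMCPACK implementation.

```python
def filter_parameters(grad_means, grad_errors, ratio_threshold=1.0):
    """Hypothetical sketch of gradient-noise-based parameter filtration.

    grad_means  -- per-parameter mean of the energy-gradient estimator
    grad_errors -- per-parameter standard error of that estimator
    Returns a boolean mask: True  -> optimize this parameter with the LM,
                            False -> leave it to accelerated descent.
    """
    mask = []
    for g, err in zip(grad_means, grad_errors):
        # Keep the parameter only if its gradient is resolved above the noise.
        mask.append(abs(g) > ratio_threshold * err)
    return mask

# Parameters 1 and 3 have gradients buried in statistical noise,
# so they would be left to accelerated descent.
grads  = [0.50, 0.01, -0.30, 0.02]
errors = [0.05, 0.04,  0.10, 0.05]
print(filter_parameters(grads, errors))  # [True, False, True, False]
```

The threshold trades off stability against coverage: a larger `ratio_threshold` hands fewer, better-resolved parameters to the LM.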

Most changes in the code are in the optimizer engines or deal with interfacing them to the batched drivers. The latter was more challenging than I expected so I'm happy to clarify anything and welcome any improvements.

What type(s) of changes does this code introduce?

  • New feature

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Lawrencium computing cluster

Checklist

  • Yes. This PR is up to date with the current state of 'develop'
  • Yes. Code added or changed in the PR has been clang-formatted
  • No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Yes. Documentation has been added (if appropriate)

LeonOtis avatar Jun 15 '22 01:06 LeonOtis

Thank you Leon! To accelerate acceptance of the feature you developed, it would help to think about splitting this PR into several self-contained, testable PRs. For example, if changing A requires changing B and adding C, we may have a standalone PR for C that does not need to be connected to the rest of the code. We may also adjust B, or part of it, if that doesn't affect existing functionality. In these smaller PRs we can add documentation and tests.

You may set this PR aside, start fresh from develop, and copy over changes from this PR. With this process, we will see the merged changes in develop grow and the not-yet-merged changes in this PR shrink.

ye-luo avatar Jun 15 '22 01:06 ye-luo

Hi Leon - Thanks very much for this. As Ye wrote it would be helpful if you can break this into smaller pieces.

I do have one small question: have you thought about what it would take to get this to work both with and without OpenMP threads, and with and without MPI? These are basically hard requirements for production, so it would be helpful to know whether you simply didn't get around to this or whether there is a tricky problem that we would have to solve in future.

(Building without MPI is a requirement for convenient tools and workstation use, while use of threads is essential to limit memory usage e.g. for spline runs)

prckent avatar Jun 15 '22 01:06 prckent

Thanks, Paul and Ye, for the comments. I'll break up the changes I made and try to have a PR for a first component pretty soon. I haven't yet made sure things work in all the OpenMP and/or MPI configurations. My normal use case is MPI without threads, so the code changes were tested in that configuration. The optimizers with the legacy drivers should work with or without OpenMP and with or without MPI, but I will have to see what changes might be needed for the batched drivers.

LeonOtis avatar Jun 15 '22 03:06 LeonOtis

Could you describe more about what you found difficult in interfacing with the batched drivers? It would be helpful for us to know which areas need more documentation or description.

markdewing avatar Jun 15 '22 15:06 markdewing

> Could you describe more about what you found difficult in interfacing with the batched drivers? It would be helpful for us to know which areas need more documentation or description.

One problem I ran into was getting the correlated sampling phase of the adaptive three-shift LM to agree with the results from the legacy drivers. There seemed to be several issues when I was testing:

  1. Electron positions did not match between the batched and legacy drivers during correlated sampling. This could be fixed by preventing the walker initialization from being executed again in VMCBatched.
  2. After that, the local energies still didn't match between the two driver types, but I found that inserting a call to flex_evaluateLog in VMCBatched and a call to mw_evaluateLog in QMCCostFunctionBatched removed the discrepancy for a small all-electron test.
  3. In tests with pseudopotentials, the correlated sampling did not give accurate local energies without the change from h0_list to h_list.

The other issue was ensuring that different amounts of sampling could be performed for the different methods in a hybrid optimization. I addressed this by having copies of QMCDriverInput and VMCDriverInput read the selected XML blocks and overwrite the sampling settings in QMCFixedSampleLinearOptimizeBatched.

My understanding of the batched driver code is still quite limited, so I'm very happy to make corrections if those changes break other parts of QMCPACK or if there are better solutions. The first part of this PR that I will try to break off and resubmit will be the changes for interfacing with the batched drivers.

LeonOtis avatar Jun 15 '22 19:06 LeonOtis

I assume this is not needed any more.

ye-luo avatar Aug 25 '22 21:08 ye-luo