Fix occasional optimizer failure.
Sometimes no eigenvalue is found from the mapping (filter and sort), which leads the code to select the first eigenvalue returned from the solver, which often sends the optimizer parameters in a very undesirable direction.
The results of this situation look something like this qmca output:
LocalEnergy Variance ratio
opt1_j2_7_6 series 0 -74.870455 +/- 0.024794 5.933186 +/- 0.535541 0.0792
opt1_j2_7_6 series 1 -75.001842 +/- 0.037365 7.256994 +/- 2.932719 0.0968
opt1_j2_7_6 series 2 -74.929538 +/- 0.016134 4.463142 +/- 0.154065 0.0596
opt1_j2_7_6 series 3 -38087874.983748 +/- 13522761.916354 52176250689832632320.000000 +/- 20073852128999710720.000000 1369891355506.0911
opt1_j2_7_6 series 4 -35506733.280610 +/- 11037598.694781 51117489304034312192.000000 +/- 17345565406866282496.000000 1439656216753.3770
opt1_j2_7_6 series 5 -9073700.394155 +/- 3700257.273053 11376395351848120320.000000 +/- 5387829541237716992.000000 1253776833889.7781
The workaround is to detect the situation where no acceptable eigenvalue is found, and run a second filter which accepts eigenvalues above the current VMC energy ( E_0 = H(0,0) ). It cuts off at E_0 + 100, to mirror the lower cutoff.
The solution could be simpler, but the fix was chosen to keep the existing eigenvalue selection unchanged as much as possible.
Some notes on the details of eigenvalue selection:
Why 2.0? I don't know. It was introduced in 4b5b0a3f81ef25412eee1350ad579eef62a8f3f3 (2011). Prior to that change, the code searched for the eigenvalue closest to E_0 that falls between E_0 and E_0 - 1e10. After that change, the range was narrowed to E_0 and E_0 - 100, and the code searched for the eigenvalue closest to E_0 - 2.0. My guess is that a slightly lower eigenvalue might lead to faster convergence if there are a number of eigenvalues close to E_0 (??) but that really ought to be tested.
Why E_0 - 100 for the cutoff? I also don't know. My guess is it's to cut off obviously spurious eigenvalues which could arise from noise. Since this is a fixed value, it could cause problems for systems with very large total energies.
What type(s) of changes does this code introduce?
Delete the items that do not apply
- Bugfix
Does this introduce a breaking change?
- No
What systems has this change been tested on?
desktop
Checklist
Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.
- Yes. This PR is up to date with current the current state of 'develop'
- Yes. Code added or changed in the PR has been clang-formatted
- No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
- No. Documentation has been added (if appropriate)
Test this please
Very interesting! Thanks Mark. Can you describe the situation that triggered this?
The context is testing for #4901. The cusp correction depends on a discrete parameter (locations where the second derivative of the correction matches the second derivative the basis function). I had set up some scripts to scan across that parameter, and this problem with the optimizer would show up in a few percent of the cases.
I had seen it before with some orbital rotation optimization, but hadn't tracked down the issue (It doesn't require orbital rotation to be involved - the problem could show up just optimizing J2 and J1, though it was more frequent when also optimizing J3)
It doesn't seem the case that E_0 = H(0,0) but E_0 = (H * Inv)[0,0] where inv is the inverse of overlap matrix.
For the overlap matrix, the first row and column are the identity matrix, so the inverse also ends up as the identity matrix in the first row and column, and (H * Inv)[0, 0] ends up being the same as H[0,0].
Test this please
Test this please