Davidson Breakdown if eigenpairs converged in pre-loop cycle
Describe the bug
Description
Efforts are made to make eigensolver an independent module.
After conducting an investigation and testing, I've encountered an issue with the diag_once function where the orthogonalization process is not performed correctly under specific conditions. Here are the details of my findings (when running tests on small problems, like dimension=10):
diago_david.cpp:1004: void hsolver::DiagoDavid<T, Device>::SchmidtOrth(const int&, int, int, const T*, T*, int, int) [with T = std::complex<double>; Device = base_device::DEVICE_CPU]: Assertion `psi_norm > 0.0' failed.
In the diag_once main loop, if an exact solution is obtained during the first iteration before the main loop, the residual block becomes zero. As a result, the subsequent orthogonalization process does not proceed as expected. This could potentially lead to incorrect results or a halt in the iterative process.
This is how current davidson code proceeds:
// first cycle before the loop
// eigenpairs are updated once here before entering main loop
...
hpsi_func(this->hpsi, basis, nbase_x, dim, 0, nband - 1);
this->cal_elem(dim, nbase, nbase_x, this->notconv, this->hpsi, this->spsi, this->hcc, this->scc);
this->diag_zhegvx(nbase, nband, this->hcc, this->scc, nbase_x, this->eigenvalue, this->vcc);
for (int m = 0; m < nband; m++)
{
eigenvalue_in[m] = this->eigenvalue[m];
}
ModuleBase::timer::tick("DiagoDavid", "first");
// below starts the main loop
int dav_iter = 0;
do
{
dav_iter++;
this->cal_grad(hpsi_func,
spsi_func,
dim,
nbase,
nbase_x,
this->notconv,
this->hpsi,
this->spsi,
this->vcc,
unconv.data(),
this->eigenvalue);
this->cal_elem(dim, nbase, nbase_x, this->notconv, this->hpsi, this->spsi, this->hcc, this->scc);
this->diag_zhegvx(nbase, nband, this->hcc, this->scc, nbase_x, this->eigenvalue, this->vcc);
// check convergence and update eigenvalues
ModuleBase::timer::tick("DiagoDavid", "check_update");
this->notconv = 0;
for (int m = 0; m < nband; m++)
{
convflag[m] = (std::abs(this->eigenvalue[m] - eigenvalue_in[m]) < david_diag_thr);
if (!convflag[m])
{
unconv[this->notconv] = m;
this->notconv++;
}
eigenvalue_in[m] = this->eigenvalue[m];
}
ModuleBase::timer::tick("DiagoDavid", "check_update");
if (!this->notconv || (nbase + this->notconv > nbase_x)
|| (dav_iter == david_maxiter))
{
ModuleBase::timer::tick("DiagoDavid", "last");
// update eigenvectors of Hamiltonian
// psi_in = basis * vcc ...
if (!this->notconv || (dav_iter == david_maxiter))
{
// overall convergence or last iteration: exit the iteration
break;
} else {
this->refresh( ... );
}
} // end of if
} while (true);
Question
Should the convergence checking part be moved to head of main loop? Thus the eigensolver process will exit when convergence detected in the very first cycle.
Expected behavior
No response
To Reproduce
No response
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)
- [ ] Verify the issue is not a duplicate.
- [ ] Describe the bug.
- [ ] Steps to reproduce.
- [ ] Expected behavior.
- [ ] Error message.
- [ ] Environment details.
- [ ] Additional context.
- [ ] Assign a priority level (low, medium, high, urgent).
- [ ] Assign the issue to a team member.
- [ ] Label the issue with relevant tags.
- [ ] Identify possible related issues.
- [ ] Create a unit test or automated test to reproduce the bug (if applicable).
- [ ] Fix the bug.
- [ ] Test the fix.
- [ ] Update documentation (if necessary).
- [ ] Close the issue and inform the reporter (if applicable).
Current way of convergence check by eigenvalue difference of two successive iterations will NOT be able to capture it if convergence is reached in the first cycle.
this->notconv = 0;
for (int m = 0; m < nband; m++)
{
convflag[m] = (std::abs(this->eigenvalue[m] - eigenvalue_in[m]) < david_diag_thr);
if (!convflag[m])
{
unconv[this->notconv] = m;
this->notconv++;
}
eigenvalue_in[m] = this->eigenvalue[m];
}
This problem can only be solved by changing the way the convergence check is done, for example to check the norms of residuals.
@Cstandardlib do you have plan to solve this issue by yourself?
@WHUweiqingzhou This issue is still pending, since it will need a discussion on the convergence criteria, which may affect all the eigensolvers we have now.
We encountered a similar issue recently in diago_dav_subspace. After some investigation, we found that the failure of this assertion in our case comes from nan (not a number, usually a result of 0/0, inf-inf, or undefined behavior), instead of psi_norm = 0:
@Cstandardlib Davidson-like eigensolvers rarely give "exact" eigenvector that yield zero residual. I was wondering if your case really comes from a zero vector or nan. If it also comes from nan, chances are there are serious problems outside these eigensolvers, e.g. in hpsi.
Currently we're collecting test cases with this error message, and unfortunately the smallest system we have is Fe16, which takes hundreds of seconds to reproduce this error and is very inconvenient to debug. We would appreciate if you could provide smaller test cases with this error message.
@jinzx10 I'll try on small examples with matrices of around 10 by 10 to help locate the error.
I use a simple 3x3 identity matrix and produce the error as follows:
151: dim=3, nband=1, ld_psi=3
151: h_mat=
151: 1 0 0
151: 0 1 0
151: 0 0 1
151: enter SchmidtOrth:
151: psi_norm0: 0
151: psi_norm: 0.479402
151: complete SchmidtOrth.
151: End of the first iteration
151: Davidson iteration 1
151: psi_[0]:
151: 0.455425 0.847152 0.273717
151: 0th psi
151: enter SchmidtOrth:
151: psi_norm: 0
151: HSolver_dav_real: abacus-develop/source/module_hsolver/diago_david.cpp:1024: void hsolver::DiagoDavid<T, Device>::SchmidtOrth(const int&, int, int, const T*, T*, int, int) [with T = double; Device = base_device::DEVICE_CPU]: Assertion `psi_norm > 0.0' failed.
It breaks inside the first SchmidtOrth in the main loop.
And test by a 6x6 identity:
151: dim=6, nband=1, ld_psi=6
151: h_mat=
151: 1 0 0 0 0 0
151: 0 1 0 0 0 0
151: 0 0 1 0 0 0
151: 0 0 0 1 0 0
151: 0 0 0 0 1 0
151: 0 0 0 0 0 1
151: enter SchmidtOrth:
151: psi_norm: 1.25946
151: complete SchmidtOrth.
151: Eigenvalues of the first iteration
151: Eigenvalue 0 : 1
151: Eigenvectors of the first iteration
151: Eigenvector 0 :
151: 1 0 0 0 0 2.42092e-322
151: End of the first iteration
151: Davidson iteration 1
151: psi_[0]:
151: 0.28098 0.522659 0.168873 0.702761 -0.30914 0.172972
151: 0th psi
151: enter SchmidtOrth:
151: psi_norm: 1.42593e-32
151: DiagoDavid::SchmidtOrth:aborted for psi_norm <1.0e-12
151: nband = 2
151: m = 1
When seeking one more pair, it seems that the residual of theeigenvectors are already zero before entering main loop SchmidtOrth.
151: dim=6, nband=2, ld_psi=6
151: h_mat=
151: 1 0 0 0 0 0
151: 0 1 0 0 0 0
151: 0 0 1 0 0 0
151: 0 0 0 1 0 0
151: 0 0 0 0 1 0
151: 0 0 0 0 0 1
151: enter SchmidtOrth:
151: psi_norm: 1.25946
151: complete SchmidtOrth.
151: enter SchmidtOrth:
151: psi_norm: 1.50619
151: complete SchmidtOrth.
151: Eigenvalues of the first iteration
151: Eigenvalue 0 : 1
151: Eigenvalue 1 : 1
151: Eigenvectors of the first iteration
# here outputs vcc, not the real eigenvector of 1st iteration
151: Eigenvector 0 :
151: -0.874561 -0.484915 0 0 -0.484915 0.874561
151: Eigenvector 1 :
151: -0.484915 0.874561 0 0 0 0
151: End of the first iteration
151: Davidson iteration 1
151: basis[2:4]:
151: Residual 0 :
151: -1.11022e-16 -1.11022e-16 -8.32667e-17 -1.11022e-16 -1.38778e-17 -6.93889e-18
151: Residual 1 :
151: 0 0 0 0 0 0
151: psi_[0]:
151: 0.28098 0.522659 0.168873 0.702761 -0.30914 0.172972
151: 0th psi
151: enter SchmidtOrth:
151: psi_norm: 2.46394e-32
151: DiagoDavid::SchmidtOrth:aborted for psi_norm <1.0e-12
151: nband = 4
151: m = 2
@jinzx10 Will it be better to debug with small data file of H matrix? I'm considering adding new unittests for diagonalization methods now that they do not depend on Hamilt and Psi.
There are two potential options:
- Read from data text files of elements of Hermitian matrix H, like current files of
source/module_hsolver/test/H-KPoints-Si2.dat. - A second way is to write code to generate data that is specific to eigenvalue solver testing. We are now able to generate the elements we want by a loop, for example, and store them in a vector/array, then wrap the Matrix-Blockvector operations (i.e. hpsi_func) in a function as argument of solvers.
@Critsium-xy Here I provide some tiny toy matrices to demonstrate this phenomenon. You can use it to test dav&dav_sub