abacus-develop
abacus-develop copied to clipboard
Unnecessary overhead for solving eigenvalues in diago_dav_subspace
Details
hegvd is called to solve all the eigenvalues where only some of them are required, i.e. hegvx should be called instead.
See https://github.com/deepmodeling/abacus-develop/blob/6f277a75a36c87b42ccb4bedf0d044bfb132767e/source/source_hsolver/diago_dav_subspace.cpp#L543
Full code:
if (this->device == base_device::GpuDevice)
{
#if defined(__CUDA) || defined(__ROCM)
if (this->diag_comm.rank == 0)
{
syncmem_complex_op()(this->d_scc, scc, nbase * this->nbase_x);
hegvd_op<T, Device>()(this->ctx, nbase, this->nbase_x, this->hcc, this->d_scc, this->d_eigenvalue, this->vcc);
syncmem_var_d2h_op()((*eigenvalue_iter).data(), this->d_eigenvalue, this->nbase_x);
}
#endif
}
else
{
if (this->diag_subspace == 0)
{
if (this->diag_comm.rank == 0)
{
std::vector<std::vector<T>> h_diag(nbase, std::vector<T>(nbase, *this->zero));
std::vector<std::vector<T>> s_diag(nbase, std::vector<T>(nbase, *this->zero));
for (size_t i = 0; i < nbase; i++)
{
for (size_t j = 0; j < nbase; j++)
{
h_diag[i][j] = hcc[i * this->nbase_x + j];
s_diag[i][j] = scc[i * this->nbase_x + j];
}
}
hegvx_op<T, Device>()(this->ctx,
nbase,
this->nbase_x,
this->hcc,
this->scc,
nband,
(*eigenvalue_iter).data(),
this->vcc);
// reset:
for (size_t i = 0; i < nbase; i++)
{
for (size_t j = 0; j < nbase; j++)
{
hcc[i * this->nbase_x + j] = h_diag[i][j];
scc[i * this->nbase_x + j] = s_diag[i][j];
}
for (size_t j = nbase; j < this->nbase_x; j++)
{
hcc[i * this->nbase_x + j] = *this->zero;
hcc[j * this->nbase_x + i] = *this->zero;
scc[i * this->nbase_x + j] = *this->zero;
scc[j * this->nbase_x + i] = *this->zero;
}
}
}
}
else
{
#ifdef __MPI
std::vector<T> h_diag;
std::vector<T> s_diag;
std::vector<T> vcc_tmp;
if (this->diag_comm.rank == 0)
{
h_diag.resize(nbase * nbase, *this->zero);
s_diag.resize(nbase * nbase, *this->zero);
vcc_tmp.resize(nbase * nbase, *this->zero);
for (size_t i = 0; i < nbase; i++)
{
for (size_t j = 0; j < nbase; j++)
{
h_diag[i * nbase + j] = hcc[i * this->nbase_x + j];
s_diag[i * nbase + j] = scc[i * this->nbase_x + j];
}
}
}
diago_hs_para(h_diag.data(),
s_diag.data(),
nbase,
nband,
(*eigenvalue_iter).data(),
vcc_tmp.data(),
this->diag_comm.comm,
this->diag_subspace,
this->diago_subspace_bs);
if (this->diag_comm.rank == 0)
{
for (size_t i = 0; i < nband; i++)
{
for (size_t j = 0; j < nbase; j++)
{
vcc[i * this->nbase_x + j] = vcc_tmp[i * nbase + j];
}
}
}
Task list for Issue attackers (only for developers)
- [ ] Reproduce the performance issue on a similar system or environment.
- [ ] Identify the specific section of the code causing the performance issue.
- [ ] Investigate the issue and determine the root cause.
- [ ] Research best practices and potential solutions for the identified performance issue.
- [ ] Implement the chosen solution to address the performance issue.
- [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
- [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- [ ] Review and incorporate any relevant feedback from users or developers.
- [ ] Merge the improved solution into the main codebase and notify the issue reporter.
good catch