abacus-develop icon indicating copy to clipboard operation
abacus-develop copied to clipboard

Unnecessary overhead for solving eigenvalues in diago_dav_subspace

Open Cstandardlib opened this issue 2 months ago • 1 comments

Details

hegvd is called to solve all the eigenvalues where only some of them are required, i.e. hegvx should be called instead.

See https://github.com/deepmodeling/abacus-develop/blob/6f277a75a36c87b42ccb4bedf0d044bfb132767e/source/source_hsolver/diago_dav_subspace.cpp#L543


Full code:

 if (this->device == base_device::GpuDevice)
    {
#if defined(__CUDA) || defined(__ROCM)
        if (this->diag_comm.rank == 0)
        {
            syncmem_complex_op()(this->d_scc, scc, nbase * this->nbase_x);
            hegvd_op<T, Device>()(this->ctx, nbase, this->nbase_x, this->hcc, this->d_scc, this->d_eigenvalue, this->vcc);
            syncmem_var_d2h_op()((*eigenvalue_iter).data(), this->d_eigenvalue, this->nbase_x);
        }
#endif
    }
    else
    {
        if (this->diag_subspace == 0)
        {
            if (this->diag_comm.rank == 0)
            {
                std::vector<std::vector<T>> h_diag(nbase, std::vector<T>(nbase, *this->zero));
                std::vector<std::vector<T>> s_diag(nbase, std::vector<T>(nbase, *this->zero));

                for (size_t i = 0; i < nbase; i++)
                {
                    for (size_t j = 0; j < nbase; j++)
                    {
                        h_diag[i][j] = hcc[i * this->nbase_x + j];
                        s_diag[i][j] = scc[i * this->nbase_x + j];
                    }
                }
                hegvx_op<T, Device>()(this->ctx,
                                      nbase,
                                      this->nbase_x,
                                      this->hcc,
                                      this->scc,
                                      nband,
                                      (*eigenvalue_iter).data(),
                                      this->vcc);
                // reset:
                for (size_t i = 0; i < nbase; i++)
                {
                    for (size_t j = 0; j < nbase; j++)
                    {
                        hcc[i * this->nbase_x + j] = h_diag[i][j];
                        scc[i * this->nbase_x + j] = s_diag[i][j];
                    }

                    for (size_t j = nbase; j < this->nbase_x; j++)
                    {
                        hcc[i * this->nbase_x + j] = *this->zero;
                        hcc[j * this->nbase_x + i] = *this->zero;
                        scc[i * this->nbase_x + j] = *this->zero;
                        scc[j * this->nbase_x + i] = *this->zero;
                    }
                }
            }
        }
        else
        {
#ifdef __MPI
            std::vector<T> h_diag;
            std::vector<T> s_diag;
            std::vector<T> vcc_tmp;
            if (this->diag_comm.rank == 0)
            {
                h_diag.resize(nbase * nbase, *this->zero);
                s_diag.resize(nbase * nbase, *this->zero);
                vcc_tmp.resize(nbase * nbase, *this->zero);
                for (size_t i = 0; i < nbase; i++)
                {
                    for (size_t j = 0; j < nbase; j++)
                    {
                        h_diag[i * nbase + j] = hcc[i * this->nbase_x + j];
                        s_diag[i * nbase + j] = scc[i * this->nbase_x + j];
                    }
                }
            }
            diago_hs_para(h_diag.data(),
                          s_diag.data(),
                          nbase,
                          nband,
                          (*eigenvalue_iter).data(),
                          vcc_tmp.data(),
                          this->diag_comm.comm,
                          this->diag_subspace,
                          this->diago_subspace_bs);
            if (this->diag_comm.rank == 0)
            {
                for (size_t i = 0; i < nband; i++)
                {
                    for (size_t j = 0; j < nbase; j++)
                    {
                        vcc[i * this->nbase_x + j] = vcc_tmp[i * nbase + j];
                    }
                }
            }

Task list for Issue attackers (only for developers)

  • [ ] Reproduce the performance issue on a similar system or environment.
  • [ ] Identify the specific section of the code causing the performance issue.
  • [ ] Investigate the issue and determine the root cause.
  • [ ] Research best practices and potential solutions for the identified performance issue.
  • [ ] Implement the chosen solution to address the performance issue.
  • [ ] Test the implemented solution to ensure it improves performance without introducing new issues.
  • [ ] Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • [ ] Review and incorporate any relevant feedback from users or developers.
  • [ ] Merge the improved solution into the main codebase and notify the issue reporter.

Cstandardlib avatar Oct 17 '25 08:10 Cstandardlib

good catch

mohanchen avatar Oct 17 '25 13:10 mohanchen