abacus-develop
Double peak memory cost in `cast_memory_op`
Describe the bug
In `esolver_ks_pw.cpp`:

    this->kspw_psi = GlobalV::device_flag == "gpu"
                             || GlobalV::precision_flag == "single"
                         ? new psi::Psi<T, Device>(this->psi[0])
                         : reinterpret_cast<psi::Psi<T, Device>*>(this->psi);

The `Psi` constructor calls `cast_memory_op`:

    template <typename T_out, typename T_in>
    struct cast_memory<T_out, T_in, container::DEVICE_CPU, container::DEVICE_GPU> {
        void operator()(
            T_out* arr_out,
            const T_in* arr_in,
            const size_t& size)
        {
            auto * arr = (T_in*) malloc(sizeof(T_in) * size);
            cudaErrcheck(cudaMemcpy(arr, arr_in, sizeof(T_in) * size, cudaMemcpyDeviceToHost));
            for (int ii = 0; ii < size; ii++) {
                arr_out[ii] = static_cast<T_out>(arr[ii]);
            }
            free(arr);
        }
    };

The temporary host buffer `arr` is as large as the `Psi` data itself, so peak memory roughly doubles while the cast runs. This should be optimized as soon as possible.
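
One possible direction, given only as a rough sketch and not as the project's actual fix, is to convert the data in fixed-size chunks so that the staging buffer never exceeds a chosen bound. The names `cast_memory_chunked` and `copy_to_host` and the default chunk size are hypothetical. `copy_to_host` stands in for the `cudaMemcpy(..., cudaMemcpyDeviceToHost)` call so that the example compiles without CUDA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for cudaMemcpy(..., cudaMemcpyDeviceToHost).
// In this sketch the "device" buffer is ordinary host memory.
template <typename T>
void copy_to_host(T* dst, const T* src, std::size_t count)
{
    std::memcpy(dst, src, sizeof(T) * count);
}

// Hypothetical chunked variant of the cast: the staging buffer holds at most
// `chunk` elements instead of all `size` elements.
template <typename T_out, typename T_in>
void cast_memory_chunked(T_out* arr_out,
                         const T_in* arr_in,
                         std::size_t size,
                         std::size_t chunk = 1 << 20) // assumed default, not tuned
{
    assert(chunk > 0);
    if (size == 0) {
        return;
    }
    // Staging buffer bounded by the chunk size; freed automatically on return.
    std::vector<T_in> staging(std::min(size, chunk));
    for (std::size_t offset = 0; offset < size; offset += staging.size()) {
        // The last chunk may be shorter than the staging buffer.
        const std::size_t n = std::min(staging.size(), size - offset);
        copy_to_host(staging.data(), arr_in + offset, n);
        for (std::size_t ii = 0; ii < n; ++ii) {
            arr_out[offset + ii] = static_cast<T_out>(staging[ii]);
        }
    }
}
```

With this shape the extra host memory is bounded by `chunk * sizeof(T_in)` regardless of the size of `Psi`, at the cost of issuing several smaller copies instead of one large one. Whether that trade-off is acceptable on a real GPU would need to be measured.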
Expected behavior
No response
To Reproduce
No response
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)
- [ ] Verify the issue is not a duplicate.
- [ ] Describe the bug.
- [ ] Steps to reproduce.
- [ ] Expected behavior.
- [ ] Error message.
- [ ] Environment details.
- [ ] Additional context.
- [ ] Assign a priority level (low, medium, high, urgent).
- [ ] Assign the issue to a team member.
- [ ] Label the issue with relevant tags.
- [ ] Identify possible related issues.
- [ ] Create a unit test or automated test to reproduce the bug (if applicable).
- [ ] Fix the bug.
- [ ] Test the fix.
- [ ] Update documentation (if necessary).
- [ ] Close the issue and inform the reporter (if applicable).