
[Feature Request] Speed up dpgen jobs by skipping FP long tail jobs

Open · shazj99 opened this issue 4 years ago · 1 comment

Summary

To speed up dpgen jobs, it may be practical to accelerate the FP stage by skipping slow jobs or by waiting for them to finish asynchronously.

Detailed Description

While running dpgen jobs, we found that the FP phase takes up a large part of each iteration. This is because the DFT calculations for some candidates are very hard and time consuming, and we have to wait for all of these long-tail jobs to finish before moving to the next iteration. We found that the proportion of such candidates is very small, possibly less than 1%.

So we propose adding two new parameters, "ratio_failure" and "async_check", to optimize execution. Once most FP jobs have finished (the threshold being set by "ratio_failure"), we can discard the remaining jobs and go directly to the next iteration. We think this is acceptable since the ratio is very small. Furthermore, "async_check" can be enabled to wait for these jobs to finish and download their results asynchronously. That way, we hope the FP data are not discarded and can be added back in the next iteration, but the implementation will be somewhat more complicated since both dpgen and dpdispatcher need to be modified.
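A minimal sketch of the decision described above, not the code from the demo commits. It assumes "ratio_failure" is the maximum fraction of FP jobs allowed to remain unfinished, and that job states are plain strings; `FpDecision`, `decide_fp_stage`, and the state names are hypothetical and are not part of dpgen or dpdispatcher.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FpDecision:
    """Outcome of one check on the FP stage (hypothetical helper type)."""
    proceed: bool                                            # move on to the next iteration?
    finished: List[str] = field(default_factory=list)        # jobs whose results are usable now
    discarded: List[str] = field(default_factory=list)       # long-tail jobs that are dropped
    carried_over: List[str] = field(default_factory=list)    # long-tail jobs collected later


def decide_fp_stage(job_states: Dict[str, str],
                    ratio_failure: float,
                    async_check: bool = False) -> FpDecision:
    """Decide whether the FP stage may end before every job has finished.

    Assumption: ratio_failure is the largest tolerated fraction of
    unfinished jobs, e.g. 0.01 for the ~1% long tail.
    """
    finished = [name for name, state in job_states.items() if state == "finished"]
    pending = [name for name, state in job_states.items() if state != "finished"]

    total = len(job_states)
    if total == 0:
        # Nothing was submitted, so there is nothing to wait for.
        return FpDecision(proceed=True)

    unfinished_ratio = len(pending) / total
    if unfinished_ratio > ratio_failure:
        # Too many jobs are still running: keep waiting.
        return FpDecision(proceed=False, finished=finished)

    if async_check:
        # Keep tracking the long tail so its data can be added back later.
        return FpDecision(proceed=True, finished=finished, carried_over=pending)

    # Otherwise drop the long tail and continue.
    return FpDecision(proceed=True, finished=finished, discarded=pending)


# Usage: 200 FP jobs, one of which is still running.
states = {f"task.{i:06d}": "finished" for i in range(199)}
states["task.000199"] = "running"
decision = decide_fp_stage(states, ratio_failure=0.01, async_check=True)
```

With `async_check=True` the unfinished job is kept in `carried_over` rather than `discarded`, which is the part that would need support from both dpgen and dpdispatcher.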

Our test numbers below show that this can save a significant amount of time: [image]

Further Information, Files, and Links

Our demo code is available here:
DPDispatcher: https://github.com/shazj99/dpdispatcher/commit/8d2849d669e7a597bae52547e11359be9b283c27
DPGen: https://github.com/shazj99/dpgen/commit/3cc144fb8adf30011d70d71fa976b65cb921f7d8

Discussion and comments are welcome!

shazj99, Jan 04 '22 12:01

@felix5572 @AnguseZhang Could you take a look at this FR? Looking forward to your opinions.

shazj99, Jan 04 '22 12:01