dpgen
[Feature Request] Reduce MD loop time
Summary
The current dpgen workflow is divided into three parts: VASP labeling, training, and MD exploration. Ideally, as large a share as possible of the (financial and time) cost should go to fp labeling, since that is the step that actually produces data.
However, the cost of MD exploration is proportional to the complexity of the system. In some schemes that attempt to handle multi-element systems, this part may rise to ~50% of the financial cost and more than 50% of the time.
We propose to reduce the cost of MD exploration by 1) eliminating the repeated TF initialization when running individual MD trajectories that require the same pb files, and 2) parallel computing on the GPU.
We expect more than a 4-fold increase in MD exploration efficiency.
Detailed Description
We noticed that dpgen currently executes MD tasks sequentially on the GPU. We propose the following optimization ideas as a feature request:
- In dpgen's MD tasks the systems are relatively small, so GPU occupancy is very low. Placing multiple MPI processes on the same GPU at the same time would improve GPU utilization.
- A single MPI task can execute multiple LAMMPS scripts in sequence, avoiding the overhead of initializing TF multiple times. For some small systems, TF and program initialization may take a considerable share of the execution time.
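To make the expected gain from both ideas concrete, here is a rough cost model. It is a sketch under stated assumptions: the timings are hypothetical placeholders, not measurements, and it optimistically assumes that processes sharing one GPU do not slow each other down.

```python
import math
from dataclasses import dataclass


@dataclass
class MDCost:
    """Hypothetical per-trajectory timings in seconds (placeholders, not measurements)."""
    init_s: float  # TF + LAMMPS start-up, paid once per process
    run_s: float   # MD integration time for one trajectory


def sequential_time(n_traj: int, c: MDCost) -> float:
    """Current scheme: one process per trajectory, executed one after another,
    so the start-up cost is paid n_traj times."""
    return n_traj * (c.init_s + c.run_s)


def batched_time(n_traj: int, c: MDCost, n_proc: int) -> float:
    """Proposed scheme: n_proc processes share one GPU; each initializes once
    and then runs its share of trajectories in sequence.
    Optimistic assumption: co-located processes do not slow each other down."""
    per_proc = math.ceil(n_traj / n_proc)
    return c.init_s + per_proc * c.run_s


cost = MDCost(init_s=20.0, run_s=30.0)
print(sequential_time(40, cost))         # 2000.0
print(batched_time(40, cost, n_proc=4))  # 320.0
```

With these made-up numbers the saving comes from both terms: start-up is paid once per process instead of once per trajectory, and trajectories overlap on the GPU. Real gains depend on how much the co-located processes contend for the GPU.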
Supplement:
Ideally, as large a share as possible of the (financial/time) cost should go to fp labeling, since that is the step that actually produces data.
However, the cost of MD exploration is proportional to the complexity of the system; in 【some schemes attempting to handle multi-element systems】, this part may rise to ~50% of the financial cost and more than 50% of the time.
According to this proposal, more than a 4-fold increase in MD exploration efficiency can be expected, i.e., reducing the cost of MD exploration down to 10% for some 【】 cases.
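The relation between an MD speedup and MD's resulting share of the total cost can be checked with simple arithmetic. This sketch reads "10%" as MD's share of the total cost, which is an assumption, and assumes labeling and training costs stay unchanged.

```python
def md_share_after_speedup(md_share: float, speedup: float) -> float:
    """Share of the total cost spent on MD exploration after MD alone is sped up.

    md_share: original fraction of total cost spent on MD exploration (0..1)
    speedup:  factor by which MD exploration becomes faster (>= 1)
    Assumes labeling and training costs stay unchanged.
    """
    md = md_share / speedup
    return md / ((1.0 - md_share) + md)


for share, s in [(0.5, 4), (0.5, 9), (0.3, 4)]:
    print(share, s, round(md_share_after_speedup(share, s), 3))
```

Under these assumptions, a 4x speedup starting from a ~50% MD share leaves MD at about 20% of the total; reaching about 10% takes roughly a 9x speedup, or a 4x speedup starting from a ~30% share.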
This feature is already supported by dpdispatcher through the "para_deg" parameter of the Resources class: https://docs.deepmodeling.com/projects/dpdispatcher/en/latest/examples/shell.html
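dpdispatcher's documentation describes `para_deg` as how many tasks are run in parallel. The sketch below is a simplified, hypothetical illustration of that idea, not the script dpdispatcher actually generates: it writes a bash script that starts up to `para_deg` LAMMPS runs in the background on the same node (and therefore the same GPU) and waits for each batch before starting the next. The task directory names `task.000`, ... and the `lmp -i input.lammps` command line are placeholders.

```python
def build_group_script(commands: list[str], para_deg: int) -> str:
    """Return a bash script that runs `commands` at most `para_deg` at a time.

    Hypothetical illustration of the idea behind dpdispatcher's `para_deg`;
    the real script dpdispatcher generates differs.
    """
    if para_deg < 1:
        raise ValueError("para_deg must be >= 1")
    lines = ["#!/bin/bash"]
    for i, cmd in enumerate(commands, start=1):
        lines.append(f"{cmd} &")  # launch in background; all runs share the GPU
        if i % para_deg == 0:
            lines.append("wait")  # block until the current batch finishes
    if len(commands) % para_deg != 0:
        lines.append("wait")      # wait for the final, partial batch
    return "\n".join(lines) + "\n"


# Placeholder task directories and LAMMPS command line.
tasks = [f"(cd task.{i:03d} && lmp -i input.lammps)" for i in range(5)]
print(build_group_script(tasks, para_deg=2))
```

This batch-and-wait scheme covers the GPU-sharing idea only: each LAMMPS run still initializes TF on its own, and the slowest run in each batch sets the batch time.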