
Restart option added in case of unexpected termination

Open sakim8048 opened this issue 9 months ago • 0 comments

This PR includes two major updates. First, the updated restart option restarts optimization or vibration calculations when a previous run terminates unexpectedly. Second, the PR allows users to run Pynta efficiently on the ALCF Polaris machine. Details are described below:


  1. Restart Option

Running restart() retrieves the Fireworks workflow information: the workflow ID, task IDs, task states, and the launch directories where the unexpectedly terminated calculations were running. Before Fireworks reruns the incomplete tasks, all necessary files, such as optimization trajectory files or vib folders, are copied to the destination directory.

If a task's state is not completed (e.g., fizzled or lost runs), the optimization restarts from the last geometry in the optimization trajectory file in the previous launch directory. For a vibration restart, empty vibration JSON files are deleted from the vib folder before the vibration calculation is rerun.
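The vibration cleanup and the restart decision described above could be sketched roughly as follows. This is only an illustration using the standard library, not the code in this PR: the helper names `needs_restart` and `remove_empty_vib_json`, the `lost` flag, and the assumption that the vib folder holds `*.json` files directly are all hypothetical.

```python
from pathlib import Path

# Assumption: only FIZZLED is treated as a restartable state by name.
# Lost runs usually still look like RUNNING, so they are flagged separately.
RESTARTABLE_STATES = {"FIZZLED"}


def needs_restart(state, lost=False):
    """Return True if a task should be rerun (hypothetical helper)."""
    return lost or state.upper() in RESTARTABLE_STATES


def remove_empty_vib_json(vib_dir):
    """Delete zero-byte JSON files in vib_dir and return their names."""
    removed = []
    for path in sorted(Path(vib_dir).glob("*.json")):
        if path.stat().st_size == 0:
            path.unlink()
            removed.append(path.name)
    return removed
```

Deleting only zero-byte files keeps any displacement results that finished before the termination, so the rerun only has to redo the displacements that were interrupted.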

  2. Running Pynta on ALCF Polaris

Thanks to Raymundo's efforts, this PR allows Pynta to run on ALCF Polaris with a single queue allocation. Raymundo updated how Pynta maps tasks to nodes on ALCF machines: each task runs on a different Fireworker, and each Fireworker is associated with a node. This is available for multilauncher. The optimal approach is to set num_jobs in the Pynta input script to the number of nodes.
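As a rough aid for choosing num_jobs, the number of nodes in the current allocation can be counted from the PBS node file. This is a minimal sketch that assumes the job runs under PBS with `PBS_NODEFILE` set; the helper name `count_nodes` is hypothetical and not part of Pynta.

```python
import os


def count_nodes(nodefile_path):
    """Count the unique hostnames listed in a PBS node file."""
    with open(nodefile_path) as fh:
        hosts = {line.strip() for line in fh if line.strip()}
    return len(hosts)


if __name__ == "__main__":
    # Assumption: PBS exports PBS_NODEFILE inside the job environment.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        print("suggested num_jobs:", count_nodes(nodefile))
```

Counting unique hostnames rather than lines guards against node files that list a host once per slot.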

sakim8048, Mar 27 '25 15:03