
task-level recovery might not work if a FireTask changes dirs

Open • computron opened this issue 6 years ago • 2 comments

See:

https://github.com/hackingmaterials/atomate/issues/217

I already fixed it so that FW_offline.json is loaded from the correct place. But it looks like the checkpoint object always goes to the "launch_dir" (i.e., the initial launch dir) to run recovery tasks. However, if one of the tasks in a FireWork changed directories (e.g., task 3), then subsequent tasks (e.g., tasks 4-6) should be run in that changed directory. So a task-level rerun starting at task 4 should also run in that changed directory.

It seems to me that _prev_dir in the checkpointing dict should not be created by LaunchPad but rather by calling os.getcwd() when the rocket does the checkpointing. Alternatively, there should be both _prev_dir (i.e., launch_dir...) and _cur_dir (the directory of the most recently executed task) in the checkpoint object. When doing task-level recovery, one should use the latter as the directory.

@montoyjh Any thoughts? A lot of this was your code, so I want to make sure this plays nicely with your knowledge of it.

computron • Apr 05 '18 16:04
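A minimal sketch of the second proposal in the issue, assuming a checkpoint is a plain dict written after each task completes: keep `_prev_dir` as the launch dir, add `_cur_dir` from `os.getcwd()` at checkpoint time, and prefer `_cur_dir` on recovery. Only the `_prev_dir` and `_cur_dir` keys come from the discussion; `_task_n`, `make_checkpoint`, `recovery_dir`, and `run_tasks` are hypothetical and are not the FireWorks API.

```python
import os
import tempfile


def make_checkpoint(task_n, launch_dir):
    """Record progress after task `task_n` completes (hypothetical helper)."""
    return {
        "_task_n": task_n,        # hypothetical: index of the last completed task
        "_prev_dir": launch_dir,  # initial launch dir, as before
        "_cur_dir": os.getcwd(),  # proposed: where the last task left the process
    }


def recovery_dir(checkpoint):
    """Prefer the most recent task's directory; fall back to the launch dir."""
    return checkpoint.get("_cur_dir") or checkpoint["_prev_dir"]


def run_tasks(tasks, launch_dir, start=0, checkpoint=None):
    """Run tasks[start:], checkpointing after each; stop at the first error."""
    os.chdir(recovery_dir(checkpoint) if checkpoint else launch_dir)
    for n in range(start, len(tasks)):
        try:
            tasks[n]()
        except Exception as exc:
            return checkpoint, exc  # last good checkpoint plus the failure
        checkpoint = make_checkpoint(n, launch_dir)
    return checkpoint, None


# Usage: task 1 changes directory, task 2 crashes on its first attempt.
launch_dir = os.path.realpath(tempfile.mkdtemp())
seen = []
state = {"fail": True}


def change_dir():
    os.makedirs("sub", exist_ok=True)
    os.chdir("sub")


def flaky():
    if state["fail"]:
        state["fail"] = False
        raise RuntimeError("simulated crash")
    seen.append(os.getcwd())


tasks = [lambda: None, change_dir, flaky]
cp, err = run_tasks(tasks, launch_dir)  # stops at task 2
cp, err = run_tasks(tasks, launch_dir, start=cp["_task_n"] + 1, checkpoint=cp)
# flaky() now runs in launch_dir/sub rather than launch_dir
```

Resuming from `_prev_dir` alone would run task 2 in `launch_dir`, which is the behavior described in the issue.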

Is it possible to get the absolute path of the FW_offline.json, rather than the relative path, and use that for the remainder of the firework?

EDIT: Never mind, I didn't understand the issue at first.

montoyjh • Apr 05 '18 16:04

Yeah, I think having os.getcwd() define the directory to change into on a task-level recovery should be good (this assumes that the recovery firetask won't try to create a new subdirectory and copy on a relative path or anything). It's gonna get a bit messy, though, if a user tries to recover on a firetask that makes a directory, cds into it, and does something.

montoyjh • Apr 05 '18 16:04
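One way the "makes a directory, cds, and does something" case can go wrong, sketched under an assumption: that the recovery directory were captured from `os.getcwd()` at the moment a task crashes, rather than when the previous task completed. `make_and_enter` and `attempt` are hypothetical stand-ins for a firetask and a rerun, not FireWorks code.

```python
import os
import tempfile


def make_and_enter(name="sub"):
    """Hypothetical firetask: mkdir relative to cwd, cd into it, then fail."""
    os.makedirs(name, exist_ok=True)
    os.chdir(name)
    raise RuntimeError("crash after cd")


def attempt(start_dir):
    """Run the task from start_dir; return the cwd at the moment it crashed."""
    os.chdir(start_dir)
    try:
        make_and_enter()
    except RuntimeError:
        pass
    return os.getcwd()


base = os.path.realpath(tempfile.mkdtemp())
crash_dir = attempt(base)                # base/sub: cwd when the task crashed
retry_from_crash = attempt(crash_dir)    # base/sub/sub: the relative mkdir nests
retry_from_checkpoint = attempt(base)    # base/sub: same layout as the first run
```

Rerunning from the crash-time directory nests the relative path one level deeper, while rerunning from the directory recorded after the previous task finished reproduces the original layout.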