pyiron_base
Default job name of restarted simulations
I'm currently working on Damask, where restarting is part of the job routines. In the current setting, the default job name is `job_restart` if the name of the old job is `job`. This causes problems when the job has to be run, say, 10 times: the name becomes something like `job_restart_restart_restart_..._restart`. I'd like to change it, but I also wanted to hear your opinions:
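To make the growth concrete, here is a minimal simulation of the current `_restart` suffixing (plain Python, not actual pyiron code):

```python
# Simulate how the default job name grows under the current scheme,
# where each restart appends "_restart" to the previous name.
name = "job"
for _ in range(10):
    name += "_restart"

print(name)       # job_restart_restart_..._restart (ten "_restart" suffixes)
print(len(name))  # 3 + 10 * len("_restart") = 83 characters
```

After only 10 restarts the name is already 83 characters long, which is where database column limits start to matter.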
- Possibility 0: Leave it like this
- Possibility 1: Replace it by `rerun_{index}` (or `restart`, but `rerun` is shorter)
  - This makes it difficult to bundle all the simulations, because the first run won't contain `rerun`, so if we use a for loop it's gonna be like `all_jobs = [pr.load("job")] + [pr.load(f"job_rerun_{counter}") for counter in range(5)]` (which is still WAY easier than the current setting).
  - We could potentially check if `job_restart` exists and make a warning for backward compatibility
- Possibility 2: Create a new entry in the job table for the counter
  - It's probably a nightmare for backward compatibility
  - We wouldn't have a problem with the user accidentally using `rerun` or `restart` in the job name (because in the option above we would have to parse `rerun` or `restart`)
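For Possibility 1, the naming logic could look roughly like the following sketch; `next_rerun_name` is a hypothetical helper, not part of pyiron:

```python
import re


def next_rerun_name(job_name: str) -> str:
    """Return the follow-up name under a hypothetical `rerun_{index}` scheme.

    "job" -> "job_rerun_0", "job_rerun_0" -> "job_rerun_1", and so on.
    """
    match = re.fullmatch(r"(.*)_rerun_(\d+)", job_name)
    if match is None:
        # First rerun: start the counter at zero.
        return job_name + "_rerun_0"
    base, index = match.groups()
    return f"{base}_rerun_{int(index) + 1}"


print(next_rerun_name("job"))          # job_rerun_0
print(next_rerun_name("job_rerun_0"))  # job_rerun_1
```

Note the parsing concern raised under Possibility 2: a user-chosen name that happens to end in `_rerun_<number>` would be misinterpreted by a helper like this.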
In the job table we have the parent ID; this should point to the job the current job was restarted from.
> In the job table we have the parent ID; this should point to the job the current job was restarted from.
That's true, but it's not particularly helpful when you want to get a list of connected jobs by list comprehension or other methods. Usually the user wants to start from the main job and find all the subsequent jobs, not the other way around, and in that case it's difficult to figure out which ones are the child jobs. Anyway, it doesn't solve my original problem that the job names become irrationally long.
> Possibility 1: Replace it by `rerun_{index}` (or `restart`, but `rerun` is shorter) — This makes it difficult to bundle all the simulations, because the first run won't contain `rerun`, so if we use a for loop it's gonna be like `all_jobs = [pr.load("job")] + [pr.load(f"job_rerun_{counter}") for counter in range(5)]` (which is still WAY easier than the current setting).
You could even add a convenience method on parent jobs like:

```python
@property
def all_re_run_names(self) -> list[str]:
    all_names = [self.name]
    for name in self.project.job_table()["job_name"]:
        if name.startswith(self.name + "_rerun_"):
            all_names.append(name)
    return all_names
```
Or similar -- there is probably a more elegant formulation and I'm not even sure the syntax here is correct, but you get the idea.
> Possibility 0: Leave it like this
I guess this way you will run into the column character length limit pretty fast.
> In the job table we have the parent ID; this should point to the job the current job was restarted from.

> That's true, but it's not particularly helpful when you want to get a list of connected jobs by list comprehension or other methods. Usually the user wants to start from the main job and find all the subsequent jobs, not the other way around, and in that case it's difficult to figure out which ones are the child jobs. Anyway, it doesn't solve my original problem that the job names become irrationally long.
I think that's probably only one SQL query (though I'd need to look it up), which we could bake into a method on a job, like `job.get_all_children()`. If this is easy, I think this would be my preferred solution as well. Then you could go for the second option for the naming.
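The single SQL query mentioned above would most likely be a recursive common table expression over the job table. A sketch against a stand-in SQLite table — the table and column names here are assumptions for illustration, not pyiron's actual schema:

```python
import sqlite3

# Toy stand-in for the job table: id 1 is the root, the rest form a chain of restarts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE job_table (id INTEGER PRIMARY KEY, parentid INTEGER)")
con.executemany(
    "INSERT INTO job_table VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 3), (6, 4)],
)

# Recursive CTE: seed with the direct children of the target id,
# then repeatedly join further children onto the set collected so far.
rows = con.execute(
    """
    WITH RECURSIVE descendants(id) AS (
        SELECT id FROM job_table WHERE parentid = ?
        UNION
        SELECT j.id FROM job_table j
        JOIN descendants d ON j.parentid = d.id
    )
    SELECT id FROM descendants ORDER BY id
    """,
    (1,),
).fetchall()

print([r[0] for r in rows])  # [2, 3, 4, 5, 6]
```

This fetches all transitive children in one round trip, regardless of how deep the restart chain is.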
Alternatively, we could also make `restart` use `job.child_project`, so that they all live inside the folder of the original job. We had once discussed something similar for the repaired jobs from HandyMan.
Another point I wanted to make is that I think `restart` is doing way too many different things:
- restart aborted jobs to fix inputs
- restart DFT jobs for different post-processing runs, e.g. band structure calculations
- continue MD runs
So I would suggest that you add a new method to the Damask job that does only the things that make sense there, and then we see if we can generalize it in base or elsewhere (if necessary at all).
Current problem with backward compatibility:
```python
job = pr.create.job.Sphinx("job")
...
job.run()
new_job = job.restart()
new_job.run()
```
@jan-janssen's solution: the child ID should be checked before the child runs.
@pmrv's solution: Change the behaviour in Damask:
```python
class DamaskJob:
    def rerun_loading(self, job_name=None):
        if job_name is None:
            if "restart" not in self.job_name:
                # First restart: append a counter suffix.
                job_name = self.job_name + "_restart_0"
            else:
                # Later restarts: increment the trailing counter.
                prefix = "_".join(self.job_name.split("_")[:-1])
                counter = int(self.job_name.split("_")[-1]) + 1
                job_name = prefix + "_" + str(counter)
        return self.restart(job_name=job_name)
```
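To check the intended naming sequence in isolation, the name logic can be pulled into a standalone function (a sketch, independent of the job class):

```python
def next_restart_name(job_name: str) -> str:
    # Intended behaviour: the first restart appends "_restart_0",
    # every later restart increments the trailing counter.
    if "restart" not in job_name:
        return job_name + "_restart_0"
    prefix = "_".join(job_name.split("_")[:-1])
    return prefix + "_" + str(int(job_name.split("_")[-1]) + 1)


name = "job"
for _ in range(3):
    name = next_restart_name(name)
print(name)  # job_restart_2
```

The name stays a fixed length no matter how many restarts are chained, which is the point of the proposal.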
Tasked with writing a method to return all jobs that originate from a given job based on the job table, ChatGPT gave this:
```python
import pandas as pd

# Sample DataFrame
data = {'id': [1, 2, 3, 4, 5, 6],
        'parentid': [None, 1, 1, 2, 3, 4]}
df = pd.DataFrame(data)


def find_all_descendants(df, target_id):
    def find_children(parent_ids):
        children = df.query('parentid in @parent_ids')['id']
        if children.empty:
            return set()
        return set(children).union(find_children(set(children)))

    all_related_ids = find_children({target_id})
    return df.query('id in @all_related_ids')


# Usage
target_id = 1
result = find_all_descendants(df, target_id)
print(result)
```
Not sure if it is useful.
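An iterative variant of the same idea, avoiding recursion, might look like this (a sketch over the same toy DataFrame):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "parentid": [None, 1, 1, 2, 3, 4]})


def find_all_descendants_iter(df, target_id):
    # Breadth-first walk over the parent/child relation:
    # expand the frontier until no new children appear.
    found = set()
    frontier = {target_id}
    while frontier:
        children = set(df.loc[df["parentid"].isin(frontier), "id"])
        frontier = children - found
        found |= children
    return df[df["id"].isin(found)]


print(find_all_descendants_iter(df, 1)["id"].tolist())  # [2, 3, 4, 5, 6]
```

The iterative form avoids Python's recursion limit for very long restart chains, though in practice either version would do.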
I guess the conclusion of my original question is that we don't change the behaviour of `restart`. Is that correct?