pyiron_base icon indicating copy to clipboard operation
pyiron_base copied to clipboard

`GenericJob.save` just keeps copying things

Open liamhuber opened this issue 2 years ago • 6 comments

Repeated calls to save silently create new database items while duplicating the job name, i.e. they are a sort of silent "save-as" in database space. The job name however, stays the same, so you get an error if you try to load by job name:

from pyiron_base import Project

pr = Project("tmp")
pr.remove_jobs(silently=True, recursive=True)

job = pr.create.job.TableJob("my_table")
job.save()

print(len(pr.job_table()))
# 1

job.save()
print(len(pr.job_table()))
# 2

try:
    loaded = pr.load("my_table")
except ValueError as e:
    print(e.args[0])
# job name 'my_table' in this project '...' is not unique...

Is this intentional behaviour? I find it very counter-intuitive to other programs, where "save" updates my save file for the current object but doesn't create a new save file (or database entry in this case).

To further complicate things, PythonFunctionContainerJob does behave in the way I would expect by loading instead of saving if it finds the job. I like this better, but the difference in behaviour is definitely bad -- whatever behaviour we decide on should be consistent.

liamhuber avatar Mar 04 '24 19:03 liamhuber

save() is intended as internal function which serialises the job object, consisting of two parts, the storage in HDF5 and the creation of a SQL database entry. The function was never intended to be called by the user directly. I would be open to renaming it to a private function as we also had other users misusing it before.

jan-janssen avatar Mar 05 '24 11:03 jan-janssen

Renaming it is reasonable to me, but is there a reason we want the duplicating behaviour even for internal use?

liamhuber avatar Mar 05 '24 14:03 liamhuber

I have run into this before, but never deeply looked into it. Moving the linked section from PythonFunctionContainerJob to GenericJob should fix the issue and would be a-ok with me.

pmrv avatar Mar 05 '24 20:03 pmrv

I haven't gotten to checking but one of you might know offhand -- is the status also stored in hdf? If so then I guess the snippet needs to update that bit of serialized data? Otherwise I agree, I think the new code should be promoted up to the parent class.

liamhuber avatar Mar 06 '24 04:03 liamhuber

It is stored, but not sure where, might be just job['status'] even.

pmrv avatar Mar 06 '24 19:03 pmrv

It is stored, but not sure where, might be just job['status'] even.

We probably want to update this in the stored version too then.

IMO it would also be good to give a

warnings.warn(f"{self.job_name} already exists and was not re-saved. Use `Project.remove_job` to remove the stored version and re-save it if you have really made changes you want to save")

(or similar)

liamhuber avatar Mar 07 '24 18:03 liamhuber