pyiron_base
Inconsistent Behavior if Space in Job Name
While using pyiron to track LAMMPS jobs, I had accidentally submitted some jobs with a space character in their name, e.g. `job_1 window1`. Without throwing an error, pyiron accepted this, saving the jobs with the space character in the filename of the h5 file, but internally (as seen in `project.job_table()`) it replaced the space with an underscore character.
This resulted in `h5 database is empty` errors when attempting to load the job again, as the load code searched under the name with the underscore, while all files had the space character in them. Maybe it would be a good idea to warn the user if there is a space character in the job name? Or maybe consistently convert the space into an underscore?
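To illustrate the second suggestion, here is a minimal sketch of a consistent conversion; the helper name and the idea of routing both saving and loading through it are assumptions for this example, not the actual pyiron_base implementation:

```python
import re

def sanitize_job_name(job_name: str) -> str:
    """Replace whitespace with underscores so that the database entry,
    the HDF5 filename and any later lookup all agree on one name."""
    return re.sub(r"\s+", "_", job_name)

# If the same helper were applied on save *and* on load,
# 'job_1 window1' and 'job_1_window1' would resolve to the same job.
print(sanitize_job_name("job_1 window1"))  # -> 'job_1_window1'
```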
Hi @pstaerk, thank you for reporting this issue. I hope it is already fixed with https://github.com/pyiron/pyiron_base/pull/1030, which should be included in the next pyiron version. Still, to be sure, can you share a quick code example to reproduce your error?
@samwaseda As we have now had multiple people who were surprised by pyiron changing the job name silently, I am wondering if the old way, which just raised an error for incompatible job names, was more user-friendly. Alternatively, we should at least have a warning when the job name is changed to make it less confusing.
I added a warning in https://github.com/pyiron/pyiron_base/pull/1041
Ok, thanks! Admittedly, I did not check with the current develop version, but rather with the default version that gets installed when you use the conda installation procedure shown in the docs.
From what I understand, the simplest way to re-create this error is to submit a dummy job of any kind while giving it a name such as `asdf_1 asdf`.
The error then occurs if one wants to re-load the job using something like `pr.load('asdf_1 asdf')`, as pyiron looks for `asdf_1_asdf.h5`, while only `asdf_1 asdf.h5` exists.
I think I would personally prefer that this throws an error, or alternatively that the load function also transforms all spaces into underscores, which would also make sense to me.
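Spelled out as a sketch (untested; it assumes a working LAMMPS setup and borrows the `Lammps` job type from the example further down as the "dummy job"):

```python
from pyiron_atomistics import Project

pr = Project("space_in_name_test")
job = pr.create.job.Lammps(job_name="asdf_1 asdf")  # note the space
job.structure = pr.create.structure.ase.bulk("Cu")
job.run()

# On the affected version this reportedly fails with "h5 database is empty":
# the loader looks for asdf_1_asdf.h5 while only "asdf_1 asdf.h5" exists on disk.
pr.load("asdf_1 asdf")
```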
I don't think the problem is the fact that pyiron internally changes the name; the real problem is that the job cannot be correctly loaded, meaning it has to correctly raise an error when there's a space in the job name (and not when the job name is changed).
@samwaseda With the latest development version the following code works:
```python
from pyiron_atomistics import Project
pr = Project("demonstration")
job = pr.create.job.Lammps(job_name="calculation 1")
job.structure = pr.create.structure.ase.bulk("Cu")
job.run()
job = pr.load("calculation 1")
print(job.project_hdf5.file_name)
# '/Users/janssen/pyiron/projects/2023/2023-02-21-jobname-debug/demonstration/calculation_1.h5'
```
@pstaerk For me this also works with `pyiron_base` `0.5.31`, which is the latest version of `pyiron_base` on conda that is compatible with `pyiron_atomistics`. Which version do you use?
> @samwaseda With the latest development version the following code works:
Then it's even better 😎
I said it in the PR as well, but in my opinion what's important is that job name definition and loading can be done consistently, and not so much whether the name is changed or not.
I just checked `pyiron_base.__version__`, which gives me `0.5.32`.
I don't really have a good MWE, because I used a modified Lammps executable. But if it seems to work with your newest version then everything seems to be alright?
> I don't really have a good MWE, because I used a modified Lammps executable. But if it seems to work with your newest version then everything seems to be alright?
I am a bit surprised - as it worked already with `0.5.31`, it might be that I am looking at a different error. If you could share your example that failed, that would be very helpful.
My current code is about 500 lines long or so, but I can try to copy/paste the relevant sections here:
```python
job_name = 'wl_window_type 10_other_nr1_f_9'
...
project.create_job("LammpsWL", job_name)
...
job.run()
```
And then, when I attempted to load:
```python
pr = Project("...")
...
jobname = f"wl_window_type 10_other_nr1_f_9"
job = pr[jobname]
qs_dict[i].append(job['output/gcmc/Q'])
```
I got the aforementioned error.
Sorry if I can't be more helpful. Sharing the entire huge block of code + the modified executable seems unfeasible.
@pstaerk The difference is that `job = pr["calculation 1"]` does not work while `job = pr.load("calculation 1")` does. @samwaseda That is the reason why I would like to have the warning, to at least inform users that something changed.
> @pstaerk The difference is that `job = pr["calculation 1"]` does not work while `job = pr.load("calculation 1")` does. @samwaseda That is the reason why I would like to have the warning, to at least inform users that something changed.
Those two options really ought to give the same result, though.
I also support warnings. They are easy to filter in high-throughput settings, so that doesn't hurt imo.
> @pstaerk The difference is that `job = pr["calculation 1"]` does not work while `job = pr.load("calculation 1")` does. @samwaseda That is the reason why I would like to have the warning, to at least inform users that something changed.
Ah no, in this case `__getitem__` must do the conversion properly. Issuing a warning is unrelated here.
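As a toy illustration of that point (a made-up mini "project" class, not pyiron's actual `Project`), both access paths can funnel through the same conversion:

```python
def _sanitize(job_name: str) -> str:
    # Same conversion as applied when the job was created.
    return job_name.replace(" ", "_")

class ToyProject:
    def __init__(self):
        self._jobs = {}  # sanitized name -> job object

    def create(self, job_name, job):
        self._jobs[_sanitize(job_name)] = job

    def load(self, job_name):
        return self._jobs[_sanitize(job_name)]

    def __getitem__(self, job_name):
        # As long as __getitem__ applies the same conversion as load(),
        # pr["calculation 1"] and pr.load("calculation 1") behave identically.
        return self.load(job_name)

pr = ToyProject()
pr.create("calculation 1", job="dummy")
assert pr["calculation 1"] == pr.load("calculation 1") == "dummy"
```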
I still oppose warnings. They should be issued only when the user needs to know something, but in the problems raised so far, the confusion comes from the fact that pyiron didn't do the conversion consistently. I would support `state.logger.info` or `state.logger.debug` at most.
`[INFO]` is ok with me.
> Ah no, in this case `__getitem__` must do the conversion properly. Issuing a warning is unrelated here.
The issue is now fixed in https://github.com/pyiron/pyiron_base/pull/1042
> I still oppose warnings. They should be issued only when the user needs to know something, but in the problems raised so far, the confusion comes from the fact that pyiron didn't do the conversion consistently. I would support `state.logger.info` or `state.logger.debug` at most.
As it is an impactful change and I do not know if it works with the `GenericMaster` and so on, I would prefer to have a warning to make it visible that pyiron is implicitly changing the user's input. It is now the second time that we have an issue with this change, and it is very hard for new pyiron users to identify what is causing the error. This would not be the case with a warning message.
> As it is an impactful change and I do not know if it works with the `GenericMaster` and so on, I would prefer to have a warning to make it visible that pyiron is implicitly changing the user's input.
Agreed. Explicit is better than implicit. Here the implicit behavior is handy, but we should at least provide the user with a log of what shenanigans we're getting up to.
> As it is an impactful change and I do not know if it works with the `GenericMaster` and so on, I would prefer to have a warning to make it visible that pyiron is implicitly changing the user's input. It is now the second time that we have an issue with this change, and it is very hard for new pyiron users to identify what is causing the error. This would not be the case with a warning message.
That means we'll issue something like "Warning: We changed your job name and you should be worried because we have no idea what that means", correct? That makes it look extremely unprofessional. In addition, we offer the conversion as a feature, so that people can directly insert e.g. a numpy array without worrying about converting it to a string. Warnings should be issued when the user is doing something in principle wrong, not for doing something we encourage.
So the issue I see is that someone comes up with two different job names that map to the same "corrected" job name in our current scheme. They are not aware of the conversion because the API hides it. When they submit the second job, in the best case it'll be of the same job type and just appear to reload the first job (which ought to confuse them a lot); in the worst case the job type is different and they are greeted with a mysterious stack trace deep within pyiron.
So that's the failure case I'd like to exclude. If a warning on every job conversion is too much, then maybe we can have a check for whether a job with the same "corrected" name already exists? I personally don't like that second option because it would incur a filesystem or database lookup on every job creation, and that would hurt in high-throughput settings, but I'm willing to compromise on this.
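A sketch of what such a collision check could look like; the helper names are made up, and gathering `existing_names` is exactly the per-creation database or filesystem lookup mentioned above:

```python
def corrected_or_raise(requested_name: str, existing_names: set, sanitize) -> str:
    """Refuse to create a job whose corrected name collides with a job that
    already exists under a different spelling (the failure case described above)."""
    corrected = sanitize(requested_name)
    if corrected != requested_name and corrected in existing_names:
        raise ValueError(
            f"Requested job name '{requested_name}' would be corrected to "
            f"'{corrected}', which already exists."
        )
    return corrected

# Example with a trivial sanitizer; this call raises, because "calculation 1"
# would be corrected to "calculation_1", which already exists:
existing = {"calculation_1"}
corrected_or_raise("calculation 1", existing, lambda s: s.replace(" ", "_"))
```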
> So that's the failure case I'd like to exclude. If a warning on every job conversion is too much, then maybe we can have a check for whether a job with the same "corrected" name already exists?
I guess the issue with this is that if we execute the same Jupyter notebook again and expect the job to be reloaded, it would give an error message, while for any other job name it would work just fine.
In my preferred solution for that case, the job name is corrected, the warning is emitted and then the job is created with the new name. On the second run, it would again correct the name, issue the warning again and then load the job with that name. That should work, no? We can even configure the logger so that this "name change warning" is only emitted once per process.
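A sketch of the "only once per process" idea; the warning category, helper name and module-level flag are made up for this example:

```python
import warnings

class JobNameChangedWarning(UserWarning):
    """Emitted when a requested job name is silently corrected."""

_already_warned = False  # per-process deduplication flag

def correct_job_name(job_name: str) -> str:
    global _already_warned
    corrected = job_name.replace(" ", "_")
    if corrected != job_name and not _already_warned:
        warnings.warn(
            f"Job name '{job_name}' was changed to '{corrected}' "
            "(this notice is shown only once per process).",
            JobNameChangedWarning,
        )
        _already_warned = True
    return corrected

correct_job_name("calculation 1")  # warns once
correct_job_name("calculation 2")  # silent on later conversions in the same process
```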
Since we offer the possibility of combining variables to make a job name, this is not compatible with a warning, i.e. something like this should not issue a warning:
```python
job = pr.create.job.MyJob((a, b, c, d))
```
One compromise I can see is to issue a warning if a `str` is passed instead of a `tuple` or `list` and a conversion takes place, i.e.:
```python
job = pr.create.job.MyJob('dots.are.replaced')  # -> warning
job = pr.create.job.MyJob(('dots', 'are', 'replaced'))  # -> no warning, because this is a feature
```
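A sketch of that compromise; the helper name and the exact replacement rules are assumptions, the point is only the `str`-vs-`tuple`/`list` distinction:

```python
import warnings

def resolve_job_name(name) -> str:
    """Tuples/lists are the naming feature: join silently.
    Plain strings that need correction trigger a warning."""
    if isinstance(name, (tuple, list)):
        # Feature: combine variables into one job name, no warning.
        return "_".join(str(part) for part in name).replace(".", "_").replace(" ", "_")
    corrected = str(name).replace(".", "_").replace(" ", "_")
    if corrected != str(name):
        warnings.warn(f"Job name '{name}' was corrected to '{corrected}'.")
    return corrected

resolve_job_name("dots.are.replaced")            # -> warning
resolve_job_name(("dots", "are", "replaced"))    # -> no warning
```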
> One compromise I can see is to issue a warning if a `str` is passed instead of a `tuple` or `list` and a conversion takes place, i.e.:
Perfectly happy with this, since in the second case equal tuples would be mapped to equal strings and the failure case I mentioned cannot happen.
So, this issue still persists with renamed jobs that contain a dash. The jobs internally renamed themselves, but the working directory and HDF5 file did not. Can we not just allow arbitrary job names and stop this?
Related bug: master jobs don't seem to check whether they create a child job with a name longer than the character limit, which then triggers an error in the database code that kills the whole master (see the sketch after the traceback).
Traceback:
```
Traceback (most recent call last):
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.StringDataRightTruncation: value too long for type character varying(50)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 915, in add_item_dict
result = self.conn.execute(
File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 431, in execute
return retry(
File "/u/zora/software/pyiron_base/pyiron_base/utils/error.py", line 145, in retry
return func()
File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 432, in <lambda>
lambda: self.execute_once(*args, **kwargs),
File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 425, in execute_once
return self._conn.execute(*args, **kwargs)
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/future/engine.py", line 280, in execute
return self._execute_20(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
return connection._execute_clauseelement(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1572, in _execute_clauseelement
ret = self._execute_context(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1943, in _execute_context
self._handle_dbapi_exception(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2124, in _handle_dbapi_exception
util.raise_(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1900, in _execute_context
self.dialect.do_execute(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.DataError: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(50)
[SQL: INSERT INTO jobs_cmmc (parentid, masterid, projectpath, project, job, subjob, chemicalformula, status, hamilton, hamversion, username, computer, timestart) VALUES (%(parentid)s, %(masterid)s, %(projectpath)s, %(project)s, %(job)s, %(subjob)s, %(chemicalformula)s, %(status)s, %(hamilton)s, %(hamversion)s, %(username)s, %(computer)s, %(timestart)s) RETURNING jobs_cmmc.id]
[parameters: {'parentid': None, 'masterid': 19564337, 'projectpath': '/cmmc/u/', 'project': 'zora/pyiron/projects/2023/MTP_MgAl_V2/verification/volume_flow/mtp/EverythingLayerNNTruncated/MTP24_1_8_6_5_restart_murn_mp_998860_hdf5/', 'job': 'MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'subjob': '/MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'chemicalformula': 'Al', 'status': 'initialized', 'hamilton': 'LammpsMlip', 'hamversion': '2021.09.29_default', 'username': 'zora', 'computer': 'zora@cmti001#1', 'timestart': datetime.datetime(2023, 4, 13, 12, 0, 24, 557393)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/u/zora/software/pyiron_base/pyiron_base/cli/__main__.py", line 3, in <module>
main()
File "/u/zora/software/pyiron_base/pyiron_base/cli/control.py", line 59, in main
args.cli(args)
File "/u/zora/software/pyiron_base/pyiron_base/cli/wrapper.py", line 31, in main
job_wrapper_function(
File "/u/zora/software/pyiron_base/pyiron_base/jobs/job/wrapper.py", line 171, in job_wrapper_function
job.run()
File "/u/zora/software/pyiron_base/pyiron_base/jobs/job/wrapper.py", line 124, in run
self.job.run_static()
File "/u/zora/software/pyiron_base/pyiron_base/jobs/master/parallel.py", line 544, in run_static
self._run_if_master_modal_child_non_modal(job=job)
File "/u/zora/software/pyiron_base/pyiron_base/jobs/master/parallel.py", line 499, in _run_if_master_modal_child_non_modal
job.save()
File "/u/zora/software/pyiron_base/pyiron_base/jobs/job/generic.py", line 1084, in save
job_id = self.project.db.add_item_dict(self.db_entry())
File "/u/zora/software/pyiron_base/pyiron_base/database/generic.py", line 923, in add_item_dict
raise ValueError("Error occurred: " + str(except_msg))
ValueError: Error occurred: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(50)
[SQL: INSERT INTO jobs_cmmc (parentid, masterid, projectpath, project, job, subjob, chemicalformula, status, hamilton, hamversion, username, computer, timestart) VALUES (%(parentid)s, %(masterid)s, %(projectpath)s, %(project)s, %(job)s, %(subjob)s, %(chemicalformula)s, %(status)s, %(hamilton)s, %(hamversion)s, %(username)s, %(computer)s, %(timestart)s) RETURNING jobs_cmmc.id]
[parameters: {'parentid': None, 'masterid': 19564337, 'projectpath': '/cmmc/u/', 'project': 'zora/pyiron/projects/2023/MTP_MgAl_V2/verification/volume_flow/mtp/EverythingLayerNNTruncated/MTP24_1_8_6_5_restart_murn_mp_998860_hdf5/', 'job': 'MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'subjob': '/MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004', 'chemicalformula': 'Al', 'status': 'initialized', 'hamilton': 'LammpsMlip', 'hamversion': '2021.09.29_default', 'username': 'zora', 'computer': 'zora@cmti001#1', 'timestart': datetime.datetime(2023, 4, 13, 12, 0, 24, 557393)}]
(Background on this error at: https://sqlalche.me/e/14/9h9h)
```
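A sketch of the kind of defensive check a master job could apply before creating a child job; the 50-character limit is taken from the `character varying(50)` column in the traceback above, and the function name is hypothetical:

```python
# Assumption: the database column holding the job name is VARCHAR(50),
# as shown in the traceback above.
MAX_JOB_NAME_LENGTH = 50

def validate_child_job_name(job_name: str) -> None:
    if len(job_name) > MAX_JOB_NAME_LENGTH:
        raise ValueError(
            f"Child job name '{job_name}' has {len(job_name)} characters, "
            f"exceeding the database limit of {MAX_JOB_NAME_LENGTH}. "
            "Failing early here would avoid killing the whole master job later."
        )

# Raises ValueError for the over-long child name from the traceback above:
validate_child_job_name("MTP24_1_8_6_5_restart_murn_mp_998860_0_30000000000000004")
```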