pyiron_base
Suggestions and problems of `pack` and `unpack`
List of problems/suggestions regarding `pack` and `unpack`:
- The following line deletes the original project folder.
pr.pack(pr.name)
- The following code imports jobs multiple times without warnings/errors:
pr.pack('TEST')
pr.unpack('TEST')
pr.unpack('TEST')
- The csv file should be included inside the tar file.
- Definition of destination path:
  Currently, `destination_path` has to be specified, although we think most people just want to quickly pack and unpack a project to download it. -> Sam's suggestion:
class Project(...):
    def pack(self, destination_path=None, csv_file_name=None, compress=True):
        if destination_path is None:
            destination_path = self.name
        if csv_file_name is None:  # Only if csv has to be exported separately
            csv_file_name = self.name
        ...
- `unpack` as `@classmethod`:
class Project(...):
    @classmethod
    def unpack(cls, origin_path, project_name=None, csv_file_name=None):
        # A classmethod receives the class itself as its first argument
        if project_name is None and not origin_path.endswith('tar.gz'):
            raise ValueError('`project_name` must be specified for uncompressed projects')
        if project_name == origin_path:
            raise ValueError('`project_name` must be different from `origin_path`')
        ...
So that the user can do:
from pyiron import Project
pr = Project.unpack('PACKED_PROJECT.tar.gz')
- `import_project` to import jobs into the current project:
class Project(...):
    def import_project(self, origin_path, csv_file_name=None):
        ...
So that the user can do:
from pyiron import Project
pr = Project('MY_PROJECT')
pr.import_project('PACKED_PROJECT.tar.gz') # This line imports all jobs inside PACKED_PROJECT.tar.gz
Comment on the last one (`import_project`):
In principle the project should be able to have other jobs such as:
from pyiron import Project
pr = Project('TEST')
vasp = pr.create.job.vasp('test')
vasp.structure = pr.create.structure.bulk('Al', cubic=True)
vasp.run()
pr.import_project('PACKED_PROJECT.tar.gz')
When there is a conflict of job names: as a first step, pyiron should raise an error. At some point we should think about how to import new stuff intelligently.
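The "raise an error first" behaviour could be sketched roughly as follows. This is only an illustration: `check_job_name_conflicts` is a hypothetical helper, not part of pyiron, and it assumes the job names of the current project and of the packed project are already available as plain lists of strings. The same check would also catch unpacking the same archive twice.

```python
def check_job_name_conflicts(existing_names, incoming_names):
    """Raise if any incoming job name is already used in the current project.

    Hypothetical helper for illustration only; not pyiron API.
    """
    conflicts = sorted(set(existing_names) & set(incoming_names))
    if conflicts:
        raise ValueError(
            "Cannot import, job names already exist: " + ", ".join(conflicts)
        )


# No overlap between the two projects: the import may proceed.
check_job_name_conflicts(["test"], ["job_one", "job_two"])

# Overlapping names: the import is refused.
try:
    check_job_name_conflicts(["test", "job_one"], ["job_one", "job_two"])
except ValueError as err:
    print(err)
```

Failing loudly keeps the first version simple; renaming or merging strategies could be layered on top later.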
It's currently not possible to pack a project from a directory other than the one it's located in, i.e.
pr = Project('/a/b/c')
pr.pack('my_archive')
raises an error that `cwd/c` does not exist.
Thanks for reporting. The path is assumed to be `cwd/project.name`, which is obviously not always the case and is thus a bug.
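One way to avoid depending on the current working directory is to resolve the project's own path and store only its folder name inside the archive. The following is a minimal sketch using the standard-library `tarfile` module. `pack_directory` is a hypothetical name, and this is not how pyiron currently implements `pack`.

```python
import os
import tarfile


def pack_directory(project_path, archive_name):
    """Pack the folder at project_path into '<archive_name>.tar.gz'.

    Hypothetical helper for illustration only; not pyiron API.
    """
    # Resolve the real location instead of assuming cwd/<project name>.
    project_path = os.path.abspath(project_path)
    if not os.path.isdir(project_path):
        raise FileNotFoundError(project_path)
    archive_path = archive_name + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the project folder name inside the archive,
        # so '/a/b/c' is stored as 'c/...' regardless of the cwd.
        tar.add(project_path, arcname=os.path.basename(project_path))
    return archive_path
```

Because `tar.add` recurses into the folder, every file below the project directory ends up in the archive, not only the HDF5 files.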
Packing/Unpacking also does not seem to take care of files related to the jobs, i.e. it only saves the HDF5 files. That's a deal breaker imo. The more I think about this, the more I think we should just rewrite the whole thing.
The comments above and the few lines of code these methods have already call for a full rewrite.
I'm starting to have the feeling that there shouldn't be any need for `unpack`. It should rather be:
from pyiron import Project
pr = Project('PROJECT', unpack=True)
And it should raise an error if `PROJECT.tar.gz` does not exist.
I would be fine with that as well. Would this then also rename the project in the tar archive to 'PROJECT'? Or would it rather be
pr = Project('path/to/PROJECT.tar.gz', unpack=True)
Indeed, how would one handle the
pr = Project('path/to/new/PROJECT')
in combination with the tar archive in this case?
pr = Project('/path/to/new/PROJECT', unpack='path/to/tar')
could be a solution.
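What `unpack='path/to/tar'` might do behind the scenes can be sketched with the standard library alone. `unpack_into` is a hypothetical helper, not pyiron API. It assumes the archive holds exactly one top-level folder, and it treats an existing target path as an error so that the same archive is not silently imported twice.

```python
import os
import shutil
import tarfile
import tempfile


def unpack_into(new_project_path, archive_path):
    """Extract archive_path so that its top-level folder becomes new_project_path.

    Hypothetical helper for illustration only; not pyiron API.
    """
    if not os.path.isfile(archive_path):
        raise FileNotFoundError(archive_path)
    if os.path.exists(new_project_path):
        raise FileExistsError(new_project_path)
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive_path, "r:*") as tar:
            # Use the safer 'data' extraction filter where Python provides it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(tmp, filter="data")
            else:
                tar.extractall(tmp)
        entries = os.listdir(tmp)
        if len(entries) != 1:
            raise ValueError("expected exactly one top-level entry in the archive")
        parent = os.path.dirname(os.path.abspath(new_project_path))
        os.makedirs(parent, exist_ok=True)
        # Moving the extracted folder renames the project to the new name.
        shutil.move(os.path.join(tmp, entries[0]), new_project_path)
    return new_project_path
```

Extracting into a temporary directory first means the name stored inside the archive does not constrain where the project finally lives.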
In the discussion that we had at the pyiron meeting today, I/we realized that it might be a good idea to be able to pack some jobs without creating a new group, i.e.:
pr.pack(path=..., job_list=[job_one, job_two, job_three])
And `job_list` should be `None` by default. This would allow the user to export, for example, only `TrainingContainer`.
Alternatively, the job should have the functionality to be exported directly:
job.pack(path=...)
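A rough sketch of the `job_list` idea, again with plain `tarfile`: `pack_selected` is a hypothetical function, and the on-disk layout it relies on (a `<job_name>.h5` file plus an optional `<job_name>_hdf5` folder per job) is an assumption made for this example. Here `job_list` holds job names as strings rather than job objects.

```python
import os
import tarfile


def pack_selected(project_path, archive_path, job_list=None):
    """Pack the whole project, or only the files of the jobs named in job_list.

    Hypothetical helper for illustration only; not pyiron API.
    Assumes each job owns '<name>.h5' and optionally a '<name>_hdf5' folder.
    """
    project_path = os.path.abspath(project_path)
    root = os.path.basename(project_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        if job_list is None:
            # Default: export everything in the project folder.
            tar.add(project_path, arcname=root)
            return
        for name in job_list:
            for entry in (name + ".h5", name + "_hdf5"):
                full = os.path.join(project_path, entry)
                if os.path.exists(full):
                    tar.add(full, arcname=os.path.join(root, entry))
```

A `job.pack(path=...)` method could then be a thin wrapper that calls the same routine with a single-element list.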
As far as I understand @samwaseda addressed these suggestions and problems in the pull requests above. So I am going to close the issue.