pyiron_base

Suggestions and problems regarding `pack` and `unpack`

Status: Open. niklassiemer opened this issue 2 years ago (8 comments).

List of problems/suggestions regarding pack and unpack:

  1. The following line deletes the original project folder.
pr.pack(pr.name)
  2. The following code imports jobs multiple times without any warning or error:
pr.pack('TEST')
pr.unpack('TEST')
pr.unpack('TEST')
  3. The csv file should be included inside the tar file.

  4. Definition of the destination path:

Currently, destination_path has to be specified, although we think most people just want to quickly pack and unpack a project, e.g. to download it. Sam's suggestion:

class Project(...):

    def pack(self, destination_path=None, csv_file_name=None, compress=True):
        if destination_path is None:
            destination_path = self.name
        if csv_file_name is None:  # Only needed if the csv file is exported separately
            csv_file_name = self.name
    ...
  5. unpack as @classmethod:
class Project(...):
    @classmethod
    def unpack(cls, origin_path, project_name=None, csv_file_name=None):
        if project_name is None and not origin_path.endswith('.tar.gz'):
            raise ValueError('`project_name` must be specified for uncompressed projects')
        if project_name == origin_path:
            raise ValueError('`project_name` must be different from `origin_path`')
        ...

So that the user can do:

from pyiron import Project
pr = Project.unpack('PACKED_PROJECT.tar.gz')
  6. import_project to import jobs into the current project:
class Project(...):
    def import_project(self, origin_path, csv_file_name=None):
        ...

So that the user can do:

from pyiron import Project
pr = Project('MY_PROJECT')
pr.import_project('PACKED_PROJECT.tar.gz') # This line imports all jobs inside PACKED_PROJECT.tar.gz
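The list above can be condensed into one toy class. This is a minimal sketch under stated assumptions, not pyiron's implementation: `ToyProject` is a hypothetical stand-in for `Project` in which a project is a folder holding one `<job>.h5` file per job, and `export.csv` is an assumed default name for the job table. It shows `pack` defaulting the archive name to the project name, putting the csv table inside the tar file, leaving the source folder untouched, and an `unpack` classmethod that refuses to import into an existing folder.

```python
import csv
import io
import os
import shutil
import tarfile
import tempfile

# Use the safe "data" extraction filter where this Python version provides it.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


class ToyProject:
    """Hypothetical stand-in for a pyiron Project: a folder with one <job>.h5 per job."""

    def __init__(self, path):
        self.path = os.path.abspath(path)
        self.name = os.path.basename(self.path)
        os.makedirs(self.path, exist_ok=True)

    def job_names(self):
        return sorted(f[:-3] for f in os.listdir(self.path) if f.endswith(".h5"))

    def pack(self, destination_path=None, csv_file_name="export.csv"):
        # Default the archive name to the project name.
        if destination_path is None:
            destination_path = self.name
        archive = destination_path + ".tar.gz"
        # Build the job table in memory so it ends up inside the archive.
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(["job"])
        writer.writerows([name] for name in self.job_names())
        data = buffer.getvalue().encode()
        info = tarfile.TarInfo(f"{self.name}/{csv_file_name}")
        info.size = len(data)
        with tarfile.open(archive, "w:gz") as tar:
            # The source folder is only read, never deleted.
            tar.add(self.path, arcname=self.name)
            tar.addfile(info, io.BytesIO(data))
        return archive

    @classmethod
    def unpack(cls, origin_path, project_name=None):
        # Alternate constructor: pr = ToyProject.unpack("X.tar.gz").
        if not origin_path.endswith(".tar.gz"):
            raise ValueError("this sketch only handles .tar.gz archives")
        if project_name is None:
            project_name = origin_path[: -len(".tar.gz")]
        if os.path.exists(project_name):
            # Refuse a second import instead of silently duplicating jobs.
            raise FileExistsError(f"{project_name} already exists")
        with tempfile.TemporaryDirectory() as tmp:
            with tarfile.open(origin_path, "r:gz") as tar:
                top = tar.getnames()[0].split("/")[0]
                tar.extractall(tmp, **EXTRACT_KWARGS)
            # Rename the extracted top-level folder to the requested project name.
            shutil.move(os.path.join(tmp, top), project_name)
        return cls(project_name)
```

With this design, unpacking the same archive into an existing folder fails loudly rather than importing the jobs a second time.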

niklassiemer, Jun 21 '22

Comment on the last one (import_project):

In principle, the project should be able to contain other jobs as well, e.g.:

from pyiron import Project
pr = Project('TEST')
vasp = pr.create.job.vasp('test')
vasp.structure = pr.create.structure.bulk('Al', cubic=True)
vasp.run()
pr.import_project('PACKED_PROJECT.tar.gz')

If job names conflict, pyiron should, as a first step, raise an error. At some point we should think about how to import new content intelligently.
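A conflict check like the one proposed could run before anything is extracted. The sketch below assumes, hypothetically, that every job is stored as a `<job>.h5` file directly inside the archived project folder; `archived_job_names` and `check_import_conflicts` are illustrative helper names, not part of pyiron.

```python
import os
import tarfile


def archived_job_names(archive_path):
    """Job names stored in a packed project (hypothetical helper).

    Assumes each job is a "<project>/<job>.h5" member directly below the
    archive's top-level project folder.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
    return {
        os.path.basename(name)[: -len(".h5")]
        for name in names
        if name.endswith(".h5") and name.count("/") == 1
    }


def check_import_conflicts(existing_job_names, archive_path):
    """Raise before importing anything if a job name would be duplicated."""
    clashes = sorted(set(existing_job_names) & archived_job_names(archive_path))
    if clashes:
        raise ValueError(f"jobs already exist in the target project: {clashes}")
```

For the example above, a project that already contains a job named `test` would reject an archive that also contains `test`.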

samwaseda, Jun 21 '22

It's currently not possible to pack a project while working in a directory other than the one it's located in, e.g.

pr = Project('/a/b/c')
pr.pack('my_archive')

raises an error that cwd/c does not exist.

pmrv, Jun 29 '22

Thanks for reporting. So the path is assumed to be cwd/project.name, which is obviously not always the case and thus a bug.
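A minimal sketch of a fix, assuming the project's absolute path is available (passed in here as `project_path`; `pack_project_dir` is a hypothetical helper, not pyiron's code): locate the source folder via its own path instead of cwd/project.name, and use `arcname` so the archive contains only the last path component.

```python
import os
import tarfile


def pack_project_dir(project_path, archive_name):
    """Hypothetical helper: pack a project folder that may live anywhere.

    Instead of assuming the project lives at cwd/project.name, the folder
    is located via its own absolute path.
    """
    project_path = os.path.abspath(project_path)
    if not os.path.isdir(project_path):
        raise FileNotFoundError(f"project folder {project_path} does not exist")
    archive = os.path.abspath(archive_name + ".tar.gz")  # written relative to cwd
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the last component ("c"), not the "/a/b" prefix.
        tar.add(project_path, arcname=os.path.basename(project_path))
    return archive
```

With this approach, `pack_project_dir('/a/b/c', 'my_archive')` behaves the same regardless of the current working directory.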

niklassiemer, Jun 29 '22

Packing/unpacking also does not seem to take care of the files related to the jobs, i.e. it only saves the HDF5 files. That's a deal breaker imo. The more I think about this, the more I think we should just rewrite the whole thing.
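A sketch of collecting all files of a job rather than only its HDF5 file. The on-disk layout is an assumption made for illustration: a `<job>.h5` file plus a `<job>_hdf5/` folder holding the working directory. `job_paths` and `pack_jobs_with_files` are hypothetical names.

```python
import os
import tarfile


def job_paths(project_path, job_name):
    """Everything that belongs to one job on disk (hypothetical helper).

    Assumed layout, for illustration only: <job>.h5 holds the HDF5 data and
    <job>_hdf5/ holds the working directory with input and output files.
    """
    candidates = [
        os.path.join(project_path, job_name + ".h5"),
        os.path.join(project_path, job_name + "_hdf5"),
    ]
    return [path for path in candidates if os.path.exists(path)]


def pack_jobs_with_files(project_path, job_names, archive_path):
    project_path = os.path.abspath(project_path)
    root = os.path.basename(project_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for job in job_names:
            for path in job_paths(project_path, job):
                # tar.add() recurses into directories, so the complete working
                # directory is archived, not only the HDF5 file.
                tar.add(path, arcname=f"{root}/{os.path.basename(path)}")
    return archive_path
```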

pmrv, Jul 14 '22

The comments above, together with the few lines of code these methods consist of, already call for a full rewrite.

niklassiemer, Jul 14 '22

I'm starting to have the feeling that there shouldn't be any need for unpack. It should rather be:

from pyiron import Project
pr = Project('PROJECT', unpack=True)

And it should raise an error if PROJECT.tar.gz does not exist.
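A minimal sketch of that constructor behaviour, assuming the archive sits next to the project path as `<path>.tar.gz` and contains a single top-level folder with the project's name. `ArchiveProject` is a hypothetical class, not pyiron's `Project`; it raises `FileNotFoundError` when the archive is missing, as suggested, and additionally refuses to unpack over an existing folder.

```python
import os
import tarfile

# Use the safe "data" extraction filter where this Python version provides it.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


class ArchiveProject:
    """Hypothetical sketch of the proposed Project(path, unpack=True)."""

    def __init__(self, path, unpack=False):
        self.path = os.path.abspath(path)
        if unpack:
            archive = self.path + ".tar.gz"
            if not os.path.isfile(archive):
                raise FileNotFoundError(f"{archive} does not exist")
            if os.path.exists(self.path):
                raise FileExistsError(f"{self.path} already exists")
            with tarfile.open(archive, "r:gz") as tar:
                # Assumes one top-level folder named like the project, so
                # extracting next to the archive recreates the project folder.
                tar.extractall(os.path.dirname(self.path), **EXTRACT_KWARGS)
        os.makedirs(self.path, exist_ok=True)
```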

samwaseda, Oct 21 '22

I would be fine with that as well. Would this then also rename the project in the tar archive to 'PROJECT'? Or would it rather be

pr = Project('path/to/PROJECT.tar.gz', unpack=True)

Indeed, how would one handle the

pr = Project('path/to/new/PROJECT')

in combination with the tar archive in this case?

pr = Project('/path/to/new/PROJECT', unpack='path/to/tar')

could be a solution.
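One way to support `unpack='path/to/tar'` together with a new project location is to strip the archive's original top-level folder while extracting, which also answers the renaming question. A minimal sketch; `unpack_to` is a hypothetical helper, and only regular files and directories are restored.

```python
import os
import shutil
import tarfile


def unpack_to(archive_path, new_project_path):
    """Extract a packed project under a new name and location (hypothetical).

    The archive's single top-level folder, whatever it was called, is
    replaced by new_project_path.
    """
    new_project_path = os.path.abspath(new_project_path)
    if os.path.exists(new_project_path):
        raise FileExistsError(new_project_path)
    os.makedirs(new_project_path)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # "OLD_NAME/job.h5" -> "job.h5"; the top-level entry has no remainder.
            parts = member.name.split("/", 1)
            if len(parts) < 2 or not parts[1]:
                continue
            rel = parts[1].split("/")
            if ".." in rel:
                raise ValueError(f"unsafe member path: {member.name}")
            target = os.path.join(new_project_path, *rel)
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
            # Links and special files are skipped in this sketch.
    return new_project_path
```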

niklassiemer, Oct 21 '22

In the discussion that we had at the pyiron meeting today, I/we realized that it might be a good idea to be able to pack some jobs without creating a new group, i.e.:

pr.pack(path=..., job_list=[job_one, job_two, job_three])

job_list should be None by default. This would allow the user to export, for example, only a TrainingContainer.

Alternatively, the job should have the functionality to be exported directly:

job.pack(path=...)
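A sketch of the proposed `job_list` argument as a plain function, assuming one `<job>.h5` file (plus an optional `<job>_hdf5/` folder) per job. `pack` here is a hypothetical free function, not the existing method, and entries of `job_list` may be names or objects with a `job_name` attribute.

```python
import os
import tarfile


def pack(project_path, path, job_list=None):
    """Hypothetical pack() with the proposed job_list argument.

    job_list=None packs every job in the project; otherwise only the listed
    jobs. Assumes one <job>.h5 (plus optional <job>_hdf5/ folder) per job.
    """
    project_path = os.path.abspath(project_path)
    root = os.path.basename(project_path)
    all_jobs = sorted(f[:-3] for f in os.listdir(project_path) if f.endswith(".h5"))
    if job_list is None:
        selected = all_jobs
    else:
        # Accept job names or job-like objects exposing `job_name`.
        selected = [getattr(job, "job_name", job) for job in job_list]
        unknown = sorted(set(selected) - set(all_jobs))
        if unknown:
            raise ValueError(f"jobs not found in project: {unknown}")
    with tarfile.open(path, "w:gz") as tar:
        for job in selected:
            for suffix in (".h5", "_hdf5"):
                source = os.path.join(project_path, job + suffix)
                if os.path.exists(source):
                    tar.add(source, arcname=f"{root}/{job}{suffix}")
    return selected
```

In this sketch, the alternative `job.pack(path=...)` reduces to calling `pack(project_path, path, job_list=[job])`.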

samwaseda, Oct 24 '22

As far as I understand, @samwaseda addressed these suggestions and problems in the pull requests above, so I am going to close the issue.

jan-janssen, Sep 29 '24