pyiron_base

Implement `from_file` and `to_file` (or something similar) for `DataContainers` so that users can avoid known bugs without prior knowledge

Open ligerzero-ai opened this issue 2 years ago • 5 comments

Just so there's a public record of this to-do

Implement a from_file and to_file that read from and write to the hdf5 file format for DataContainers and other data objects (tables?). This would remove the prior-knowledge requirement for avoiding known bugs, which occur when users try to output to file formats (e.g. json) that aren't supported by ase and pyiron atoms.

ligerzero-ai avatar Nov 07 '22 22:11 ligerzero-ai

Not sure I follow. Is this about exporting DataContainer to general text formats or to (non-pyiron managed) HDF5 files?

pmrv avatar Nov 08 '22 20:11 pmrv

If I understood correctly, the idea is to ease sharing of pyiron data structures, i.e. to create non-pyiron-managed hdf files which can be reloaded by others.

I would imagine a function like

def to_file(self, filename, group_name=None):
    # Default the group name to the name of the object's class
    group_name = group_name or self.__class__.__name__
    hdf = FileHDFio(filename...)
    self._to_hdf(hdf, group_name)

Where the _to_hdf does not change the group or hdf of the job itself.

niklassiemer avatar Nov 08 '22 21:11 niklassiemer

This is about exporting a DataContainer in a predetermined format, instead of making users write the DataContainer to a disk file themselves, which can trigger confusing errors that we know come from the Atoms objects in the DataContainer (non-jsonable, non-pickleable (supposed to be fixed?)). So essentially the method can be found via tab-complete and pyiron writes the DataContainer to hdf5; obviously there should be a paired function for reading this hdf5 file. I envision something like DataContainer.to_file(), which generates an hdf5 file for sharing, and DataContainer.from_file("defaultname.hdf5").

The purpose is to smooth out (and really standardise) the process for data sharing, so that everyone shares hdf5 files, rather than some using json, some pickle, etc.

This would remove the problem that users have to know that hdf5 is the recommended file-sharing format for this object, because of bugs with Atoms objects that are triggered when trying to write the DataContainer to json or pickle.
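The thread does not show the actual error, so the following only illustrates the general failure mode with a stand-in. `FakeAtoms` is a hypothetical placeholder for an Atoms object, and the plain dictionary stands in for a DataContainer; the point is just that the standard json encoder rejects any object type it does not know.

```python
import json


class FakeAtoms:
    """Hypothetical stand-in for an ase/pyiron Atoms object."""

    def __init__(self, symbols):
        self.symbols = symbols


# A container-like mapping that mixes plain values with a custom object.
container_like = {"energy": -3.2, "structure": FakeAtoms(["Fe", "Fe"])}

try:
    json.dumps(container_like)
    error = None
except TypeError as exc:
    # The default encoder only handles dicts, lists, strings, numbers,
    # booleans and None, so the custom object raises a TypeError.
    error = exc

print(type(error).__name__, "-", error)
```

A user who hits this has no hint that hdf5 is the intended route, which is the gap a discoverable to_file would close.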

Super simple fix - I'll try to get something together this week.

ligerzero-ai avatar Nov 08 '22 21:11 ligerzero-ai

And yep, @niklassiemer is on the money

ligerzero-ai avatar Nov 08 '22 21:11 ligerzero-ai

Ah ok, this would be covered by #847 I guess.

pmrv avatar Nov 08 '22 21:11 pmrv