pyiron_base
Implement a `from_file` and `to_file` (or something similar) for `DataContainers` to remove the prior-knowledge requirement and avoid bugs
Just so there's a public record of this to-do: implement a `from_file` and `to_file` that write to the HDF5 file format for `DataContainers` and other data objects (tables?). This removes the prior-knowledge requirement and avoids the known bugs that users hit when they try to output to file formats (e.g. JSON) that aren't supported by ASE and pyiron `Atoms`.
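For context, a minimal illustration of the kind of failure this is meant to prevent; the objects and file name here are made up for the example, and it assumes ASE is installed:

```python
import json

from ase.build import bulk

data = {"structure": bulk("Al"), "energy": -3.36}

# Users who pick a format themselves (e.g. JSON) run into errors like this,
# because ASE Atoms objects are not JSON serializable:
try:
    with open("data.json", "w") as f:
        json.dump(data, f)
except TypeError as err:
    print(err)  # e.g. "Object of type Atoms is not JSON serializable"
```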
Not sure I follow. Is this about exporting `DataContainer` to general text formats or to (non-pyiron-managed) HDF5 files?
If I understood correctly, the idea is to ease sharing of pyiron data structures, i.e. to create non-pyiron-managed hdf files which can be reloaded by others.
I would imagine a function like

```python
def to_file(self, filename, group_name=None):
    group_name = group_name or self.__class__.__name__
    hdf = FileHDFio(filename)  # ... plus whatever further arguments are needed
    self._to_hdf(hdf, group_name)
```
where the `_to_hdf` does not change the group or HDF of the job itself.
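For completeness, a hedged sketch of what the paired reader could look like; the classmethod name and the use of `FileHDFio`/`from_hdf` mirror the sketch above and are assumptions, not an existing API:

```python
@classmethod
def from_file(cls, filename, group_name=None):
    # Counterpart to to_file(): open a standalone (non-pyiron-managed)
    # HDF5 file and restore the container from the given group.
    group_name = group_name or cls.__name__
    hdf = FileHDFio(filename)
    instance = cls()
    instance.from_hdf(hdf, group_name)
    return instance
```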
This is about exporting the `DataContainer` in a predetermined format for users, instead of having users attempt to write the `DataContainer` to a disk file themselves, which can trigger confusing errors that we know are caused by the `Atoms` objects inside it (non-jsonable, non-pickleable (supposed to be fixed?)). So essentially this would show up in tab-complete, and pyiron would write the `DataContainer` to HDF5; obviously there should also be a paired function for reading that HDF5 file back. So I envision something like `DataContainer.to_file()`, which generates an HDF5 file for sharing, and `DataContainer.from_file("defaultname.hdf5")` (see the usage sketch below).
The purpose is to smooth out (and really standardise) the data-sharing process, so that everyone shares HDF5 files rather than some using JSON, some pickle, etc.
This would remove the burden on users of having to know that HDF5 is the recommended file-sharing format for this object because of the bugs with `Atoms` objects that are triggered when writing the `DataContainer` to JSON or pickle.
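A minimal usage sketch of the proposed pair of methods, assuming the names suggested in this thread (neither method exists yet, and the file name is just the placeholder used above):

```python
from pyiron_base import DataContainer

dc = DataContainer({"energy": -3.14, "comment": "shareable data"})

# Write a standalone HDF5 file that can be passed around ...
dc.to_file("defaultname.hdf5")

# ... and read it back on the receiving side.
restored = DataContainer.from_file("defaultname.hdf5")
```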
Super simple fix - I'll try to get something together this week.
And yep, @niklassiemer is on the money.
Ah ok, this would be covered by #847 I guess.