pyiron_base
Let's spin off DataContainer as its own package
EDIT: Note: Formerly "Should DataContainer get spun off to its own package?"
In my personal projects I sometimes find myself wishing I had `DataContainer` available, or even adding the entire `pyiron_base` dependency stack to my project just so I can use it. Does it make sense to spin this off as its own smaller package, à la `pysqa`?
I took a quick peek at the source, and we'd need to pull out some of our other stuff in `interfaces` and `storage`, but mostly it's not too bad; e.g. `interfaces.has_groups` is only dependent on `abc` and could just be pulled straight out. The only sticky part I can see is that `HasHDF` depends on `ProjectHDFio`. My gut says we can probably switch this to a dependency on `FileHDFio` and then extract that pretty cleanly, but I haven't dug in deeply.
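To illustrate why an `abc`-only interface is easy to lift out, here is a minimal sketch of what such a mixin could look like. The class and method names (`HasGroupsSketch`, `DictTree`, `list_all`, ...) are stand-ins chosen for this example and are not copied from the `pyiron_base` source.

```python
from abc import ABC, abstractmethod


class HasGroupsSketch(ABC):
    """Illustrative abc-only interface; an assumption, not the pyiron_base code."""

    @abstractmethod
    def _list_groups(self):
        """Return the names of nested containers."""

    @abstractmethod
    def _list_nodes(self):
        """Return the names of leaf values."""

    def list_groups(self):
        return list(self._list_groups())

    def list_nodes(self):
        return list(self._list_nodes())

    def list_all(self):
        return {"groups": self.list_groups(), "nodes": self.list_nodes()}


class DictTree(HasGroupsSketch):
    """Hypothetical implementer: nested dicts are groups, anything else is a node."""

    def __init__(self, data):
        self._data = data

    def _list_groups(self):
        return [k for k, v in self._data.items() if isinstance(v, dict)]

    def _list_nodes(self):
        return [k for k, v in self._data.items() if not isinstance(v, dict)]


tree = DictTree({"structure": {"cell": [1, 2, 3]}, "energy": -1.5})
print(tree.list_all())
```

Nothing here imports beyond the standard library, which is the property that would make this piece straightforward to move.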
Would anyone else find this helpful, or do you think it would be helpful for the community? @pmrv do you have any thoughts on the technical side? IMO `DataContainer` is just phenomenally useful and it would be great to make it more usable by a wider community.
Steps to a `pyiron_data` package (edits welcome!):
- [ ] Pull the "pyiron" stuff out of `FileHDFio` (put it in `ProjectHDFio`?)
- [ ] Make sure that everything currently depending on `hdf: ProjectHDFio` can depend on `hdf: FileHDFio` (`HasHDF` should be the key; after that, children like `DataContainer` hopefully fall in line)
- [ ] Move key classes over to their own package, and re-import them in pyiron
  - `MutableMapping`
  - `HasGroups`
  - `FileHDFio`
  - `HasHDF`, `_WithHDF`
  - `HasStorage`
  - `DataContainer`
  - `HasStoredTraits`
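The second step above can be sketched roughly as follows: if the `HasHDF`-style mixin only relies on the base file class, the project-aware subclass keeps working unchanged. All classes below are simplified stand-ins invented for this example (a dict replaces the real HDF5 file, and the `to_hdf`/`from_hdf` signatures are assumptions), so treat it as an illustration of the dependency direction rather than the actual refactor.

```python
class FileIOSketch:
    """Stand-in for a plain file-level HDF wrapper; a dict replaces the file."""

    def __init__(self):
        self._store = {}

    def __setitem__(self, key, value):
        self._store[key] = value

    def __getitem__(self, key):
        return self._store[key]


class ProjectIOSketch(FileIOSketch):
    """Stand-in for the project-aware subclass that would stay in pyiron_base."""

    def __init__(self, project_name):
        super().__init__()
        self.project_name = project_name


class HasHDFSketch:
    """Mixin whose to_hdf/from_hdf only use the base-class interface."""

    def to_hdf(self, hdf: FileIOSketch, group_name: str) -> None:
        hdf[group_name] = self._to_dict()

    def from_hdf(self, hdf: FileIOSketch, group_name: str) -> None:
        self._from_dict(hdf[group_name])


class Settings(HasHDFSketch):
    """Hypothetical child class, playing the role DataContainer would play."""

    def __init__(self, **values):
        self.values = dict(values)

    def _to_dict(self):
        return dict(self.values)

    def _from_dict(self, data):
        self.values = dict(data)


# The same round trip works with the base class and with the subclass.
for hdf in (FileIOSketch(), ProjectIOSketch("demo")):
    Settings(cutoff=300).to_hdf(hdf, "settings")
    restored = Settings()
    restored.from_hdf(hdf, "settings")
```

With the annotation on the base class, the extracted package would never need to import anything project-specific.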
Now that I'm doing some GUI stuff, I'm also pretty keen on good callback infrastructure. While it was disappointing that traitlets can't handle events for mutable data types, maybe we could integrate `DataContainer` with something like the `spectate` module to give a one-stop-shop for serializable state management?
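For a sense of what callbacks on a mutable container might look like, here is a plain-Python sketch built on `collections.abc.MutableMapping`. It does not use `spectate` or traitlets, and `ObservableDict` and its `observe` method are hypothetical names made up for this example.

```python
from collections.abc import MutableMapping


class ObservableDict(MutableMapping):
    """Hypothetical dict-like container that notifies callbacks on mutation."""

    def __init__(self, **initial):
        self._data = dict(initial)
        self._callbacks = []

    def observe(self, callback):
        """Register callback(event, key, old, new)."""
        self._callbacks.append(callback)

    def _notify(self, event, key, old, new):
        for callback in self._callbacks:
            callback(event, key, old, new)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        self._notify("set", key, old, value)

    def __delitem__(self, key):
        old = self._data.pop(key)
        self._notify("delete", key, old, None)

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


events = []
state = ObservableDict(temperature=300)
state.observe(lambda *event: events.append(event))
state["temperature"] = 500
state.update(pressure=1.0)  # MutableMapping.update routes through __setitem__
del state["pressure"]
```

Because every mutation goes through `__setitem__` or `__delitem__`, in-place changes are visible to observers, which is the behaviour that is missing when a plain mutable object is stored in a trait.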
I like the idea of releasing the HDF5 interface represented by the DataContainer as a separate package.
This would include the FileHDFio and the DataContainer as well as the HasGroups concept? The ProjectHDFio would stay in base? I am open to such a change in the infrastructure.
Talking about this, I would also like to spin off the pyiron tables module, as an abstract package to apply map-reduce on any kind of files. If anybody is interested in working on this, I am still looking for volunteers. During the spin off you can learn about packaging and continuous integration for testing and so on.
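As a rough picture of the map-reduce idea (not the pyiron tables implementation), a file-agnostic version could be as small as the sketch below. The helper names `map_reduce_files`, `count_lines` and `append_row` are invented for this example.

```python
from functools import reduce
from pathlib import Path
import tempfile


def map_reduce_files(paths, map_fn, reduce_fn, initial):
    """Apply map_fn to every path, then fold the results with reduce_fn."""
    return reduce(reduce_fn, (map_fn(p) for p in paths), initial)


def count_lines(path):
    """Map step: produce one table row per file."""
    with open(path) as handle:
        return {"file": Path(path).name, "lines": sum(1 for _ in handle)}


def append_row(table, row):
    """Reduce step: collect rows into a list-of-dicts table."""
    return table + [row]


# Demo on two throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "1\n2\n"), ("b.txt", "x\n")]:
        Path(tmp, name).write_text(text)
    paths = sorted(Path(tmp).glob("*.txt"))
    table = map_reduce_files(paths, count_lines, append_row, [])
```

Swapping `count_lines` for a parser of some other file type is the only change needed to tabulate different data.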
Fun idea, though I won't have time to work on it actively. A complication I can see wrt pulling `FileHDFio` out is that it'll make backwards compatibility on our side harder. We don't want pyiron-specific code for it in a separate module, but doing it in pyiron_base will be hard if the class we're using is defined elsewhere. We'll want to refactor `FileHDFio` a bit more until all the pyiron-specific bits are gone and then it should be possible. I'll need to look at the code a bit more though.
Nice, very favourable responses. I'll change this from a discussion to a request. I also don't have time immediately, but am open to being the one to do the work in the future. (If you are reading this and keen to get it done, go for it! You will not be stepping on my toes 😂)
> Now that I'm doing some GUI stuff, I'm also pretty keen on good callback infrastructure. While it was disappointing that traitlets can't handle events for mutable data types, maybe we could integrate `DataContainer` with something like the `spectate` module to give a one-stop-shop for serializable state management?
I haven't looked at `spectate`, so mutable traits are still an issue, but storage and traits are now combined over in #862.