pyiron_base

Let's spin off DataContainer as its own package

Open liamhuber opened this issue 2 years ago • 7 comments

EDIT: Note: Formerly "Should DataContainer get spun off to its own package?"

In my personal projects I sometimes find myself wishing I had DataContainer available, or even adding the entire pyiron_base dependency stack to my project just so I can use it. Does it make sense to spin this off as its own smaller package, à la pysqa?

I took a quick peek at the source, and we'd need to pull out some of our other stuff in interfaces and storage, but mostly it's not too bad; e.g. interfaces.has_groups is only dependent on abc and could just be pulled straight out. The only sticky part I can see is that HasHDF depends on ProjectHDFio. My gut says we can probably switch this to dependency on FileHDFio and then extract that pretty cleanly, but I haven't dug deep in.
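To illustrate why an abc-only interface is easy to extract, here is a minimal sketch of a HasGroups-style base class that imports nothing but abc. It is a simplified stand-in and not the actual pyiron_base source; DictTree and the method names are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class HasGroups(ABC):
    """Simplified stand-in: a tree-like object with child groups and leaf nodes."""

    @abstractmethod
    def __getitem__(self, key):
        """Return the child group or node stored under `key`."""

    @abstractmethod
    def _list_groups(self):
        """Return the names of child groups."""

    @abstractmethod
    def _list_nodes(self):
        """Return the names of leaf nodes."""

    def list_groups(self):
        return list(self._list_groups())

    def list_nodes(self):
        return list(self._list_nodes())

    def list_all(self):
        return {"groups": self.list_groups(), "nodes": self.list_nodes()}


class DictTree(HasGroups):
    """Hypothetical implementation backed by a nested dict (illustration only)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        value = self._data[key]
        # Nested dicts are treated as groups, everything else as a node.
        return DictTree(value) if isinstance(value, dict) else value

    def _list_groups(self):
        return [k for k, v in self._data.items() if isinstance(v, dict)]

    def _list_nodes(self):
        return [k for k, v in self._data.items() if not isinstance(v, dict)]


tree = DictTree({"structure": {"positions": [0.0, 0.5]}, "n_atoms": 2})
```

Since the base class has no imports beyond the standard library, moving something like it to another package would only involve changing import paths.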

Would anyone else find this helpful/do you think it would be helpful for the community? @pmrv do you have any thoughts on the technical side? IMO DataContainer is just phenomenally useful and it would be great to make it more usable by a wider community.

Steps to a pyiron_data package (edits welcome!):

  • [ ] pull the "pyiron" stuff out of FileHDFio (put it in ProjectHDFio?)
  • [ ] Make sure that everything currently depending on hdf: ProjectHDFio can depend on hdf: FileHDFio (HasHDF should be the key; after that, children like DataContainer hopefully fall in line)
  • [ ] Move key classes over to their own package, and re-import them in pyiron
    • MutableMapping
    • HasGroups
    • FileHDFio
    • HasHDF, _WithHDF
    • HasStorage
    • DataContainer
    • HasStoredTraits
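
One possible way to approach the second step, sketched under assumptions: if the storable base class only relies on a small file-level interface, it no longer cares whether it is handed a project-aware object or a plain file object. HDFLike, InMemoryHDF, Storable and Settings below are hypothetical names for illustration, not pyiron_base classes.

```python
from typing import Any, Protocol


class HDFLike(Protocol):
    """Hypothetical minimal interface a storable object needs from `hdf`."""

    def __getitem__(self, key: str) -> Any: ...

    def __setitem__(self, key: str, value: Any) -> None: ...

    def open(self, group_name: str) -> "HDFLike": ...


class InMemoryHDF:
    """Dict-backed stand-in for a file-level HDF object (no project, no database)."""

    def __init__(self):
        self._nodes = {}
        self._groups = {}

    def __getitem__(self, key):
        return self._nodes[key]

    def __setitem__(self, key, value):
        self._nodes[key] = value

    def open(self, group_name):
        # Create the subgroup on first access, reuse it afterwards.
        return self._groups.setdefault(group_name, InMemoryHDF())


class Storable:
    """Hypothetical HasHDF-style base class that only relies on HDFLike."""

    def _to_hdf(self, hdf: HDFLike) -> None:
        raise NotImplementedError

    def _from_hdf(self, hdf: HDFLike) -> None:
        raise NotImplementedError

    def to_hdf(self, hdf: HDFLike, group_name: str) -> None:
        self._to_hdf(hdf.open(group_name))

    def from_hdf(self, hdf: HDFLike, group_name: str) -> None:
        self._from_hdf(hdf.open(group_name))


class Settings(Storable):
    def __init__(self, cutoff=0.0):
        self.cutoff = cutoff

    def _to_hdf(self, hdf):
        hdf["cutoff"] = self.cutoff

    def _from_hdf(self, hdf):
        self.cutoff = hdf["cutoff"]


store = InMemoryHDF()
Settings(cutoff=4.5).to_hdf(store, "settings")
restored = Settings()
restored.from_hdf(store, "settings")
```

Anything project-specific would then live in a subclass on the pyiron side, which can still be passed wherever the smaller interface is expected.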

liamhuber avatar Sep 27 '22 16:09 liamhuber

Now that I'm doing some GUI stuff, I'm also pretty keen on good callback infrastructure. While it was disappointing that traitlets can't handle events for mutable data types, maybe we could integrate DataContainer with something like the spectate module to give a one-stop-shop for serializable state management?
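
As a rough picture of what I mean by callbacks on a mutable container, here is a standard-library-only sketch. ObservableDict and its observe method are made up for this example; this is not the spectate or traitlets API, and changes inside nested mutable values (e.g. appending to a stored list) would still go unnoticed.

```python
from collections.abc import MutableMapping


class ObservableDict(MutableMapping):
    """Hypothetical mapping that notifies callbacks when items are set or deleted."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self._callbacks = []
        # Initial values are stored before any observer can be registered.
        self.update(*args, **kwargs)

    def observe(self, callback):
        """Register callback(action, key, value) for later changes."""
        self._callbacks.append(callback)

    def _notify(self, action, key, value=None):
        for callback in self._callbacks:
            callback(action, key, value)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._notify("set", key, value)

    def __delitem__(self, key):
        del self._data[key]
        self._notify("delete", key)

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


events = []
state = ObservableDict(temperature=300)
state.observe(lambda action, key, value: events.append((action, key, value)))
state["temperature"] = 500
state.update(pressure=1.0)  # MutableMapping.update routes through __setitem__
del state["pressure"]
```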

liamhuber avatar Sep 27 '22 16:09 liamhuber

I like the idea of releasing the HDF5 interface represented by the DataContainer as a separate package.

jan-janssen avatar Sep 27 '22 16:09 jan-janssen

This would include the FileHDFio and the DataContainer as well as the HasGroups concept? The ProjectHDFio would stay in base? I am open to such a change in the infrastructure.

niklassiemer avatar Sep 27 '22 18:09 niklassiemer

Talking about this, I would also like to spin off the pyiron tables module, as an abstract package to apply map-reduce on any kind of files. If anybody is interested in working on this, I am still looking for volunteers. During the spin off you can learn about packaging and continuous integration for testing and so on.

jan-janssen avatar Sep 29 '22 15:09 jan-janssen

Fun idea, though I won't have time to work on it actively. A complication I can see with pulling FileHDFio out is that it'll make backwards compatibility on our side harder. We don't want pyiron-specific code for it in a separate module, but doing it in pyiron_base will be hard if the class we're using is defined elsewhere. We'll want to refactor FileHDFio a bit more until all the pyiron-specific bits are gone, and then it should be possible. I'll need to look at the code a bit more though.

pmrv avatar Sep 29 '22 16:09 pmrv

Nice, very favourable responses. I'll change this from a discussion to a request. I also don't have time immediately, but am open to being the one to do the work in the future. (If you are reading this and keen to get it done, go for it! You will not be stepping on my toes 😂)

liamhuber avatar Sep 29 '22 17:09 liamhuber

> Now that I'm doing some GUI stuff, I'm also pretty keen on good callback infrastructure. While it was disappointing that traitlets can't handle events for mutable data types, maybe we could integrate DataContainer with something like the spectate module to give a one-stop-shop for serializable state management?

I haven't looked at spectate, so mutable traits are still an issue, but storage and traits are now combined over in #862

liamhuber avatar Oct 18 '22 19:10 liamhuber