pyiron_base
                                
                                
                                
Let's spin off DataContainer as its own package
EDIT: Formerly titled "Should DataContainer get spun off to its own package?"
In my personal projects I sometimes find myself wishing I had DataContainer available, or even adding the entire pyiron_base dependency stack to my project just so I can use it. ...Does it make sense to spin this off as its own smaller package, à la pysqa?
I took a quick peek at the source, and we'd need to pull out some of our other stuff in interfaces and storage, but mostly it's not too bad; e.g. interfaces.has_groups depends only on abc and could just be pulled straight out. The only sticky part I can see is that HasHDF depends on ProjectHDFio. My gut says we can probably switch this to a dependency on FileHDFio and then extract that pretty cleanly, but I haven't dug in deeply.
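One way to read the proposed switch from ProjectHDFio to FileHDFio is as narrowing what HasHDF actually needs from its storage argument. The sketch below assumes HasHDF only needs item get/set plus opening a sub-group; the names HDFLike, InMemoryHDF, HasHDFSketch and Point are hypothetical, and the dict-backed store merely stands in for FileHDFio so the example runs without h5py. It is not pyiron's actual API.

```python
from typing import Any, Dict, Optional, Protocol


class HDFLike(Protocol):
    """Minimal storage interface a HasHDF-style mixin might need (an assumption)."""

    def __getitem__(self, key: str) -> Any: ...

    def __setitem__(self, key: str, value: Any) -> None: ...

    def open(self, group_name: str) -> "HDFLike": ...


class InMemoryHDF:
    """Dict-backed stand-in for FileHDFio, only here to make the sketch runnable."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        # Keep a reference to the given dict so sub-groups write through to the parent.
        self._data = {} if data is None else data

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def open(self, group_name: str) -> "InMemoryHDF":
        # Create the sub-group on first access and return a view onto it.
        return InMemoryHDF(self._data.setdefault(group_name, {}))


class HasHDFSketch:
    """Mixin typed against HDFLike only, so it no longer needs a project-aware class."""

    def _to_hdf(self, hdf: HDFLike) -> None:
        raise NotImplementedError

    def _from_hdf(self, hdf: HDFLike) -> None:
        raise NotImplementedError

    def to_hdf(self, hdf: HDFLike, group_name: str) -> None:
        self._to_hdf(hdf.open(group_name))

    def from_hdf(self, hdf: HDFLike, group_name: str) -> None:
        self._from_hdf(hdf.open(group_name))


class Point(HasHDFSketch):
    """Toy child class, playing the role DataContainer would play."""

    def __init__(self, x: float = 0.0, y: float = 0.0) -> None:
        self.x, self.y = x, y

    def _to_hdf(self, hdf: HDFLike) -> None:
        hdf["x"] = self.x
        hdf["y"] = self.y

    def _from_hdf(self, hdf: HDFLike) -> None:
        self.x = hdf["x"]
        self.y = hdf["y"]


# Round trip through the in-memory store.
store = InMemoryHDF()
Point(1.0, 2.0).to_hdf(store, "point")
restored = Point()
restored.from_hdf(store, "point")
```

Because the mixin is typed against the protocol rather than a concrete class, any FileHDFio-like object offering those three operations could be passed in unchanged.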
Would anyone else find this helpful, or do you think it would be helpful for the community? @pmrv do you have any thoughts on the technical side? IMO DataContainer is just phenomenally useful and it would be great to make it more usable by a wider community.
Steps to a pyiron_data package (edits welcome!):
- [ ] Pull the "pyiron" stuff out of FileHDFio (put it in ProjectHDFio?)
- [ ] Make sure that everything currently depending on `hdf: ProjectHDFio` can depend on `hdf: FileHDFio` (HasHDF should be the key; after that, children like DataContainer hopefully fall in line)
- [ ] Move key classes over to their own package, and re-import them in pyiron:
  - MutableMapping
  - HasGroups
  - FileHDFio
  - HasHDF, _WithHDF
  - HasStorage
  - DataContainer
  - HasStoredTraits
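The last step, moving classes and re-importing them in pyiron, amounts to the old module becoming a thin re-export. In a real repository the old file would contain little more than `from pyiron_data import DataContainer`; the sketch below simulates both modules in one script with `types.ModuleType` so it runs standalone. The module names `pyiron_data_sketch` and `legacy_storage_sketch` are placeholders, and `DataContainer` here is an empty stand-in class.

```python
import sys
import types

# Stand-in for the new standalone package (placeholder name, not a published package).
new_pkg = types.ModuleType("pyiron_data_sketch")


class DataContainer(dict):
    """Empty placeholder for the class that would move to the new package."""


DataContainer.__module__ = new_pkg.__name__
new_pkg.DataContainer = DataContainer
sys.modules[new_pkg.__name__] = new_pkg

# Stand-in for the old location inside pyiron_base: its whole body is a
# re-import, executed here in the module's own namespace.
legacy = types.ModuleType("legacy_storage_sketch")
exec("from pyiron_data_sketch import DataContainer", legacy.__dict__)
sys.modules[legacy.__name__] = legacy

# Both import paths now yield the very same class object, so existing
# imports and isinstance() checks against either name keep agreeing.
from legacy_storage_sketch import DataContainer as OldPathDataContainer
from pyiron_data_sketch import DataContainer as NewPathDataContainer
```

Since the re-export hands back the identical class object, nothing that imports from the old location needs to change.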
 
Now that I'm doing some GUI stuff, I'm also pretty keen on good callback infrastructure. While it was disappointing that traitlets can't handle events for mutable data types, maybe we could integrate DataContainer with something like the spectate module to give a one-stop-shop for serializable state management?
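As a rough illustration of the callback idea, here is a hand-rolled observer on a `MutableMapping`. It is not spectate's API and not how DataContainer is implemented; `ObservableDict` and `observe` are hypothetical names. It only sees assignments and deletions on the container itself, so in-place changes to a nested list or dict go unnoticed, which is the same mutable-data gap mentioned for traitlets.

```python
from collections.abc import MutableMapping
from typing import Any, Callable, Dict, Iterator, List, Optional

# Callback signature: (event, key, old_value, new_value)
Callback = Callable[[str, str, Optional[Any], Optional[Any]], None]


class ObservableDict(MutableMapping):
    """Dict-like container that notifies registered callbacks on every mutation."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self._data: Dict[str, Any] = {}
        self._callbacks: List[Callback] = []
        # No callbacks are registered yet, so the initial population fires nothing.
        self.update(*args, **kwargs)

    def observe(self, callback: Callback) -> None:
        self._callbacks.append(callback)

    def _notify(self, event: str, key: str, old: Any, new: Any) -> None:
        for callback in self._callbacks:
            callback(event, key, old, new)

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        old = self._data.get(key)  # None if the key is new
        self._data[key] = value
        self._notify("set", key, old, value)

    def __delitem__(self, key: str) -> None:
        old = self._data.pop(key)  # raises KeyError for missing keys, like dict
        self._notify("delete", key, old, None)

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


events = []
state = ObservableDict(a=1)
state.observe(lambda *event: events.append(event))
state["a"] = 2
state["b"] = 3
del state["a"]
print(events)  # [('set', 'a', 1, 2), ('set', 'b', None, 3), ('delete', 'a', 2, None)]
```

Because `pop`, `clear`, `setdefault` and `update` from `MutableMapping` all route through `__setitem__`/`__delitem__`, they fire callbacks too without extra code.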
I like the idea of releasing the HDF5 interface represented by the DataContainer as a separate package.
So this would include FileHDFio and DataContainer, as well as the HasGroups concept, while ProjectHDFio would stay in pyiron_base? I am open to such a change in the infrastructure.
Speaking of which, I would also like to spin off the pyiron tables module as a generic package for applying map-reduce to any kind of file. If anybody is interested in working on this, I am still looking for volunteers; during the spin-off you can learn about packaging, continuous integration for testing, and so on.
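To make the map-reduce framing of a generic tables package concrete: a mapper turns each file into one row, and a reducer folds the rows into a table. The sketch below uses only the standard library; `map_reduce_files`, `line_count` and `append_row` are hypothetical names, and line counting stands in for whatever per-file analysis a real table would collect. It says nothing about how the pyiron tables module works internally.

```python
import tempfile
from functools import reduce
from pathlib import Path
from typing import Callable, Dict, Iterable, List, TypeVar

Row = Dict[str, object]
Acc = TypeVar("Acc")


def map_reduce_files(
    paths: Iterable[Path],
    mapper: Callable[[Path], Row],
    reducer: Callable[[Acc, Row], Acc],
    initial: Acc,
) -> Acc:
    """Apply `mapper` to every file, then fold the results with `reducer`."""
    return reduce(reducer, map(mapper, paths), initial)


def line_count(path: Path) -> Row:
    # Map step: one row of derived values per file.
    with path.open() as handle:
        return {"file": path.name, "n_lines": sum(1 for _ in handle)}


def append_row(table: List[Row], row: Row) -> List[Row]:
    # Reduce step: accumulate rows into a list-of-dicts table.
    return table + [row]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("x\ny\n")
    (root / "b.txt").write_text("z\n")
    table = map_reduce_files(sorted(root.glob("*.txt")), line_count, append_row, [])

print(table)  # [{'file': 'a.txt', 'n_lines': 2}, {'file': 'b.txt', 'n_lines': 1}]
```

Keeping the mapper and reducer as plain callables is what would make such a package file-type agnostic: only the mapper needs to know how to read a given format.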
Fun idea, though I won't have time to work on it actively. A complication I can see with pulling FileHDFio out is that it will make backwards compatibility on our side harder. We don't want pyiron-specific code for it in a separate module, but doing it in pyiron_base will be hard if the class we're using is defined elsewhere. We'll want to refactor FileHDFio a bit more until all the pyiron-specific bits are gone; then it should be possible. I'll need to look at the code a bit more, though.
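One pattern that might address the backwards-compatibility concern, assuming stored data refers to classes by their import path as a string: keep a small alias table of old paths in pyiron_base and consult it before importing, so the standalone package never needs to know about pyiron's history. `import_class` and `_LEGACY_CLASS_PATHS` are hypothetical, and `collections.OrderedDict` plays the part of a moved class so the example runs.

```python
import importlib
from collections import OrderedDict

# Hypothetical alias table that would live in pyiron_base, not in the new package:
# import paths recorded in old files -> where the class lives now.
_LEGACY_CLASS_PATHS = {
    "old_package.storage.OrderedDict": "collections.OrderedDict",
}


def import_class(path: str) -> type:
    """Resolve a dotted 'module.ClassName' path, translating legacy locations first."""
    path = _LEGACY_CLASS_PATHS.get(path, path)
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Current paths pass through untouched; legacy paths are redirected.
current = import_class("collections.OrderedDict")
legacy = import_class("old_package.storage.OrderedDict")
```

The design choice here is that the translation lives entirely on the caller's side: the extracted package stays free of pyiron-specific paths, and pyiron_base owns the mapping for its own historical files.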
Nice, very favourable responses. I'll change this from a discussion to a request. I also don't have time immediately, but am open to being the one to do the work in the future. (If you are reading this and keen to get it done, go for it! You will not be stepping on my toes 😂)
> Now that I'm doing some GUI stuff, I'm also pretty keen on good callback infrastructure. While it was disappointing that traitlets can't handle events for mutable data types, maybe we could integrate DataContainer with something like the spectate module to give a one-stop-shop for serializable state management?
I haven't looked at spectate, so mutable traits are still an issue, but storage and traits are now combined over in #862.