orange3
orange3 copied to clipboard
Add GenericHDF5Reader
Issue
Need for a generic HDF5 Reader that reads a dataset of choice from an hdf5 file.
Description of changes
Add GenericHDF5Reader Add an options box in OWFile to allow readers to define additional options
Includes
- [X] Code changes
- [ ] Tests
- [ ] Documentation
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
7 out of 8 committers have signed the CLA.
:white_check_mark: nikicc
:white_check_mark: lanzagar
:white_check_mark: PrimozGodec
:white_check_mark: markotoplak
:white_check_mark: ajdapretnar
:white_check_mark: gjover
:white_check_mark: stellarpower
:x: Mireia Fernandez Fernandez
Mireia Fernandez Fernandez seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
Hello @gjover, hello to Alba!
Could you also provide some differently structure hfd5s for testing? If you can not share them publicly, feel free to email them.
In the future, this addition will also need some test files that we will keep in the repository for automated testing, but those should be prepared with special care, so they are very small.
/rebase
/rebase
Thanks for the contribution. I did not test, so I only commented on the code structure.
As you saw when implementing this, the current File is reader-agnostic. We tried making it such that it does not know anything about specific formats, and we would like to keep it that way.
Therefore, any reader-specific interface should be packaged with the reader. A reader could define a variable that lists the class that defines UI for that specific reader. Then, OWFile could check whether that variable exists; if it does, it could call that UI. That UI class should return reader properties that should then be passed on to the reader class when the actual reading is done (and, of course, stored in OWFile settings).
So, for HDF5 support, any addition to OWFile should be tiny and only call what is specified in the reader class (in practice, the contents of _hdf5_tree_model
should be moved and restructured). This function separation is very important to us because it helps keep the code maintainable.
Also, please rebase your branch to the current orange3/master so that there are no merge commits in the PR. If you have problems with that, we can do it for you, but then you'll have to carefully synchronize your checkout-out branch.
I am thinking about how to go forward. One option is that you propose how complex readers should interact with file reading widgets (in general, not OWFile specific), either in code or as a spec. Or, perhaps more efficiently, we could structure the interoperation between readers of reading widgets together, and then you could implement just the reader part following that interface while we implement the OWFile part.
Sure, I agree.
As commented before, we added a hidden options_box for these readers that may use them. What about to check if the reader has the method _set_options_box, and call this Reader method in such a case?
In that way OWFile would keep agnostic but would allow the Reader to set the options box when needed.
I have remove the option box and use the sheet mechanism instead.
Codecov Report
Merging #6356 (8520bb9) into master (7bdc8e4) will decrease coverage by
0.04%
. Report is 1 commits behind head on master. The diff coverage is42.22%
.
Additional details and impacted files
@@ Coverage Diff @@
## master #6356 +/- ##
==========================================
- Coverage 88.13% 88.10% -0.04%
==========================================
Files 322 322
Lines 70577 70619 +42
==========================================
+ Hits 62206 62218 +12
- Misses 8371 8401 +30
I tried this on a few files, it seems to work well as a fallback reader. I did find it odd that the indexes of multidimensional datasets end up in features, in orange-spectroscopy we usually put this into Metas.
I squashed the history into two commits for easier reading: https://github.com/stuart-cls/orange3/tree/generic-hdf5-squash . I don't think the owfile.py
changes are actually used in the latest code.
@markotoplak I don't want to hijack this PR too much, but I tried to combine this with an implementation of the Orange on-disk format from dask to see if we could have subclasses register that they are better able to handle a particular HDF5 format (we have a few of these in orange-spectroscopy). See orange-hdf5 and base-hdf5
It works, but this approach is flawed because you lose (at least) the sheets functionality on your subclasses, which is a problem for these metareaders in orange-spectroscopy as well.
@stuart-cls, could you open a PR with your orange-hdf5 reader/writer? I checked it quickly, and it looks good. We do need something similar in the distro. I also like that you handle variable-length strings (I have no idea how I missed what you did when I was programming that part for Dask). Thanks!
@gjover, just to double check: your changes here in the File widget are not necessary for the reader, right? If so, let's merge this reader into github.com/quasars/orange-spectroscopy for now. Could you open a PR there, please? If you don't have time, I can take your change set and open the PR myself.
After we have these parts, we can also think about the meta reader that Stuart suggested.