orange3 icon indicating copy to clipboard operation
orange3 copied to clipboard

Add GenericHDF5Reader

Open gjover opened this issue 1 year ago • 14 comments

Issue

Need for a generic HDF5 Reader that reads a dataset of choice from an hdf5 file.

Description of changes

Add GenericHDF5Reader Add an options box in OWFile to allow readers to define additional options

Includes
  • [X] Code changes
  • [ ] Tests
  • [ ] Documentation

gjover avatar Feb 28 '23 21:02 gjover

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
7 out of 8 committers have signed the CLA.

:white_check_mark: nikicc
:white_check_mark: lanzagar
:white_check_mark: PrimozGodec
:white_check_mark: markotoplak
:white_check_mark: ajdapretnar
:white_check_mark: gjover
:white_check_mark: stellarpower
:x: Mireia Fernandez Fernandez


Mireia Fernandez Fernandez seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Feb 28 '23 21:02 CLAassistant

Hello @gjover, hello to Alba!

Could you also provide some differently structure hfd5s for testing? If you can not share them publicly, feel free to email them.

In the future, this addition will also need some test files that we will keep in the repository for automated testing, but those should be prepared with special care, so they are very small.

markotoplak avatar Mar 03 '23 09:03 markotoplak

/rebase

janezd avatar Apr 21 '23 07:04 janezd

/rebase

markotoplak avatar Jan 18 '24 11:01 markotoplak

Thanks for the contribution. I did not test, so I only commented on the code structure.

As you saw when implementing this, the current File is reader-agnostic. We tried making it such that it does not know anything about specific formats, and we would like to keep it that way.

Therefore, any reader-specific interface should be packaged with the reader. A reader could define a variable that lists the class that defines UI for that specific reader. Then, OWFile could check whether that variable exists; if it does, it could call that UI. That UI class should return reader properties that should then be passed on to the reader class when the actual reading is done (and, of course, stored in OWFile settings).

So, for HDF5 support, any addition to OWFile should be tiny and only call what is specified in the reader class (in practice, the contents of _hdf5_tree_model should be moved and restructured). This function separation is very important to us because it helps keep the code maintainable.

markotoplak avatar Jan 18 '24 11:01 markotoplak

Also, please rebase your branch to the current orange3/master so that there are no merge commits in the PR. If you have problems with that, we can do it for you, but then you'll have to carefully synchronize your checkout-out branch.

markotoplak avatar Jan 18 '24 11:01 markotoplak

I am thinking about how to go forward. One option is that you propose how complex readers should interact with file reading widgets (in general, not OWFile specific), either in code or as a spec. Or, perhaps more efficiently, we could structure the interoperation between readers of reading widgets together, and then you could implement just the reader part following that interface while we implement the OWFile part.

markotoplak avatar Jan 18 '24 12:01 markotoplak

Sure, I agree.

As commented before, we added a hidden options_box for these readers that may use them. What about to check if the reader has the method _set_options_box, and call this Reader method in such a case?

In that way OWFile would keep agnostic but would allow the Reader to set the options box when needed.

gjover avatar Jan 25 '24 11:01 gjover

I have remove the option box and use the sheet mechanism instead.

gjover avatar Jan 29 '24 11:01 gjover

Codecov Report

Merging #6356 (8520bb9) into master (7bdc8e4) will decrease coverage by 0.04%. Report is 1 commits behind head on master. The diff coverage is 42.22%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6356      +/-   ##
==========================================
- Coverage   88.13%   88.10%   -0.04%     
==========================================
  Files         322      322              
  Lines       70577    70619      +42     
==========================================
+ Hits        62206    62218      +12     
- Misses       8371     8401      +30     

codecov[bot] avatar Feb 14 '24 06:02 codecov[bot]

I tried this on a few files, it seems to work well as a fallback reader. I did find it odd that the indexes of multidimensional datasets end up in features, in orange-spectroscopy we usually put this into Metas.

I squashed the history into two commits for easier reading: https://github.com/stuart-cls/orange3/tree/generic-hdf5-squash . I don't think the owfile.py changes are actually used in the latest code.

@markotoplak I don't want to hijack this PR too much, but I tried to combine this with an implementation of the Orange on-disk format from dask to see if we could have subclasses register that they are better able to handle a particular HDF5 format (we have a few of these in orange-spectroscopy). See orange-hdf5 and base-hdf5

It works, but this approach is flawed because you lose (at least) the sheets functionality on your subclasses, which is a problem for these metareaders in orange-spectroscopy as well.

stuart-cls avatar Mar 07 '24 23:03 stuart-cls

@stuart-cls, could you open a PR with your orange-hdf5 reader/writer? I checked it quickly, and it looks good. We do need something similar in the distro. I also like that you handle variable-length strings (I have no idea how I missed what you did when I was programming that part for Dask). Thanks!

markotoplak avatar Apr 25 '24 13:04 markotoplak

@gjover, just to double check: your changes here in the File widget are not necessary for the reader, right? If so, let's merge this reader into github.com/quasars/orange-spectroscopy for now. Could you open a PR there, please? If you don't have time, I can take your change set and open the PR myself.

markotoplak avatar Apr 25 '24 13:04 markotoplak

After we have these parts, we can also think about the meta reader that Stuart suggested.

markotoplak avatar Apr 25 '24 13:04 markotoplak