Arina reader
Description of the change
Adding a reader for Arina data
Progress of the PR
- [x] Adding arina reader
- [x] update docstring
- [ ] update user guide
- [ ] add a changelog entry in the
upcoming_changesfolder (seeupcoming_changes/README.rst), - [ ] Check formatting of the changelog entry (and eventual user guide changes) in the
docs/readthedocs.org:rosettasciiobuild of this PR (link in github checks) - [x] add tests
- [ ] ready for review
Minimal example of the bug fix or the new feature
from rsciio.arina import file_reader
file_reader("your_arina_file.h5")
Codecov Report
:x: Patch coverage is 90.00000% with 8 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 87.89%. Comparing base (1ef8be0) to head (a6ea074).
:warning: Report is 143 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| rsciio/arina/_api.py | 89.47% | 5 Missing and 3 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #398 +/- ##
==========================================
+ Coverage 87.86% 87.89% +0.03%
==========================================
Files 87 89 +2
Lines 11335 11467 +132
Branches 2095 2120 +25
==========================================
+ Hits 9959 10079 +120
- Misses 872 878 +6
- Partials 504 510 +6
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Thanks for the helpful suggestions @ericpre!
What is your preference for adding dependencies? The arina reader needs both hdf5plugin and the h5py, which is a little different than the existing h5py option. I think this is also causing some of the errors for the pytests.
To fix some of the failure in the test suite, this branch will need to be rebased on the main branch. On the topic of rebasing, are you planning to use the file rsciio/tests/data/arina/MEA_ptycho_15_data_000001.h5? If not, it would be good to remove it from the git history to avoid unnecessarily increasing the repository size. I am happy to do it if this helps.
@ericpre if you wanted to do the rebasing that would be great! Unfortunately all 3 files are needed for the arina reader. Each dataset has a master file, at least one data file (although in bigger scans I can end up with more than 100 files), and then sometime an additional un-numbered file with metadata.
Is it possible to have smaller test file that would test the same functionalities? See https://hyperspy.org/rosettasciio/contributing.html#making-test-data-files for more details.
I rebased (and force-push to your branch) and pushed a commit of fix the tests that were failing because of the added optional dependencies. You will need to update your branch locally from your branch on GitHub before pushing any more commits.
This is fantastic! Thank you so much @ericpre! I replaced the test file to a much smaller one.
One other comment. I think that Dectris does all of their compression on GPU. This is nice for the speed of compression but requires that you have very small chunks as a trade off. As a result loading data on CPU might be much slower than it should be, mostly due to some limitations with the HDF5 library. It might be worth benchmarking and trying to improve how fast we can load this data. That or implement a loading scheme that uses a GPU to decompress the data if available.
I think that #377 should help with this but I need to spend a good chunk of time determining exactly how to speed that up.
These are great suggestions, thanks! I would be interested in these features in the future. I tried to copy the other readers and add the relevant arina information to the docs. I wasn't sure about a few things, so feel free to send any more feedback! By the way -- is there a way to build the docs locally to check them before pushing?
These are great suggestions, thanks! I would be interested in these features in the future. I tried to copy the other readers and add the relevant arina information to the docs. I wasn't sure about a few things, so feel free to send any more feedback!
Perfect!
By the way -- is there a way to build the docs locally to check them before pushing?
Yes! Although sometimes it is a bit of a pain to set up. If its a small change often I just use the CI. The docs are all built using sphinx, I think we should actually improve the documentation on this in Hyperspy. I thought that infromation was there but it might not be. This is a very basic guide from pyxem.
@ericpre correct me if I'm wrong but you should be able to:
cd rosetasciio/
pip install -e '.[doc]'
cd doc/
make SPHINXOPTS=-W --keep-going html
That will use the make file in the doc folder and then build the documentation in the build directory. You should be able to view the built webpages locally by opening any of those files.
The -W is for failing on warnings and the --keep-going means fail but continue so that you can have multiple failures.
Sometimes it can get complicated and the errors can get a little esoteric, so feel free to ask questions if you need help!
I'll also note that I'm an avid user of the git squash command (for my own work). I use that a bunch if I happen to make a mistake, push it and then have to make another commit to fix something. It helps clean up your git history.
I think Linus is a little too strict in most cases (I like people to make PR's way before they are merged) but I always reference this as a good starting point for understanding what to do (and not to do) with your git history.
https://www.mail-archive.com/[email protected]/msg39091.html
@smribet: In the interest of finishing this PR, I went ahead by committing @CSSFrancis's suggestion, tweaking the documentation, adding the changelog entry and merging.
Congratulation on your first PR to rosettasciio! :)
Very exciting! Thanks for all of your help along the way!