rosettasciio icon indicating copy to clipboard operation
rosettasciio copied to clipboard

Arina reader

Open smribet opened this issue 8 months ago • 7 comments

Description of the change

Adding a reader for Arina data

Progress of the PR

  • [x] Adding arina reader
  • [x] update docstring
  • [ ] update user guide
  • [ ] add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • [ ] Check formatting of the changelog entry (and eventual user guide changes) in the docs/readthedocs.org:rosettasciio build of this PR (link in github checks)
  • [x] add tests
  • [ ] ready for review

Minimal example of the bug fix or the new feature

from rsciio.arina import file_reader
file_reader("your_arina_file.h5")

smribet avatar Apr 24 '25 22:04 smribet

Codecov Report

:x: Patch coverage is 90.00000% with 8 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 87.89%. Comparing base (1ef8be0) to head (a6ea074). :warning: Report is 143 commits behind head on main.

Files with missing lines Patch % Lines
rsciio/arina/_api.py 89.47% 5 Missing and 3 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #398      +/-   ##
==========================================
+ Coverage   87.86%   87.89%   +0.03%     
==========================================
  Files          87       89       +2     
  Lines       11335    11467     +132     
  Branches     2095     2120      +25     
==========================================
+ Hits         9959    10079     +120     
- Misses        872      878       +6     
- Partials      504      510       +6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Apr 25 '25 15:04 codecov[bot]

Thanks for the helpful suggestions @ericpre!

What is your preference for adding dependencies? The arina reader needs both hdf5plugin and the h5py, which is a little different than the existing h5py option. I think this is also causing some of the errors for the pytests.

smribet avatar Apr 27 '25 20:04 smribet

To fix some of the failure in the test suite, this branch will need to be rebased on the main branch. On the topic of rebasing, are you planning to use the file rsciio/tests/data/arina/MEA_ptycho_15_data_000001.h5? If not, it would be good to remove it from the git history to avoid unnecessarily increasing the repository size. I am happy to do it if this helps.

ericpre avatar Apr 30 '25 08:04 ericpre

@ericpre if you wanted to do the rebasing that would be great! Unfortunately all 3 files are needed for the arina reader. Each dataset has a master file, at least one data file (although in bigger scans I can end up with more than 100 files), and then sometime an additional un-numbered file with metadata.

smribet avatar May 01 '25 16:05 smribet

Is it possible to have smaller test file that would test the same functionalities? See https://hyperspy.org/rosettasciio/contributing.html#making-test-data-files for more details.

ericpre avatar May 01 '25 17:05 ericpre

I rebased (and force-push to your branch) and pushed a commit of fix the tests that were failing because of the added optional dependencies. You will need to update your branch locally from your branch on GitHub before pushing any more commits.

ericpre avatar May 02 '25 08:05 ericpre

This is fantastic! Thank you so much @ericpre! I replaced the test file to a much smaller one.

smribet avatar May 02 '25 19:05 smribet

One other comment. I think that Dectris does all of their compression on GPU. This is nice for the speed of compression but requires that you have very small chunks as a trade off. As a result loading data on CPU might be much slower than it should be, mostly due to some limitations with the HDF5 library. It might be worth benchmarking and trying to improve how fast we can load this data. That or implement a loading scheme that uses a GPU to decompress the data if available.

I think that #377 should help with this but I need to spend a good chunk of time determining exactly how to speed that up.

CSSFrancis avatar May 21 '25 12:05 CSSFrancis

These are great suggestions, thanks! I would be interested in these features in the future. I tried to copy the other readers and add the relevant arina information to the docs. I wasn't sure about a few things, so feel free to send any more feedback! By the way -- is there a way to build the docs locally to check them before pushing?

smribet avatar May 26 '25 15:05 smribet

These are great suggestions, thanks! I would be interested in these features in the future. I tried to copy the other readers and add the relevant arina information to the docs. I wasn't sure about a few things, so feel free to send any more feedback!

Perfect!

By the way -- is there a way to build the docs locally to check them before pushing?

Yes! Although sometimes it is a bit of a pain to set up. If its a small change often I just use the CI. The docs are all built using sphinx, I think we should actually improve the documentation on this in Hyperspy. I thought that infromation was there but it might not be. This is a very basic guide from pyxem.

@ericpre correct me if I'm wrong but you should be able to:

cd rosetasciio/
pip install -e '.[doc]'
cd doc/
make SPHINXOPTS=-W --keep-going html

That will use the make file in the doc folder and then build the documentation in the build directory. You should be able to view the built webpages locally by opening any of those files.

The -W is for failing on warnings and the --keep-going means fail but continue so that you can have multiple failures.

Sometimes it can get complicated and the errors can get a little esoteric, so feel free to ask questions if you need help!

CSSFrancis avatar May 27 '25 12:05 CSSFrancis

I'll also note that I'm an avid user of the git squash command (for my own work). I use that a bunch if I happen to make a mistake, push it and then have to make another commit to fix something. It helps clean up your git history.

I think Linus is a little too strict in most cases (I like people to make PR's way before they are merged) but I always reference this as a good starting point for understanding what to do (and not to do) with your git history.

https://www.mail-archive.com/[email protected]/msg39091.html

CSSFrancis avatar May 27 '25 17:05 CSSFrancis

@smribet: In the interest of finishing this PR, I went ahead by committing @CSSFrancis's suggestion, tweaking the documentation, adding the changelog entry and merging.

Congratulation on your first PR to rosettasciio! :)

ericpre avatar May 28 '25 17:05 ericpre

Very exciting! Thanks for all of your help along the way!

smribet avatar May 29 '25 01:05 smribet