rosettasciio
Add .seq format for DE 16 and Celeritas Camera
Description of the change
This adds in support for reading the DE 16 and Celeritas cameras.
Some notes about the file format:
DE 16:
- The DE 16 camera writes multiple files: a metadata file, a dark reference, a gain reference, and a .seq file. These files all share the same naming scheme, so I either read all files in the same folder that follow that scheme or allow the files to be passed directly.
- The data is in a binary format with each frame at some offset and a time stamp following the frame.
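As a rough illustration of the "frame followed by a time stamp" layout described above, the records can be described with a NumPy structured dtype and memory-mapped. This is only a sketch and not the reader added in this PR: the header size, frame shape, pixel type, and the 8-byte time stamp are placeholder assumptions, and `frame_dtype` and `map_frames` are hypothetical names.

```python
import numpy as np


def frame_dtype(shape, pixel_dtype="<u2", stamp_bytes=8):
    """Structured dtype for one record: image pixels, then a raw time stamp.

    All sizes are illustrative assumptions, not values taken from the
    .seq format itself.
    """
    return np.dtype([
        ("image", pixel_dtype, shape),      # pixel data of one frame
        ("stamp", "V%d" % stamp_bytes),     # opaque time stamp bytes
    ])


def map_frames(path, n_frames, shape, header_bytes=0, **kwargs):
    """Memory-map n_frames fixed-size records that start after header_bytes."""
    dt = frame_dtype(shape, **kwargs)
    return np.memmap(path, dtype=dt, mode="r",
                     offset=header_bytes, shape=(n_frames,))
```

With this, `map_frames(path, n, (ny, nx))["image"]` gives an `(n, ny, nx)` view of the pixel data without loading the whole file, which only works because every record has the same size.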
Celeritas:
- Because of the speed at which this camera reads out data, the sensor is split in two: a "top" and a "bottom" frame are read out concurrently.
- These frames are also saved in a buffer, with multiple images stored together in one long image.
- This makes memory mapping this dataset a little harder, as there isn't a constant stream of data. I would like to add support for using the distributed scheduler, but that might have to wait.
- This buffer is saved in the XML file alongside the data. There may be a way to guess this buffer if given the XML file and the FPS of the camera.
- The time stamp is recorded only once per buffer.
- etc.
Progress of the PR
- [x] Added DE 16 support for loading
- [ ] ~~Add DE 16 support for saving (Potentially?)~~
- [x] Added Celeritas support
- [x] Add support for DE 16 using the distributed scheduler
- [x] Add support for Celeritas using the distributed scheduler
- [x] update docstring (if appropriate),
- [x] update user guide (if appropriate),
- [x] add a changelog entry in the `upcoming_changes` folder (see `upcoming_changes/README.rst`),
- [x] Check formatting of the changelog entry in the `readthedocs` doc build of this PR (link in github checks)
- [x] add tests for basic loading
- [x] ready for review.
Minimal example of the bug fix or the new feature
from rsciio.de import api
api.file_reader("test.seq") # read regular .seq
api.file_reader("test_Top_.seq", celeritas=True) # read celeritas .seq
Codecov Report
Patch coverage: 90.16% and project coverage change: +0.20 :tada:
Comparison is base (b045157) 84.95% compared to head (a340725) 85.15%.
Additional details and impacted files
@@ Coverage Diff @@
## main #11 +/- ##
==========================================
+ Coverage 84.95% 85.15% +0.20%
==========================================
Files 73 75 +2
Lines 8894 9250 +356
Branches 1955 2022 +67
==========================================
+ Hits 7556 7877 +321
- Misses 876 895 +19
- Partials 462 478 +16
| Impacted Files | Coverage Δ | |
|---|---|---|
| rsciio/de/_api.py | 89.77% <89.77%> (ø) | |
| rsciio/utils/tools.py | 80.26% <90.90%> (+6.93%) | :arrow_up: |
| rsciio/de/__init__.py | 100.00% <100.00%> (ø) | |
@sk1p I know we have talked about adding support for the DE Celeritas camera to LiberTEM and HyperSpy. If you have the chance, can you look over this PR? The hardest thing is dealing with the segment prebuffer for the Celeritas camera.
I wanted to add support for distributed scheduling using the scheme proposed by @uellue here but due to the nature of the prebuffer the data isn't evenly spaced in the binary file. This makes implementing this in a general way fairly difficult.
> I know we have talked about adding support for the DE Celeritas camera to LiberTEM and HyperSpy. If you have the chance, can you look over this PR? The hardest thing is dealing with the segment prebuffer for the Celeritas camera.
I can have a look - I'd also like to try this with real data, did you manage to upload some to the drop link I gave you some time ago?
In general, what is this project's stance on testing with real input data? It could be possible to publish a set of (small-ish) reference data sets on, e.g., Zenodo and download those in CI runs.
> I wanted to add support for distributed scheduling using the scheme proposed by @uellue here but due to the nature of the prebuffer the data isn't evenly spaced in the binary file. This makes implementing this in a general way fairly difficult.
Yeah - in case of uneven spacing, it's probably required to do a sparse search pass over the data, for example by reading the image headers at N positions in the whole data set, and mapping out where it can be split - if I understood you correctly. Or is the coarse structure evenly spaced, i.e. it's possible to calculate offsets to images just from their index?
Anyways, instead of just a straight `mmap`, there would need to be a function that decodes whatever is in the file to a numpy array. That's also something needed for quite a few other formats, e.g. FRMS6, binary MIB, ...
> I can have a look - I'd also like to try this with real data, did you manage to upload some to the drop link I gave you some time ago?
Right now the data is all hosted in the tests/de_data/celeritas_data folder. There are smallish (1-20 MB) datasets collected using a couple of different camera modes. These are probably the best data to use for testing.
> In general, what is this project's stance on testing with real input data? It could be possible to publish a set of (small-ish) reference data sets on, e.g., Zenodo and download those in CI runs.
We try to test with real input data as often as we can. That being said, the data is currently included with the package, and it might be better to host it somewhere else eventually. I have been meaning to create an issue regarding this.
> Yeah - in case of uneven spacing, it's probably required to do a sparse search pass over the data, for example by reading the image headers at N positions in the whole data set, and mapping out where it can be split - if I understood you correctly. Or is the coarse structure evenly spaced, i.e. it's possible to calculate offsets to images just from their index?
So the data is structured like this:
So it's not quite uneven, but the images are saved in chunks. You can calculate the image offset if you know the number of images in a buffer.
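To make the offset calculation concrete, here is a minimal sketch. It assumes a layout in which each buffer stores a fixed number of images back to back, followed by one fixed-size trailer (for example the per-buffer time stamp). The real Celeritas layout may differ, and the function name and all parameters are hypothetical.

```python
def frame_offset(index, frames_per_buffer, frame_bytes,
                 trailer_bytes=0, header_bytes=0):
    """Byte offset of frame `index` in a file made of fixed-size buffers.

    Assumed layout (illustrative only):
        [file header][buffer 0][buffer 1]...
    where each buffer holds `frames_per_buffer` frames and then a trailer.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    # Which buffer the frame lives in, and its position inside that buffer.
    buffer_index, index_in_buffer = divmod(index, frames_per_buffer)
    buffer_bytes = frames_per_buffer * frame_bytes + trailer_bytes
    return (header_bytes
            + buffer_index * buffer_bytes
            + index_in_buffer * frame_bytes)
```

Under these assumptions the offset depends only on the frame index and the number of images per buffer, so no search pass over the file is needed.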
> Anyways, instead of just a straight `mmap`, there would need to be a function that decodes whatever is in the file to a numpy array. That's also something needed for quite a few other formats, e.g. FRMS6, binary MIB, ...
Any examples of how you do this? Can you just create a function that maps a frame to an offset in the data and then apply it?
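One possible shape for such a function, as a sketch only and not how LiberTEM or this PR does it: a reader takes a frame index, asks a user-supplied callable for the byte offset, and decodes that frame into a NumPy array. The names `read_frame`, `read_frames`, and `offset_of` are hypothetical, and the pixel type is an assumption.

```python
import numpy as np


def read_frame(path, index, shape, offset_of, dtype="<u2"):
    """Read one frame as a NumPy array, locating it via offset_of(index)."""
    count = int(np.prod(shape))
    data = np.fromfile(path, dtype=dtype, count=count,
                       offset=offset_of(index))
    if data.size != count:
        raise ValueError("frame %d extends past the end of the file" % index)
    return data.reshape(shape)


def read_frames(path, indices, shape, offset_of, dtype="<u2"):
    """Stack several frames into one array.

    Each read is independent of the others, so a scheduler could run
    them as separate tasks.
    """
    frames = [read_frame(path, i, shape, offset_of, dtype) for i in indices]
    return np.stack(frames)
```

Here `offset_of` can be any callable from frame index to byte offset, for example one that accounts for frames being stored in fixed-size buffers.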
> Right now the data is all hosted in the tests/de_data/celeritas_data folder. There are smallish (1-20 MB) datasets collected using a couple of different camera modes. These are probably the best data to use for testing.
The long-term idea is to host the files in the repo but exclude them from the installation, so that they would just be downloaded on demand. I don't remember the name of the package that can do this, @ericpre. But it would be a good idea to create an issue to put it on the to-do list.
Note that we are changing the API slightly to expose the file_reader directly, without referencing .api for each plugin. Additionally, we provide some standard strings for the docstrings, and the docstrings will be loaded into the user guide.
The documentation is adapted in #62
Would be great if you could adapt the PR sometime along the way (if anything is unclear, please leave feedback for #62).