rosettasciio Add .seq format for DE 16 and Celeritas Camera

Description of the change

This adds support for reading .seq files from the DE 16 and Celeritas cameras.

Some notes about the file format:

DE 16:

The DE 16 camera writes multiple files: a metadata file, a dark file, a gain file and a .seq file. These files all share the same naming scheme, so I either read all files in the same folder that match that scheme or allow the files to be passed in directly.
The data is in a binary format, with each frame at some offset and a time stamp following the frame.
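
A minimal sketch of that frame-plus-time-stamp layout, using a NumPy structured dtype on a memory map. The header size, frame shape, pixel dtype and time stamp width are hypothetical parameters chosen for illustration, not the values used by the reader in this PR; the real ones would come from the .seq header and the metadata file.

```python
import numpy as np


def map_frames(path, header_bytes, frame_shape, pixel_dtype="<u2", stamp_bytes=8):
    """Memory-map frames that are each followed by a fixed-size time stamp.

    All layout parameters are hypothetical inputs for this sketch.
    """
    record = np.dtype(
        [
            ("frame", pixel_dtype, frame_shape),  # pixel data of one frame
            ("stamp", "V%d" % stamp_bytes),  # raw time stamp bytes, left undecoded
        ]
    )
    # The number of records is inferred from the file size, so the data after
    # the header must be a whole number of (frame + time stamp) records.
    return np.memmap(path, dtype=record, mode="r", offset=header_bytes)
```

Indexing the result with `["frame"]` gives a view of shape `(n_frames, *frame_shape)` without copying the data into memory.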

Celeritas:

Due to the speed at which this camera reads out data, the camera is split in two: a "top" and a "bottom" frame are read out concurrently.
These frames are also saved in a buffer, with multiple images stored in one long image.
- This makes memory mapping this dataset a little harder, as there isn't a constant stream of data. I would like to add support for using the distributed scheduler, but that might have to wait.
- The buffer size is saved in the XML file alongside the data. There may be a way to guess the buffer size given the XML file and the FPS of the camera.
- The time stamp is only recorded once every buffer.
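
A rough sketch of how one buffer could be unpacked, assuming the frames in a buffer are stacked along the first axis of the long image. The function names are hypothetical, and joining the halves by plain concatenation is an assumption; the real camera may need the bottom half reordered or flipped.

```python
import numpy as np


def split_buffer(buffer_image, frames_per_buffer):
    """Split one long buffer image into its individual frames.

    Assumes (hypothetically) that the frames are stacked along axis 0.
    """
    rows, cols = buffer_image.shape
    if rows % frames_per_buffer:
        raise ValueError("buffer height is not a multiple of frames_per_buffer")
    return buffer_image.reshape(frames_per_buffer, rows // frames_per_buffer, cols)


def stitch(top_frames, bottom_frames):
    """Join matching "top" and "bottom" half frames into full frames.

    Plain concatenation is an assumption; any flip of the bottom half
    is not handled here.
    """
    return np.concatenate([top_frames, bottom_frames], axis=1)
```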

Progress of the PR

[x] Added DE 16 support for loading
~~[ ] Add DE 16 support for saving (Potentially?)~~
[x] Added Celeritas support
[x] Add support for DE 16 using the distributed scheduler
[x] Add support for Celeritas using the distributed scheduler
[x] update docstring (if appropriate),
[x] update user guide (if appropriate),
[x] add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
[x] Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
[x] add tests for basic loading
[x] ready for review.

Minimal example of the bug fix or the new feature

from rsciio.de import api
api.file_reader("test.seq") # read regular .seq

api.file_reader("test_Top_.seq", celeritas=True) # read celeritas .seq

Aug 15 '22 14:08 CSSFrancis

Codecov Report

Patch coverage: 90.16% and project coverage change: +0.20 :tada:

Comparison is base (b045157) 84.95% compared to head (a340725) 85.15%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #11      +/-   ##
==========================================
+ Coverage   84.95%   85.15%   +0.20%     
==========================================
  Files          73       75       +2     
  Lines        8894     9250     +356     
  Branches     1955     2022      +67     
==========================================
+ Hits         7556     7877     +321     
- Misses        876      895      +19     
- Partials      462      478      +16

Impacted Files	Coverage Δ
rsciio/de/_api.py	`89.77% <89.77%> (ø)`
rsciio/utils/tools.py	`80.26% <90.90%> (+6.93%)`	:arrow_up:
rsciio/de/__init__.py	`100.00% <100.00%> (ø)`

Aug 15 '22 14:08 codecov[bot]

@sk1p I know we have talked about adding support for the DE Celeritas camera to LiberTEM and HyperSpy. If you have the chance, can you look over this PR? The hardest thing is dealing with the segment prebuffer for the Celeritas camera.

I wanted to add support for distributed scheduling using the scheme proposed by @uellue here, but due to the nature of the prebuffer the data isn't evenly spaced in the binary file. This makes implementing this in a general way fairly difficult.

Aug 15 '22 14:08 CSSFrancis

> I know we have talked about adding support for the DE Celeritas camera to LiberTEM and HyperSpy. If you have the chance, can you look over this PR? The hardest thing is dealing with the segment prebuffer for the Celeritas camera.

I can have a look - I'd also like to try this with real data, did you manage to upload some to the drop link I gave you some time ago?

In general, what is this project's stance on testing with real input data? It could be possible to publish a set of (small-ish) reference data sets on e.g. Zenodo and download those in CI runs.

> I wanted to add support for distributed scheduling using the scheme proposed by @uellue here, but due to the nature of the prebuffer the data isn't evenly spaced in the binary file. This makes implementing this in a general way fairly difficult.

Yeah - in case of uneven spacing, it's probably required to do a sparse search pass over the data, for example by reading the image headers at N positions in the whole data set, and mapping out where it can be split - if I understood you correctly. Or is the coarse structure evenly spaced, i.e. it's possible to calculate offsets to images just from their index?

Anyway, instead of just a straight mmap, there would need to be a function that decodes whatever is in the file to a numpy array. That's also something needed for quite a few other formats, e.g. FRMS6, binary MIB, ...
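
A minimal sketch of such a decoding function, assuming the byte offset of every frame is already known and each frame is stored as contiguous raw pixels. The name `decode_frames`, its parameters and the default dtype are hypothetical, not part of any existing reader.

```python
import numpy as np


def decode_frames(path, offsets, frame_shape, dtype="<u2"):
    """Read frames stored at arbitrary byte offsets into one numpy array.

    `offsets` is a sequence of byte positions, one per frame; how they are
    obtained depends on the file format and is outside this sketch.
    """
    dtype = np.dtype(dtype)
    count = int(np.prod(frame_shape))  # number of pixels per frame
    out = np.empty((len(offsets),) + tuple(frame_shape), dtype=dtype)
    with open(path, "rb") as f:
        for i, offset in enumerate(offsets):
            f.seek(offset)
            # np.fromfile reads from the current position of the open file
            out[i] = np.fromfile(f, dtype=dtype, count=count).reshape(frame_shape)
    return out
```

Because the function only needs a path and a list of offsets, it could be called on separate groups of frames by independent workers, which is what a distributed scheduler would need.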

Aug 15 '22 17:08 sk1p

> I can have a look - I'd also like to try this with real data, did you manage to upload some to the drop link I gave you some time ago?

Right now the data is all hosted in the tests/de_data/celeritas_data folder. There are smallish (1-20 MB) datasets collected using a couple of different camera modes. These are probably the best data to use for testing.

> In general, what is this project's stance on testing with real input data? It could be possible to publish a set of (small-ish) reference data sets on e.g. Zenodo and download those in CI runs.

We try to test with real input data as often as we can. That being said, the data is included with the package and it might be better to host it somewhere else eventually. I was meaning to create an issue regarding this.

> Yeah - in case of uneven spacing, it's probably required to do a sparse search pass over the data, for example by reading the image headers at N positions in the whole data set, and mapping out where it can be split - if I understood you correctly. Or is the coarse structure evenly spaced, i.e. it's possible to calculate offsets to images just from their index?

So the data is structured like this: (image: Seq Scheme). So it's not quite uneven, but the images are saved in chunks. You can calculate the image offset if you know the number of images in a buffer.
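
As a sketch of that calculation, assuming each buffer holds a fixed number of equally sized frames followed by a single time stamp: the function name, the 8-byte time stamp default and the position of the time stamp at the end of the buffer are all assumptions for illustration.

```python
def frame_offset(index, frames_per_buffer, frame_bytes, stamp_bytes=8, header_bytes=0):
    """Byte offset of frame `index` in a file made of fixed-size buffers.

    Assumed layout: header, then repeated
    [frames_per_buffer frames][one time stamp] blocks.
    """
    buffer_index, index_in_buffer = divmod(index, frames_per_buffer)
    buffer_bytes = frames_per_buffer * frame_bytes + stamp_bytes
    return header_bytes + buffer_index * buffer_bytes + index_in_buffer * frame_bytes
```

With 4 frames of 100 bytes per buffer, frames 0 to 3 sit at offsets 0, 100, 200 and 300, and frame 4 starts at 408 because the first buffer's time stamp comes before it.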

> Anyway, instead of just a straight mmap, there would need to be a function that decodes whatever is in the file to a numpy array. That's also something needed for quite a few other formats, e.g. FRMS6, binary MIB, ...

Any examples of how you do this? Can you just create a function that maps a frame to an offset in the data and then just apply it?

Aug 15 '22 18:08 CSSFrancis

> Right now the data is all hosted in the tests/de_data/celeritas_data folder. There are smallish (1-20 MB) datasets collected using a couple of different camera modes. These are probably the best data to use for testing.

The long-term idea is to host the files in the repo but exclude them from the installation, so that they would just be downloaded on demand. I don't remember the name of the package that can do this, @ericpre. But it would be a good idea to create an issue to put it on the to-do list.

Aug 16 '22 11:08 jlaehne

Note that we are changing the API slightly to expose the file_reader directly, without referencing .api for each plugin. Additionally, we provide some standard strings for the docstrings, and the docstrings will be loaded into the user guide.

The documentation is adapted in #62

Would be great if you could adapt the PR sometime along the way (if anything is unclear, please leave feedback for #62).

Nov 16 '22 08:11 jlaehne