rosettasciio expose some functions under bruker.api

Description of the change

This PR aims to modernise and extend the bruker._api.py:

expose middle layer functions and classes under bruker.api
review, update and extend the docstrings, in particular of these functions and class'es which or which child are going to be exposed through .api
review the code for possible cleanup, scope-separation and streamlining for public exposition in .api

Progress of the PR

[x] expose SfsReader
[ ] expose xml_to_spectrum
[ ] expose xml_to_image
[ ] update docstring (if appropriate),
[ ] update user guide (if appropriate),
[ ] add an changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
[ ] Check formatting changelog entry in the readthedocs doc build of this PR (link in github checks)
[ ] add tests,
[ ] ready for review.

??? are xml_to_spectrum and xml_to_image good naming for these functions. functions require particular etree nodes - maybe et_node_to_spectrum and et_node_to_image would be better naming?

Minimal example of the bug fix or the new feature

from sciio.bruker import api as b_api

b_api.SFSReader('somefile.pan')  # *.pan are particle analysis files using same container as bcf

May 19 '23 14:05 sem-geologist

Codecov Report

Patch coverage: 96.62% and project coverage change: +0.13 :tada:

Comparison is base (5f9e746) 85.16% compared to head (a57fd72) 85.29%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   85.16%   85.29%   +0.13%     
==========================================
  Files          73       74       +1     
  Lines        9030     9042      +12     
  Branches     1932     2045     +113     
==========================================
+ Hits         7690     7712      +22     
+ Misses        873      870       -3     
+ Partials      467      460       -7

Impacted Files	Coverage Δ
rsciio/bruker/_api.py	`88.74% <96.59%> (+0.95%)`	:arrow_up:
rsciio/bruker/api.py	`100.00% <100.00%> (ø)`

... and 5 files with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

May 19 '23 14:05 codecov[bot]

@ericpre , @jlaehne I have some question. As I am doing pretty extensive review of bruker _api.py, so that it could be more streamlined, I came again over idea how to handle images. Bruker xml image node can have plains (that is called "plains" within the XML), it is basically different video channels, which register signal at same time/simultaniously when beam is rastered over sample ROI (In some sense it is very similar to spectroscopy to be honest). So basically most of metadata is the same (actually in XML it is physically shared) except description string of the every channel. Hitherto in this case all metadata was being duplicated, and every plain was returned as separate independent set with its data and metadata. But... is it the right and most efficient way? would it not be better way to stack those plains, alongside axis called as "channels"? can axes have labeled scale (that is instead of numbers it would be channel description)? Does this makes any sense at all? I mean for me it does a lot, as those plains are different representations of exactly same material interaction or result of beam: i.e. say as example 4 channels: BF, DF, DF4, HAADF. They all represent different interacted electrons with matter. Even if taking 2 channel (most common setup with SEM having Bruker EDS) of BSE and SE images, those represents different energy electrons, and in some sense combining it forms kind of spectral image.

Such change probably is going break some established workflows. But I think it would be very right thing to do.

May 29 '23 14:05 sem-geologist

We had a similar topic brought up for LumiSpy (CL spectral image + corresponding SE image) by @jordiferrero, which was never implemented so far: https://github.com/LumiSpy/lumispy/issues/73 Indeed, it would be nice to have an upstream solution for that in HyperSpy!

Concerning the 'labeled' axes for stacked data, @CSSFrancis was working on a PR.

May 29 '23 17:05 jlaehne

So adding labeled axes wouldn't be terribly hard in hyperspy. I have done most of the work in https://github.com/hyperspy/hyperspy/pull/3031 Right now there is a little bit of hesitation with this change but I think that there isn't really anything stopping this from happening. This might be something worth voting on or having a larger discussion about how these signals are handled.

Handling multiple signals is a little bit tricky and becomes more tricky if they have different numbers of pixels, but even that can be handled by hyperspy.

There are a couple of things:

If all of the signals have the same dimensionality and same size --> pass as labeled axes and create a stacked array
If all of the signals have the same dimensionality and different --> pass as labeled axes and create a stacked ragged array
If one of the signals has a higher dimensionality (i.e. HAADF and 4-D STEM) pass one as navigator and one as array

For 3 I also think that in the case where you have multiple different signals you should be able to pass all of them. In that case you could have multiple different navigators. For example with a 4-D STEM you could have brightfield and darkfield images.

May 29 '23 17:05 CSSFrancis

thanks @CSSFrancis , Your 3 point division makes it pretty clear. So "planes" used in bruker XML image actually not only fulfills point 1. of same dimentiality and resolution/size, but are even more closely related, as it have exactly same column conditions, and is pixel-to-pixel generated with exactly same beam simultaneously - thus even more - it should be stacked. Hopefully Your PR https://github.com/hyperspy/hyperspy/pull/3031 will get accepted. I probably will stack images with unitless axes as for now.

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay same useful for Hyperspy 2.0 (actually be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PR's), but also to pave easier road for my own software (HussariX).

May 29 '23 21:05 sem-geologist

Hyperspy is not even my main/only aim with this streamlining attempt. I want this to stay same useful for Hyperspy 2.0 (actually be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PR's), but also to pave easier road for my own software (HussariX).

In these types of situations "what is good for the goose is good for the gander" or both software packages will probably benefit from a consistent approach to these low level problems :) It also helps with interoperability which can only be a good thing.

To be honest the case #2 above is quite difficult from the perspective of hyperspy. There is nothing that says hyperspy cannot have a signal of signals similar to how numpy can have a ragged array of arrays. I considered something like that here but I am not sure if there is a consistent way to approach that. For now there hasn't really been a real need for it so I haven't really paid it much attention.

May 29 '23 22:05 CSSFrancis

Just for me to make sure that I don't misunderstand, @sem-geologist what you are asking is to have a signal, which contains BSE and SE images stacked together and another one which contains the EBSD or EDS data? Could it be easily stack after loading the data by the library itself, at a stage, where it is known how the data needs to be structure/handle. I am not sure that this is the responsibility of RosettaSciIO to define this?

Currently, in hyperspy, these are considered as treated as different dataset and they expected to be handle at the workflow level (script, notebook, library etc.). Maybe what RosettaSciIO could do is to assign some dataset to a specify category: "main", "auxiliary" (data acquired simultaneously but not "main", or already processed data, i.e. map from a spectrum image), "survey", etc.

May 30 '23 09:05 ericpre

rosettasciio rosettasciio copied to clipboard

expose some functions under bruker.api

Description of the change

Progress of the PR

Minimal example of the bug fix or the new feature

Codecov Report

rosettasciio
rosettasciio copied to clipboard