rosettasciio
expose some functions under bruker.api
Description of the change
This PR aims to modernise and extend bruker/_api.py:
- expose middle-layer functions and classes under bruker.api
- review, update and extend the docstrings, in particular those of the functions and classes that are exposed through .api themselves or whose children are
- review the code for possible cleanup, scope separation and streamlining for public exposure in .api
Progress of the PR
- [x] expose SfsReader
- [ ] expose xml_to_spectrum
- [ ] expose xml_to_image
- [ ] update docstring (if appropriate),
- [ ] update user guide (if appropriate),
- [ ] add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
- [ ] check the formatting of the changelog entry in the readthedocs doc build of this PR (link in GitHub checks),
- [ ] add tests,
- [ ] ready for review.
???
Are xml_to_spectrum and xml_to_image good names for these functions? They require particular etree nodes, so maybe et_node_to_spectrum and et_node_to_image would be better names?
Minimal example of the bug fix or the new feature
from rsciio.bruker import api as b_api
b_api.SFSReader('somefile.pan')  # *.pan are particle analysis files using the same container as bcf
Codecov Report
Patch coverage: 96.62% and project coverage change: +0.13 :tada:
Comparison is base (5f9e746) 85.16% compared to head (a57fd72) 85.29%.
Additional details and impacted files
@@ Coverage Diff @@
## main #121 +/- ##
==========================================
+ Coverage 85.16% 85.29% +0.13%
==========================================
Files 73 74 +1
Lines 9030 9042 +12
Branches 1932 2045 +113
==========================================
+ Hits 7690 7712 +22
+ Misses 873 870 -3
+ Partials 467 460 -7
| Impacted Files | Coverage Δ | |
|---|---|---|
| rsciio/bruker/_api.py | 88.74% <96.59%> (+0.95%) | :arrow_up: |
| rsciio/bruker/api.py | 100.00% <100.00%> (ø) | |
@ericpre, @jlaehne I have a question. While doing a fairly extensive review of the Bruker _api.py so that it can be streamlined, I came back to the question of how to handle images. A Bruker XML image node can have several planes (that is what they are called within the XML). These are basically different video channels that register signal simultaneously while the beam is rastered over the sample ROI (in some sense this is very similar to spectroscopy, to be honest). So most of the metadata is the same (in the XML it is physically shared), except for the description string of each channel. Until now, all of this metadata was duplicated, and every plane was returned as a separate, independent dataset with its own data and metadata. But is that the right and most efficient way? Would it not be better to stack those planes along an axis called "channels"? Can axes have a labeled scale (that is, channel descriptions instead of numbers)? Does this make any sense at all? To me it makes a lot of sense, as those planes are different representations of exactly the same beam-material interaction. Take, for example, 4 channels: BF, DF, DF4, HAADF. They all represent electrons that interacted differently with matter. Even the 2 channels of the most common setup (an SEM with Bruker EDS), the BSE and SE images, represent electrons of different energies, and in some sense combining them forms a kind of spectral image.
Such a change is probably going to break some established workflows, but I think it would be the right thing to do.
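To make the proposal concrete, here is a minimal sketch of stacking simultaneously acquired planes along a leading "channels" axis, using plain NumPy. The function name `stack_planes` and the idea of carrying the labels as a separate list are assumptions for illustration only; this is not the reader's actual code.

```python
import numpy as np


def stack_planes(planes, labels):
    """Stack equally shaped 2D planes along a leading 'channels' axis.

    Hypothetical helper: returns the stacked array and the channel labels,
    which stand in for the per-plane description strings.
    """
    if len(planes) != len(labels):
        raise ValueError("need exactly one label per plane")
    shapes = {p.shape for p in planes}
    if len(shapes) != 1:
        raise ValueError(f"planes differ in shape: {sorted(shapes)}")
    return np.stack(planes, axis=0), list(labels)


# Two simulated detector channels acquired over the same 4 x 5 pixel raster.
bse = np.zeros((4, 5), dtype=np.uint16)
se = np.ones((4, 5), dtype=np.uint16)
stacked, channel_labels = stack_planes([bse, se], ["BSE", "SE"])
print(stacked.shape, channel_labels)
```

With this layout the shared metadata only needs to be stored once, and the per-channel description is the only thing that varies along the new axis.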
We had a similar topic brought up for LumiSpy (CL spectral image + corresponding SE image) by @jordiferrero, which has not been implemented so far: https://github.com/LumiSpy/lumispy/issues/73. Indeed, it would be nice to have an upstream solution for that in HyperSpy!
Concerning the 'labeled' axes for stacked data, @CSSFrancis was working on a PR.
So adding labeled axes wouldn't be terribly hard in hyperspy. I have done most of the work in https://github.com/hyperspy/hyperspy/pull/3031. Right now there is a little bit of hesitation about this change, but I think that there isn't really anything stopping it from happening. This might be something worth voting on, or worth a larger discussion about how these signals are handled.
Handling multiple signals is a little bit tricky and becomes more tricky if they have different numbers of pixels, but even that can be handled by hyperspy.
There are a few cases:
1. If all of the signals have the same dimensionality and the same size --> pass as labeled axes and create a stacked array
2. If all of the signals have the same dimensionality but different sizes --> pass as labeled axes and create a stacked ragged array
3. If one of the signals has a higher dimensionality (e.g. HAADF and 4-D STEM) --> pass one as navigator and one as array

For case 3, I also think that when you have multiple different signals you should be able to pass all of them. In that case you could have multiple different navigators. For example, with 4-D STEM you could have brightfield and darkfield images.
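The three cases above can be sketched as a small dispatch on array shapes. This is only an illustration with NumPy arrays; `combine_signals` and its return convention are hypothetical and not part of hyperspy or RosettaSciIO.

```python
import numpy as np


def combine_signals(arrays):
    """Classify a list of arrays into one of the three cases (hypothetical helper)."""
    ndims = {a.ndim for a in arrays}
    shapes = {a.shape for a in arrays}
    if len(ndims) == 1 and len(shapes) == 1:
        # Case 1: same dimensionality and size, so a dense stack works.
        return "stacked", np.stack(arrays)
    if len(ndims) == 1:
        # Case 2: same dimensionality, different sizes, so keep an object
        # array holding one array per signal (a "ragged" stack).
        ragged = np.empty(len(arrays), dtype=object)
        for i, a in enumerate(arrays):
            ragged[i] = a
        return "ragged", ragged
    # Case 3: mixed dimensionality. The highest-dimensional arrays are the
    # main data; all lower-dimensional ones can serve as navigators.
    top = max(ndims)
    main = [a for a in arrays if a.ndim == top]
    navigators = [a for a in arrays if a.ndim < top]
    return "navigator", (main, navigators)


kind, result = combine_signals([np.zeros((4, 5)), np.ones((4, 5))])
print(kind, result.shape)
```

Note that case 3 naturally allows several navigators, for example a brightfield and a darkfield image alongside one 4-D dataset.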
Thanks @CSSFrancis, your 3-point division makes it pretty clear. So the "planes" used in the Bruker XML image not only fulfil point 1 (same dimensionality and resolution/size) but are even more closely related: they share exactly the same column conditions and are generated pixel by pixel, simultaneously, with exactly the same beam. All the more reason to stack them. Hopefully your PR https://github.com/hyperspy/hyperspy/pull/3031 will get accepted. For now I will probably stack images with unitless axes.
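As a rough idea of what "unitless axes for now" could look like, here is a sketch that describes a stacked image with a list of axis dictionaries. The keys (`name`, `size`, `scale`, `offset`, `units`) are loosely modelled on the axis dictionaries used by readers, but the exact keys and the helper name `image_stack_axes` are assumptions here, not the final implementation.

```python
import numpy as np


def image_stack_axes(stacked, pixel_size, pixel_units):
    """Describe a (channel, height, width) stack (hypothetical helper).

    The channel axis is a plain unitless index until labeled axes exist.
    """
    n_channels, height, width = stacked.shape
    return [
        {"name": "channel", "size": n_channels, "scale": 1, "offset": 0, "units": None},
        {"name": "height", "size": height, "scale": pixel_size, "offset": 0, "units": pixel_units},
        {"name": "width", "size": width, "scale": pixel_size, "offset": 0, "units": pixel_units},
    ]


stack = np.zeros((2, 4, 5))
axes = image_stack_axes(stack, pixel_size=0.5, pixel_units="um")
for axis in axes:
    print(axis["name"], axis["size"], axis["units"])
```

Once labeled axes are available, only the first dictionary would need to change; the spatial axes stay as they are.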
Hyperspy is not even my main or only aim with this streamlining attempt. I want this to stay just as useful for Hyperspy 2.0 (actually to be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PRs), but also to pave an easier road for my own software (HussariX).
> Hyperspy is not even my main or only aim with this streamlining attempt. I want this to stay just as useful for Hyperspy 2.0 (actually to be more useful for extensions, with EBSD, XRF... in mind, which I plan to address in some following PRs), but also to pave an easier road for my own software (HussariX).
In these types of situations "what is good for the goose is good for the gander": both software packages will probably benefit from a consistent approach to these low-level problems :) It also helps with interoperability, which can only be a good thing.
To be honest, case #2 above is quite difficult from the perspective of hyperspy. There is nothing that says hyperspy cannot have a signal of signals, similar to how numpy can have a ragged array of arrays. I considered something like that here, but I am not sure if there is a consistent way to approach it. So far there hasn't been a real need for it, so I haven't paid it much attention.
Just to make sure that I don't misunderstand: @sem-geologist, what you are asking for is to have one signal that contains the BSE and SE images stacked together, and another one that contains the EBSD or EDS data? Could the data be easily stacked after loading, by the library itself, at a stage where it is known how the data needs to be structured and handled? I am not sure that it is the responsibility of RosettaSciIO to define this.
Currently, in hyperspy, these are treated as different datasets and they are expected to be handled at the workflow level (script, notebook, library, etc.). Maybe what RosettaSciIO could do is assign each dataset to a specific category: "main", "auxiliary" (data acquired simultaneously but not "main", or already processed data, e.g. a map from a spectrum image), "survey", etc.
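A minimal sketch of what such a category tag could look like, so that stacking decisions can be made downstream at the workflow level. The `role` key, the `DatasetRole` enum and `group_by_role` are all hypothetical names for illustration; nothing like this exists in RosettaSciIO as far as this thread states.

```python
from enum import Enum


class DatasetRole(str, Enum):
    """Hypothetical categories taken from the suggestion above."""

    MAIN = "main"
    AUXILIARY = "auxiliary"
    SURVEY = "survey"


def group_by_role(datasets):
    """Group dataset titles by their (assumed) 'role' entry."""
    grouped = {role: [] for role in DatasetRole}
    for dataset in datasets:
        grouped[DatasetRole(dataset["role"])].append(dataset["title"])
    return grouped


# Simulated reader output: one EDS hypermap plus two simultaneously acquired images.
loaded = [
    {"title": "EDS", "role": "main"},
    {"title": "BSE", "role": "auxiliary"},
    {"title": "SE", "role": "auxiliary"},
]
grouped = group_by_role(loaded)
print(grouped[DatasetRole.AUXILIARY])
```

A consumer such as hyperspy or another library could then decide by itself whether the "auxiliary" datasets should be stacked, used as navigators, or left separate.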