xraylarch
Support for SOLEIL NeXus file format
The file format used at the two SOLEIL beamlines I've been to is NeXus, which is HDF5 under the hood. The files can thus be opened using larch.io.h5group on the command line, but XAS Viewer still refuses to open them, failing with the following traceback:
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : 'measurement' not found -> use 'set_scan' method first
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : object of type 'NoneType' has no len()
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : 'measurement' not found -> use 'set_scan' method first
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : object of type 'NoneType' has no len()
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : 'measurement' not found -> use 'set_scan' method first
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : object of type 'NoneType' has no len()
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : 'measurement' not found -> use 'set_scan' method first
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : 'instrument/positioners' not found -> use 'set_scan' method first
[larch.io.specfile_reader.DataSourceSpecH5] ERROR : 'measurement' not found -> use 'set_scan' method first
Traceback (most recent call last):
  File "/home/kaarel/Develop/xraylarch/larch/wxxas/xasgui.py", line 1152, in onReadDialog
    self.onRead(path)
  File "/home/kaarel/Develop/xraylarch/larch/wxxas/xasgui.py", line 1167, in onRead
    self.show_subframe('spec_import', SpecfileImporter,
  File "/home/kaarel/Develop/xraylarch/larch/wxxas/xasgui.py", line 1105, in show_subframe
    self.subframes[name] = frameclass(self, **opts)
  File "/home/kaarel/Develop/xraylarch/larch/wxlib/specfile_importer.py", line 341, in __init__
    self.curscan = self.specfile.get_scan(curscan)
  File "/home/kaarel/Develop/xraylarch/larch/io/specfile_reader.py", line 703, in get_scan
    motor_names = self.get_scan_motors()
  File "/home/kaarel/Develop/xraylarch/larch/io/specfile_reader.py", line 514, in get_scan_motors
    return [i for i in counters if i in all_motors]
TypeError: 'NoneType' object is not iterable
The .nxs file in question is from the PUMA beamline at SOLEIL, and is provided as an attachment.
Aside from allowing XAS Viewer to load the SOLEIL .nxs files, it would be nice to provide a more convenient access function on the command line interface. For example, to get fluorescence data out of the attached file, you need to go rather deep into the hierarchy of the created Group:
from larch import Interpreter
from larch.io import h5group
_larch = Interpreter()
h5 = h5group("ALARA401_JV07-map1-spot1-0063.nxs", _larch)
h5.exp.scan_data.data_01
Since most of the other innumerable streams of data in these .nxs files seem to be sample and instrumentation metadata, it might make sense for this convenience function to attach the contents of h5.exp.scan_data to the root of the Group, as these are the data most relevant to XAFS analysis in practice. Perhaps the NXData identifier can be used to push up the relevant data further in the hierarchy.
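The idea of using the NXdata identifier can be sketched as follows. This is only an illustration, not larch code: find_nxdata and _nx_class are hypothetical helper names, and the function accepts any object shaped like an h5py.Group (something with .attrs and .items()), so it makes no assumption about which HDF5 library opened the file.

```python
from typing import Any, Dict


def _nx_class(node: Any) -> str:
    """Return the NX_class attribute of a node as text ('' if absent)."""
    value = node.attrs.get("NX_class", "")
    return value.decode() if isinstance(value, bytes) else str(value)


def find_nxdata(group: Any, path: str = "") -> Dict[str, Any]:
    """Recursively collect every subgroup tagged NX_class = NXdata.

    `group` is anything shaped like an h5py.Group: it has `.attrs` and
    `.items()`. Datasets have no `.items()`, so they are skipped.
    """
    found: Dict[str, Any] = {}
    for name, node in group.items():
        if not hasattr(node, "items"):
            continue  # a dataset, not a group
        node_path = f"{path}/{name}"
        if _nx_class(node) == "NXdata":
            found[node_path] = node
        else:
            # keep descending until an NXdata group is reached
            found.update(find_nxdata(node, node_path))
    return found
```

Called on the opened file object, this should report /exp/scan_data for the attached file if its layout is as described above; a convenience function could then attach the datasets of each reported group to the root of the Group.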
@kaarelmand Thanks - I think we would be willing to say that the files from SOLEIL should be supported, and not assume that all H5 files follow the conventions of the ESRF/Bliss/Spec beamlines.
Is "root.exp.scan_data" meant to be some universal description of scans? That doesn't seem very NeXuS-like to me ;). But, if this is the H5 schema that SOLEIL uses, then sure, let's use that.
@maurov Can we add a way to detect what conventions an H5 file uses before assuming it is Spec/Bliss H5 file? That way we might cover French and European conventions for H5 ;).
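A possible shape for such a detection step is sketched below. It is an assumption-laden illustration rather than a proposal for the actual larch API: guess_h5_layout is a hypothetical name, the "spech5" test only checks for the 'measurement' subgroup that the DataSourceSpecH5 errors above complain about, and the "nxdata" test only looks one level below each top-level entry. The root object can be anything shaped like an h5py.File.

```python
from typing import Any


def _nx_class(node: Any) -> str:
    """Return the NX_class attribute of a node as text ('' if absent)."""
    value = node.attrs.get("NX_class", "")
    return value.decode() if isinstance(value, bytes) else str(value)


def guess_h5_layout(root: Any) -> str:
    """Guess which convention the top-level entries of an HDF5 file follow.

    Returns "spech5" if every top-level group holds a 'measurement'
    subgroup, "nxdata" if some top-level group directly holds a subgroup
    tagged NX_class = NXdata, and "unknown" otherwise.
    """
    # only groups (objects with .items()) can be scan entries
    entries = [node for _, node in root.items() if hasattr(node, "items")]
    if not entries:
        return "unknown"
    if all(any(name == "measurement" for name, _ in entry.items())
           for entry in entries):
        return "spech5"
    for entry in entries:
        for _, child in entry.items():
            if hasattr(child, "items") and _nx_class(child) == "NXdata":
                return "nxdata"
    return "unknown"
```

The importer could branch on the returned string before handing the file to the Spec/Bliss reader, instead of assuming that layout up front.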
@kaarelmand sure, we are willing to have Larch and xas_viewer read as many data formats as possible seamlessly, so this should not be difficult to implement; we just need to know the structure of the HDF5 file you have sent.
I may be wrong, but to me, the HDF5 file you have provided does not follow NeXuS directives at all. From NeXuS it takes only the file extension. Below is what I see when I look at its structure:
The error you get is due to the fact that we use by default the spech5 API in the module larch.io.specfile_reader.DataSourceSpecH5. This is the standard data scheme used at ESRF.
I think we can extend this module to the SOLEIL scheme, but we need a clear description of how the data are structured in the HDF5 container.
Otherwise, if you want to contribute directly to larch, feel free to submit a pull request. The only constraint we ask is to use silx.io.open to read the file instead of using h5py directly.
Thanks for considering this!
I've attached five .nxs files from SOLEIL. Two are from the PUMA beamline: the same XAFS file attached above and an XRF mapping .nxs file; and three are from the LUCIA beamline: one for XRF mapping data, one for a normal XAFS run, and one for a flyscan XAFS -- i.e., where the actuators are not stopping for the measurements, but instead the fluorescence at each energy is integrated over some distance "on the fly"; this last one may be difficult for XAS Viewer to parse. Probably the XRF map data are not useful here (unless you want to support them in GSEMapViewer), but I included them just in case.
Based on this small sample, it seems like the root.exp.scan_data format for hosting data is standardized throughout SOLEIL. I don't think this is in conflict with NeXus directives. On the image above, exp has the class NXentry and right below it is scan_data with the class NXdata, just as is described in the NeXus design document. Of the various entries under scan_data, the first one has a primary attribute, which suggests it is the controlling variable for any plotting (energy scale or monochromator position in this case), whereas other data channels have signal attributes, suggesting these are the responding variables to be plotted. This, too, is described under the NeXus data storing rules, although it corresponds to the now-deprecated Version 1 schema for finding plottable data.
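As a rough sketch of how those attributes could be used, the function below sorts the datasets of an NXdata group into axes and signals. The name split_axis_and_signals is hypothetical, and the only assumption about the input is that it behaves like an h5py.Group whose members expose .attrs; the attribute names primary and signal are the ones observed in the files above.

```python
from typing import Any, List, Tuple


def split_axis_and_signals(nxdata: Any) -> Tuple[List[str], List[str]]:
    """Sort the datasets of an NXdata group by their legacy NeXus attributes.

    Returns (axes, signals): the names of datasets carrying a `primary`
    attribute, and the names of datasets carrying a `signal` attribute.
    Datasets with neither attribute are ignored.
    """
    axes: List[str] = []
    signals: List[str] = []
    for name, dataset in nxdata.items():
        if "primary" in dataset.attrs:
            axes.append(name)      # controlling variable, e.g. energy
        elif "signal" in dataset.attrs:
            signals.append(name)   # responding variable to be plotted
    return axes, signals
```

With a file opened through h5py or silx.io.open, the group to pass in would be the one at exp/scan_data, and the arrays themselves can then be read from the returned dataset names.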
I can try making a PR, although I'm very new at this and it'll take me a bit of time.
@kaarelmand thanks for sending more example data from SOLEIL XAFS beamlines. I apologize for my earlier comment about not following NeXus directives. I completely missed the NX* entries this morning, before a good coffee and while I was aligning the beamline in parallel ;)
I propose to use silx.io.nxdata for reading the NeXus data into larch. I will have a look this week and give an update here. Is that fine for you?
@newville are you in a hurry to release 0.9.67, or could you wait to have this included in the next release? In my opinion, it would be great to have initial support for SOLEIL XAFS data in the next release.
No worries at all; that course of action works great for me!
@woutdenolf I put you in the loop for this, as discussed this morning. It would be great to have a proof-of-concept generic NeXus reader in Larch, e.g. larch.io.nexus_reader.
If you are willing to do so, you could start by implementing the methods in larch.io.specfile_reader.DataSourceSpecH5. By doing so, it will be straightforward to have the xas_viewer GUI read NeXus files without too many changes.
@maurov @kaarelmand @woutdenolf I have a couple of thoughts here.
a) I am not at all opposed to a "generic HDF5/NeXuS" browser to select data. I'm not sure whether using silx.io or nexpy.nexusformat code or straight h5py would be the best approach, but this seems worth pursuing. I'm willing to work toward this. Help or suggestions would be very much appreciated. I might also ask the APS "nexus people" about such questions.
b) I am also in favor of being able to identify data from "common sources" (beamlines, data-acquisition systems) and making sensible (but optional) default choices for how to read data. So, collecting example data sets and figuring out schema for "SOLEIL Nexus" (and "Elettra", etc) would be very helpful.
c) I am also in favor of working toward a common schema or at least a set of translations or aliases for the various formats based on (or related to) HDF5. Like, I'm toying with the idea of switching my HDF5 format for XRF maps to use Zarr. Also, and perhaps coincidentally, I had a meeting this morning for planning an XAFS meeting (Q2XAFS) for next August in Australia that will include real discussions on data formats. One of the "problems" identified is definitely how to handle the XAFS data in the various required-and-not-loved HDF5 formats.
So: any volunteers or suggestions for who are the right people to be in that conversation?
@newville thanks for your comments. I propose, as a first step, to let @woutdenolf work out proof-of-concept code for the SOLEIL NeXus files sent by @kaarelmand (let's consider only the XAFS data and leave out the XRF data for the moment).
Personally, I am afraid of those never-ending discussions on data formats that never converge to anything usable in practice and in the meantime most of the users simply convert these fancy HDF5/NeXus into ASCII files. This said, I would be more than happy to take part (virtually) in the next Q2XAFS meeting. I propose to discuss this topic elsewhere and keep this issue for the specific case initially raised by @kaarelmand .
@maurov Thanks. I am fine with focussing on the issue as raised here (cannot read files from SOLEIL easily) and moving the larger-scale discussion elsewhere.
@kaarelmand
To give some news from my side, I am lost in the NeXus complexity and, as a human, I am not able to understand how to get a simple (energy, I0, mu) array from the PUMA and LUCIA data. Please, could you post a simple example of code using h5py that gets those arrays out of two XAFS data files from these SOLEIL beamlines?
Furthermore, could you send an example of a data file with multiple XAFS scans? For the LUCIA ones, to me it is practically impossible to know how to move from / down to the data, because the name of the first group is specific to the sample. I think the easiest for me would be to discuss directly with the beamline people at SOLEIL. Please, could you send me their contacts via private email?
@woutdenolf any news from your side on this?
I will have a look at this mid October.
I'm implementing a generic XAS source for Nexus. I will support 1 scan == 1 XAS scan. I could support Fullfield XAS (1 scan == many XAS scans) but as larch analysis is scan per scan, I don't think it makes sense.
@kaarelmand Some of the files do not contain XAS data. For example, xasflyscan seems to contain only XRF spectra. You will first need to convert that to XAS data (1D data, energy vs. mu, I0, I1, ...).
@woutdenolf Thanks very much for working on this! I agree that support for processing and analyzing full-field XAFS is something we don't really consider, but we should consider how to do that. But, if it is clear that there is a single (or even common) way to represent such data, I would not at all be opposed to "provisional support for reading it". That would at least allow display, slicing, converting/merging into 1-D XAS spectra.
We could easily add other XasDataSource classes for fullfield but we're talking about thousands of spectra. Opening all of them in xas_viewer is rather pointless imo. You would need an entirely different type of interface if you want to handle XAS imaging or tomography.
@woutdenolf thank you very much indeed for working on this. I agree with you that opening fullfield data in xas_viewer does not make sense, but Larch is also a library and aims to handle more X-ray spectroscopy data, beyond XAFS. For example, Matt uses Larch for XRF imaging and tomography on his beamline; I use larch for XES, RIXS or any data containing peaks, for peak-fitting. Larch is also used for X-ray refraction data used in many techniques like DAFS, ReflEXAFS and spectral ptychography. So, in my opinion, if we want to implement a "generic NeXus file reader in Larch" we should include the possibility to read such data from the beginning.
We will review #412 as soon as we have time for this, but I would recommend first adding your reader with an example (a Larch script or a Jupyter notebook in pure python would be great!) showing how to read and plot XAFS spectra from the NeXus files provided by @kaarelmand. At this stage I would not change the usual way Larch and xas_viewer read data. @newville what is your opinion on this?
@mretegan I think it would be nice to have your opinion on this too.
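from typing import List, NamedTuple
from numpy.typing import ArrayLike  # assumed source of ArrayLike

class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    labels: List[str]
    data: ArrayLike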
Ok then XasScan.data can have shape (nlabels, nenergy) for a single spectrum scan and shape (nlabels, npoints, nenergy) for a multi spectrum scan.
Btw, do you prefer
class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    labels: List[str]
    data: ArrayLike
or
class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    data: Dict[str, ArrayLike]
or even
class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    data: pandas.DataFrame
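The first two variants carry the same information, as the sketch below tries to show: when the first axis of data runs over labels, zipping labels with data gives the dictionary form. The helper name as_columns and the numbers in the example are made up for illustration, and ArrayLike is replaced by a plain Sequence so that the sketch runs without NumPy (a NumPy array of shape (nlabels, nenergy) or (nlabels, npoints, nenergy) would work the same way).

```python
from typing import Dict, List, NamedTuple, Sequence


class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    labels: List[str]
    data: Sequence  # first axis runs over labels


def as_columns(scan: XasScan) -> Dict[str, Sequence]:
    """Map each label to its slice of `data` along the first axis."""
    if len(scan.labels) != len(scan.data):
        raise ValueError("data needs one entry per label along its first axis")
    return dict(zip(scan.labels, scan.data))


# Made-up single-spectrum scan: two labels, three energy points.
scan = XasScan(
    name="scan_1",
    description="",
    info="",
    start_time="",
    labels=["energy", "mu"],
    data=[[7100.0, 7101.0, 7102.0], [0.10, 0.12, 0.35]],
)
columns = as_columns(scan)  # {"energy": [...], "mu": [...]}
```

For a multi spectrum scan each value in the dictionary would simply be a (npoints, nenergy) block instead of a 1D row.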
@woutdenolf how to represent the data in memory (let's call it the "data model") can be a never-ending discussion. In Larch this is currently done with the Group object and all functions work with it, so I think the base 1D data structure should stay like this.
What we are missing in my opinion is a "Group of Groups" common object in Larch that can be nested (tree-like data model). At the moment we store Groups either in lists or in dictionaries. I think it would be beneficial to enhance this aspect for the moment.
When more than one dataset or more complex structure is needed I think we should stick to it and expand it to
@woutdenolf @maurov Thanks, I'm falling a bit behind due to other beamline stuff.
Yes to "Groups" (basically an empty class to access with Thing.attribute instead of Thing['attribute']) for general containers of data.
But, if XasScan here is meant to be a predictable, static-like thing, a NamedTuple is a fine way to represent "The XAS data from a Nexus file". I would suggest that we don't really need Pandas. I sort of like "simple" for data structures. I would probably use
class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    labels: List[str]
    data: ArrayLike
as it seems like it maps to HDF5 a bit better, and also to how we are already reading data from some text files.
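To illustrate how this NamedTuple could coexist with the Group-based data model discussed above, here is a minimal sketch that unpacks a scan into an attribute-access container. It is not larch code: types.SimpleNamespace stands in for larch's Group, scan_to_namespace is a hypothetical name, and ArrayLike is again replaced by a plain Sequence so the sketch runs without NumPy.

```python
from types import SimpleNamespace
from typing import List, NamedTuple, Sequence


class XasScan(NamedTuple):
    name: str
    description: str
    info: str
    start_time: str
    labels: List[str]
    data: Sequence  # first axis runs over labels


def scan_to_namespace(scan: XasScan) -> SimpleNamespace:
    """Expose each labelled column of a scan as an attribute."""
    group = SimpleNamespace(
        name=scan.name,
        description=scan.description,
        info=scan.info,
        start_time=scan.start_time,
    )
    for label, column in zip(scan.labels, scan.data):
        # refuse labels that cannot be attributes or would hide metadata
        if not label.isidentifier() or hasattr(group, label):
            raise ValueError(f"cannot use {label!r} as an attribute name")
        setattr(group, label, column)
    return group
```

After the conversion, a column labelled "energy" is reachable as group.energy, which matches the Thing.attribute style of access described above.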