Add `Tiled` data sources to browse Bluesky runs
There has been a recent burst of interest and activity in integrating bluesky/tiled data sources into `silx` (and by extension into pyMCA). I'm summarizing some of those discussions here to get feedback from `silx` developers.
Background
Several light sources are looking into using pyMCA as a browser of data collected during Bluesky runs. From discussions with @linupi @vasole @t20100 it was suggested that modifying `silx` to accept a Tiled data catalog would be an elegant way to do this for pyMCA, silx-view, and any other apps depending on silx.
@t20100 has started a proof-of-concept branch that shows a pathway for adapting a Tiled Container to a HDF5-like interface.
Preliminary scope (to be refined)
Discussion on 2024-03-26 @whs92 @danielballan @abbygi @vshekar @padraic-shafer [...missing handles for more BESSY-II participants]
During a chat between several developers at NSLS-II and BESSY-II, we recognized a common interest in using pyMCA as a "bluesky-supported" visual explorer of Tiled datasets for beamline experimenters. We identified several preliminary goals for a development sprint.
- Connect to a tiled server over HTTP -- Accept a URL; handle Auth
- Browse contents, with ability to filter and sort
- Should identify bluesky runs
- Will likely need a per-endstation configuration of metadata "projections" (flattened subset of important metadata)
- View baseline data for selected run(s)
- Plot scan data using existing plot tools
- Use hinted data by default
- User can assign "any" channel to a plot axis
- "Live plot" of data being captured
- More than one bluesky run may be active at once (nested scans)
- Initially target a polling loop ~1 second
- Leave a path open to tiled-stream / websocket
- Must be able to resume viewing a scan-in-progress if client restarts
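To make the "Live plot" items concrete, here is a minimal sketch of a polling loop, assuming nothing about the Tiled client API. `read_rows` and `is_done` are hypothetical callables standing in for "fetch rows from offset N onward" and "has this run finished"; neither exists in silx or Tiled under those names.

```python
import time
from typing import Callable, Iterator, List, Sequence


def poll_new_rows(
    read_rows: Callable[[int], Sequence],
    is_done: Callable[[], bool],
    interval: float = 1.0,
    start: int = 0,
) -> Iterator[List]:
    """Yield batches of rows appended since the previous poll.

    read_rows(offset) and is_done() are hypothetical stand-ins for
    whatever the real data source provides.
    """
    seen = start
    while True:
        # Check completion before reading, so the last read cannot miss rows
        # written between the read and the completion check.
        finished = is_done()
        batch = list(read_rows(seen))
        if batch:
            seen += len(batch)
            yield batch
        if finished:
            return
        time.sleep(interval)
```

Because the only client-side state is the row offset, a restarted client can call this again with `start=0` and replay the scan-in-progress from the server. In a GUI the loop would have to run outside the event loop; that part is not shown here.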
Refined goals
Discussion on 2024-04-09 @whs92 @danielballan @abbygi @vshekar @padraic-shafer
- Use an isolated "Open" dialog or similar entrypoint that can cope with paginated access to large Catalogs, generating a smaller dataset (a tiled client with filters applied) that can be passed down to the rest of the silx/PyMca stack. This can also hand down authentication state.
- Fit Tiled nodes into HDF5 abstraction up to some limit (~1000). Tabular data from Tiled is just a Group of 1-dimensional arrays.
- Focus on 'primary' and 'baseline' streams to start, with an eye on "tab per stream" and whether that fits.
- Have a switch for polling live data. (This can later be refactored to use websockets, once Tiled supports that.)
- Ensure HTTP I/O does not lock up or crash the app.
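As a rough illustration of fitting nodes into a group-like abstraction with a child limit, the sketch below wraps any mapping-like node and exposes at most `limit` children. It is not the code from the proof-of-concept branch: `LimitedGroup` is a hypothetical name, and the node is only assumed to behave like a Python mapping. A table is modelled here as a plain mapping of column names to 1-dimensional sequences.

```python
from collections.abc import Mapping
from itertools import islice


class LimitedGroup(Mapping):
    """Read-only, group-like view exposing at most `limit` children."""

    def __init__(self, node, limit=1000):
        self._node = node
        self._limit = limit
        self._keys = None  # child names are listed lazily, once

    def _names(self):
        if self._keys is None:
            # Stop iterating after `limit` names instead of listing everything.
            self._keys = list(islice(iter(self._node), self._limit))
        return self._keys

    def __iter__(self):
        return iter(self._names())

    def __len__(self):
        return len(self._names())

    def __getitem__(self, name):
        if name not in self._names():
            raise KeyError(name)
        child = self._node[name]
        # Nested containers become nested groups; anything else is a leaf.
        if isinstance(child, Mapping):
            return LimitedGroup(child, self._limit)
        return child
```

For example, `LimitedGroup({"primary": {"x": [0, 1], "y": [2, 3]}})["primary"]` behaves like a group holding two 1-dimensional datasets, `x` and `y`.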
@vasole @t20100 @linupi Because we weren't able to find a suitable time yet for all of us to meet live--and it sounds like it might be a couple weeks until that's possible--what do you think about this approach? Do you foresee particular difficulties or incompatibilities in fitting this into the architecture of silx?
Hi,
Thanks for the summary!
For the silx part, it makes sense to me and the proof-of-concept was very simple to implement. However, I still have a shallow understanding of tiled.
For now my main concern would be point 5 "Ensure HTTP I/O does not lock up or crash the app." since the hdf5-like API and `silx view` are built around synchronous access to the data, and I'm not convinced this is easy to change.
BTW, you might want to have a look at h5web, a web-based HDF5 data viewer my colleagues @axelboc and @loichuder developed and maintain. It is available as a JupyterLab extension, a VSCode extension and powers HDF5 online viewing of the ESRF "data portal" and the https://myhdf5.hdfgroup.org/ online viewer (thanks to h5wasm).
This again aims at supporting HDF5 files, but the access to the data is abstracted through Providers (for now there are 3: for the HDF Group's HSDS server, for h5wasm, and for our h5grove, a small server tailored for h5web), so there may be a way to adapt it to tiled.
As opposed to `silx view`, it's natively asynchronous.
Thanks @t20100. I agree that the blocking I/O sounds like the hard part. We may have to live with a synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.
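One way to picture "timeouts return control to the user" while keeping a synchronous calling style is sketched below. It is only an illustration using the standard library: `call_with_timeout` is a hypothetical helper, not part of silx or Tiled, and a transport-level timeout in the HTTP client would be the more direct route where available.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared pool for blocking I/O calls (the size is an arbitrary choice).
_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, *args, timeout=5.0, default=None):
    """Run a blocking call in a worker thread; stop waiting after `timeout` s.

    The worker is not cancelled: it keeps running in the background until
    the call returns on its own, but the caller gets control back.
    """
    future = _executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return default
```

The caller still blocks for up to `timeout` seconds, so this bounds a freeze rather than removing it, and a stalled connection occupies a worker thread until it eventually fails.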
Adding an h5web Provider for Tiled is also interesting. This has been on our radar since we opened an Issue in Tiled in September 2021. It might be about time to do it. One perhaps unique capability this could add is the ability to view specfiles, TIFFs, and other formats, which Tiled can serve through a unified HDF5-ish abstraction.
I think PyMca is serving a particular cluster of requirements though, so we would pursue this in addition to PyMca integration.
> We may have to live with a synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.
Sounds good to me.
I just made some updates to the silx branch with basic tiled support, and opened PR #4121.
Compared to the previous poc version:
- The `tiled:` prefix no longer works (changed to `tiled-` to avoid URL parsing issues): This is not compatible with current support in pymca (https://github.com/vasole/pymca/pull/1074). But this prefix is no longer needed and should be removed IMO.
- There is a way to limit the number of retrieved entries per container.
Feedback welcome!
Just to comment that if the prefix is removed, it would simplify things on the PyMca side too, because I had already planned to handle URLs exclusively via the silx abstraction.
`tiled-` prefix removed.
Also reworked the `TiledDataset` to inherit directly from `commonh5.Dataset` and added a tile cache.
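For readers unfamiliar with the idea, a tile cache can be as small as the sketch below. This is not the implementation in the PR: `TileCache`, its `fetch` callable, and the least-recently-used eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class TileCache:
    """Small least-recently-used cache for fetched array chunks."""

    def __init__(self, fetch, max_items=128):
        self._fetch = fetch            # hypothetical callable: key -> data
        self._max_items = max_items
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            return self._items[key]
        value = self._fetch(key)              # cache miss: one remote read
        self._items[key] = value
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)   # evict least recently used
        return value
```

The key would need to be hashable, for instance a tuple of the node path and the requested slice bounds, so that repeated reads of the same region avoid a second HTTP round trip.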