Add `Tiled` data sources to browse Bluesky runs

Open padraic-shafer opened this issue 10 months ago • 8 comments

There has been a recent burst of interest and activity in integrating bluesky/tiled data sources into silx (and by extension into pyMCA). I'm summarizing some of those discussions here to get feedback from silx developers.

Background

Several light sources are looking into using pyMCA as a browser of data collected during Bluesky runs. From discussions with @linupi @vasole @t20100 it was suggested that modifying silx to accept a Tiled data catalog would be an elegant way to do this for pyMCA, silx-view, and any other apps depending on silx.

@t20100 has started a proof-of-concept branch that shows a pathway for adapting a Tiled Container to an HDF5-like interface.


Preliminary scope (to be refined)

Discussion on 2024-03-26 @whs92 @danielballan @abbygi @vshekar @padraic-shafer [...missing handles for more BESSY-II participants]

During a chat between several developers at NSLS-II and BESSY-II, we recognized a common interest in using pyMCA as a "bluesky-supported" visual explorer of Tiled datasets for beamline experimenters. We identified several preliminary goals for a development sprint.

  1. Connect to a tiled server over HTTP -- Accept a URL; handle Auth
  2. Browse contents, with ability to filter and sort
    • Should identify bluesky runs
    • Will likely need a per-endstation configuration of metadata "projections" (flattened subset of important metadata)
  3. View baseline data for selected run(s)
  4. Plot scan data using existing plot tools
    • Use hinted data by default
    • User can assign "any" channel to a plot axis
  5. "Live plot" of data being captured
    • More than one bluesky run may be active at once (nested scans)
    • Initially target a polling loop of ~1 second
    • Leave a path open to tiled-stream / websocket
    • Must be able to resume viewing a scan-in-progress if client restarts
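
The polling behaviour in item 5 could be sketched roughly as below. This is only an illustration under stated assumptions, not Tiled or silx API: `poll_run`, `fetch_rows` and `is_complete` are hypothetical names, with `fetch_rows(offset)` standing in for whatever request returns the rows of a run from a given offset onward. Keeping track of the number of rows already seen is what would let a restarted client resume a scan in progress by passing `start`.

```python
import time
from typing import Callable, Iterator, List, Sequence


def poll_run(
    fetch_rows: Callable[[int], Sequence[dict]],
    is_complete: Callable[[], bool],
    start: int = 0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[dict]]:
    """Yield batches of new rows until the run is complete.

    fetch_rows(offset) is a HYPOTHETICAL callable returning all rows from
    `offset` onward; is_complete() reports whether the run has stopped.
    """
    offset = start
    while True:
        # Check completion before fetching, so that the final fetch still
        # picks up rows written just before the run stopped.
        done = is_complete()
        rows = list(fetch_rows(offset))
        if rows:
            offset += len(rows)
            yield rows
        if done:
            return
        sleep(interval)
```

Because the loop is a generator, a GUI could drive it from a timer or a worker thread, and the same interface could later be backed by a websocket instead of polling.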

Refined goals

Discussion on 2024-04-09 @whs92 @danielballan @abbygi @vshekar @padraic-shafer

  1. Use an isolated "Open" dialog or similar entrypoint that can cope with paginated access to large Catalogs, generating a smaller dataset (a tiled client with filters applied) that can be passed down to the rest of the silx/PyMca stack. This can also hand down authentication state.
  2. Fit Tiled nodes into HDF5 abstraction up to some limit (~1000). Tabular data from Tiled is just a Group of 1-dimensional arrays.
  3. Focus on 'primary' and 'baseline' streams to start, with an eye on "tab per stream" and whether that fits.
  4. Have a switch for polling live data. (This can later be refactored to use websockets, once Tiled supports that.)
  5. Ensure HTTP I/O does not lock up or crash the app.
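
To make item 2 more concrete, here is a minimal sketch of a read-only, group-like view that exposes at most a fixed number of children per level. It is not the silx `commonh5` implementation or the Tiled client: `LimitedGroup` is a hypothetical name, and plain nested dicts stand in for Tiled containers, with a table represented as a group of 1-dimensional columns.

```python
from collections.abc import Mapping
from itertools import islice


class LimitedGroup(Mapping):
    """Read-only, h5py-Group-like view of a nested mapping.

    `source` is a stand-in for a Tiled container; any Mapping works here.
    At most `max_children` entries per level are exposed.
    """

    def __init__(self, source, max_children=1000):
        self._source = source
        self._max_children = max_children

    def _keys(self):
        # islice stops iterating early, so a lazily paginated source
        # would not be walked past the limit.
        return list(islice(iter(self._source), self._max_children))

    def __iter__(self):
        return iter(self._keys())

    def __len__(self):
        return len(self._keys())

    def __getitem__(self, key):
        if key not in self._keys():
            raise KeyError(key)
        child = self._source[key]
        # Nested containers become groups; anything else is a leaf
        # (for a table, each column is a 1-dimensional array).
        if isinstance(child, Mapping):
            return LimitedGroup(child, self._max_children)
        return child


# Usage: a run with two streams, each a "table" of 1-D columns.
table = {"time": [0.0, 1.0], "det": [5, 7]}
run = {"primary": table, "baseline": {"motor": [1.5, 1.5]}}
root = LimitedGroup({"run_1": run}, max_children=1000)
print(list(root["run_1"]["primary"]))  # column names of the primary stream
```

The filtered client produced by the "Open" dialog in item 1 would take the place of the dict passed in as `source`.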

padraic-shafer avatar Apr 10 '24 13:04 padraic-shafer

@vasole @t20100 @linupi Because we weren't able to find a suitable time yet for all of us to meet live--and it sounds like it might be a couple weeks until that's possible--what do you think about this approach? Do you foresee particular difficulties or incompatibilities in fitting this into the architecture of silx?

padraic-shafer avatar Apr 10 '24 13:04 padraic-shafer

Hi,

Thanks for the summary!

For the silx part, it makes sense to me and the proof-of-concept was very simple to implement. However, I still have a shallow understanding of tiled. For now my main concern would be point 5 "Ensure HTTP I/O does not lock up or crash the app." since the hdf5-like API and silx view are built around synchronous access to the data, and I'm not convinced this is easy to change.

t20100 avatar Apr 11 '24 07:04 t20100

BTW, you might want to have a look at h5web, a web-based HDF5 data viewer that my colleagues @axelboc and @loichuder developed and maintain. It is available as a JupyterLab extension and a VSCode extension, and it powers HDF5 online viewing in the ESRF "data portal" and the https://myhdf5.hdfgroup.org/ online viewer (thanks to h5wasm). This again aims at supporting HDF5 files, but access to the data is abstracted through Providers (for now there are three: one for the HDF Group's HSDS server, one for h5wasm, and one for h5grove, our small server tailored for h5web), so there may be a way to adapt it to tiled. As opposed to silx view, it's natively asynchronous.

t20100 avatar Apr 11 '24 07:04 t20100

Thanks @t20100. I agree that the blocking I/O sounds like the hard part. We may have to live with a synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.
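
As a rough sketch of what "timeouts return control to the user" could look like while keeping a synchronous API: run the blocking call in a worker thread and wait for it with a deadline. `call_with_timeout` is a hypothetical helper, not something silx or Tiled provides, and `time.sleep` stands in for an unresponsive server.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, *args, timeout=5.0, default=None):
    """Run a blocking call in a worker thread; stop waiting after `timeout` seconds.

    Returns `default` on timeout. The worker keeps running until the call
    returns, so the HTTP client should also set its own timeout.
    """
    future = _executor.submit(func, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        future.cancel()  # only effective if the call has not started yet
        return default


# Usage: a slow call stands in for an unresponsive server.
result = call_with_timeout(time.sleep, 0.2, timeout=0.05, default="timed out")
print(result)
```

The caller still blocks for up to `timeout` seconds, so this bounds the freeze rather than removing it; a fully asynchronous design would be a larger change.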

Adding an h5web Provider for Tiled is also interesting. This has been on our radar since we opened an Issue in Tiled in September 2021. It might be about time to do it. One perhaps unique capability this could add is the ability to view specfiles, TIFFs, and other formats, which Tiled can serve through a unified HDF5-ish abstraction.

I think PyMca is serving a particular cluster of requirements though, so we would pursue this in addition to PyMca integration.

danielballan avatar Apr 11 '24 10:04 danielballan

> We may have to live with a synchronous I/O for now and just make sure that timeouts return control to the user in the event of connection issues.

Sounds good to me.

t20100 avatar Apr 12 '24 07:04 t20100

I just made some updates to the silx branch with basic tiled support, and opened PR #4121.

Compared to the previous proof-of-concept version:

  • The `tiled:` prefix no longer works (changed to `tiled-` to avoid URL parsing issues). This is not compatible with the current support in pymca (https://github.com/vasole/pymca/pull/1074)
  • But, this prefix is no longer needed and should be removed IMO
  • There is a way to limit the number of retrieved entries per container.

Feedback welcome!

t20100 avatar May 03 '24 14:05 t20100

Just to comment that if the prefix is removed, it would simplify things on the PyMca side too, because I had already planned to handle URLs exclusively via the silx abstraction.

vasole avatar May 04 '24 08:05 vasole

tiled- prefix removed. Also reworked the TiledDataset to inherit directly from commonh5.Dataset and added a tile Cache.

t20100 avatar May 13 '24 15:05 t20100