
Use anywidget-based IPC for zarr gets which do not require serving data on localhost

Open · keller-mark opened this issue 1 year ago · 1 comment

This change enables using the Vitessce widget in HPC environments, on Google Colab, in VSCode, and in HuBMAP Workspaces with data that is local to the Python kernel. It uses the new API from https://github.com/manzt/anywidget/pull/453.

In these environments it can be difficult or impossible to proxy requests from the browser running the notebook to the server running the Python kernel; jupyter-server-proxy only gets us so far. This may also fix #255, which involves another challenging environment (e.g., it is not a web browser, so we cannot rely on the structure of the notebook URL to construct the data URLs).
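The core idea is that the browser requests each Zarr key over the widget's existing comm channel instead of over HTTP, so nothing has to be served on localhost. Below is a minimal, framework-agnostic sketch of the kernel-side handler. It assumes a hypothetical message shape `{"type": "zarr_get", "key": ...}` and any mapping-like store (zarr v2 stores such as `DirectoryStore` map keys to bytes). `handle_zarr_get` is a hypothetical name; this is not the code in this PR or the anywidget API.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def handle_zarr_get(store: Mapping[str, bytes], msg: dict[str, Any]) -> tuple[dict[str, Any], list[bytes]]:
    """Answer one hypothetical zarr_get request from the browser.

    Returns a JSON-serializable reply plus a list of binary buffers,
    the shape Jupyter comm messages use to send raw bytes next to JSON.
    """
    if msg.get("type") != "zarr_get":
        raise ValueError(f"unsupported message type: {msg.get('type')!r}")
    # Browser-side paths may carry a leading slash; Zarr keys do not.
    key = msg["key"].lstrip("/")
    try:
        chunk = store[key]
    except KeyError:
        # Mirror an HTTP 404 so a JS-side Zarr store can treat the key as missing.
        return {"key": key, "status": 404}, []
    return {"key": key, "status": 200}, [bytes(chunk)]
```

Returning the chunk as a separate buffer, rather than base64 inside the JSON reply, avoids inflating large array chunks.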

TODO:

  • [ ] Document this on https://vitessce.github.io/vitessce-python/data_options.html
  • [ ] Register the store in all Zarr-based wrapper classes once we decide how the store will be passed/instantiated
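One way the second TODO could look is a shared registry that every Zarr-based wrapper uses at construction time, so the kernel can later find the store behind a URL the browser asks for. This is a hypothetical sketch: `StoreRegistry`, `ZarrWrapperBase`, and the `kernel-store://` placeholder URL scheme are illustrative names, not part of vitessce-python.

```python
from __future__ import annotations

import itertools
from collections.abc import MutableMapping


class StoreRegistry:
    """Hypothetical kernel-side registry mapping placeholder URLs to stores."""

    def __init__(self) -> None:
        self._stores: dict[str, MutableMapping] = {}
        self._ids = itertools.count()

    def register(self, store: MutableMapping) -> str:
        # Reuse the existing URL if this exact store object is already registered.
        for url, existing in self._stores.items():
            if existing is store:
                return url
        url = f"kernel-store://{next(self._ids)}"  # hypothetical placeholder scheme
        self._stores[url] = store
        return url

    def lookup(self, url: str) -> MutableMapping | None:
        return self._stores.get(url)


class ZarrWrapperBase:
    """Hypothetical base class that Zarr-backed wrappers could share."""

    def __init__(self, store: MutableMapping, registry: StoreRegistry) -> None:
        self.store = store
        # Registering here means every subclass gets IPC support for free.
        self.url = registry.register(store)
```

Keying on object identity means two wrappers over the same store share one URL, while separate store objects never collide.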

One open question is how to expose this in the API. Maybe we require the user to pass the store, for example:

```diff
dataset = vc.add_dataset(name='Brain').add_object(AnnDataWrapper(
-       adata_path=zarr_filepath,
+       adata_store=zarr.DirectoryStore(zarr_filepath),
        obs_embedding_paths=["obsm/X_tsne"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
        obs_feature_matrix_path="X",
        initial_feature_filter_path="var/top_highly_variable"
    )
)
```

This would support store types other than DirectoryStore, but would require more work from the user.
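If both parameters coexist, the wrapper needs to check that exactly one data source is given and remember whether the browser must fetch over IPC. A minimal sketch, assuming a hypothetical stand-in class `AnnDataWrapperSketch` that models only the path/store handling (it is not the real `AnnDataWrapper`, and `needs_ipc` is an illustrative name):

```python
from __future__ import annotations

from collections.abc import MutableMapping


class AnnDataWrapperSketch:
    """Hypothetical stand-in modelling only AnnDataWrapper's data-source handling."""

    def __init__(self, adata_path: str | None = None,
                 adata_store: MutableMapping | None = None, **kwargs) -> None:
        # Exactly one data source must be provided.
        if (adata_path is None) == (adata_store is None):
            raise ValueError("Provide exactly one of adata_path or adata_store.")
        self.adata_path = adata_path
        self.adata_store = adata_store
        # obs_embedding_paths, obs_set_paths, etc. pass through unchanged.
        self.options = kwargs

    @property
    def needs_ipc(self) -> bool:
        # A store lives only in the kernel, so the browser must fetch it via IPC.
        return self.adata_store is not None
```

Failing fast on zero or two sources keeps the ambiguity out of the config-generation code.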

Or maybe we keep adata_path and add a flag like as_store (should it be True by default?). This would not support any other store types, but maybe that is OK?

```diff
dataset = vc.add_dataset(name='Brain').add_object(AnnDataWrapper(
        adata_path=zarr_filepath,
+       as_store=False,
        obs_embedding_paths=["obsm/X_tsne"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
        obs_feature_matrix_path="X",
        initial_feature_filter_path="var/top_highly_variable"
    )
)
```
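With an `as_store` flag, the wrapper would wrap the local directory in a key-to-bytes mapping itself. The sketch below uses a small read-only mapping as a dependency-free stand-in for `zarr.DirectoryStore`; `ReadOnlyDirectoryMapping` and `resolve_source` are hypothetical names for illustration only.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from pathlib import Path


class ReadOnlyDirectoryMapping(Mapping):
    """Minimal read-only key -> bytes view of a Zarr directory (stand-in for zarr.DirectoryStore)."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)

    def __getitem__(self, key: str) -> bytes:
        root = self.root.resolve()
        path = (root / key).resolve()
        # Reject keys that escape the root (e.g. "../secret") or are not files.
        if root not in path.parents or not path.is_file():
            raise KeyError(key)
        return path.read_bytes()

    def __iter__(self) -> Iterator[str]:
        for p in self.root.rglob("*"):
            if p.is_file():
                yield p.relative_to(self.root).as_posix()

    def __len__(self) -> int:
        return sum(1 for _ in self)


def resolve_source(adata_path: str, as_store: bool) -> str | Mapping:
    """Hypothetical: what an as_store flag could switch between."""
    if as_store:
        # Fetched by the browser over widget IPC; no localhost server needed.
        return ReadOnlyDirectoryMapping(adata_path)
    # Served over HTTP as before, so the browser must be able to reach it.
    return adata_path
```

The trade-off noted above is visible here: the flag only ever produces one store type, whereas accepting a store object lets callers pass anything mapping-like.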

cc @manzt

keller-mark, Apr 01 '24 20:04

Maybe default to something like zarr.storage.FSStore, which is based on fsspec and automatically infers the filesystem (local, S3, HTTP, etc.) from the path?
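In zarr-python 2.x this default would be roughly `zarr.storage.FSStore(path, mode="r")`, and fsspec chooses the filesystem from the path's protocol. The dependency-free sketch below only illustrates that inference step; `infer_protocol` is a hypothetical, simplified helper and does not reproduce fsspec's full logic (e.g. chained URLs).

```python
from urllib.parse import urlsplit


def infer_protocol(path: str) -> str:
    """Simplified stand-in for how fsspec picks a filesystem from a path."""
    scheme = urlsplit(path).scheme
    # Bare paths have no scheme; Windows drive letters ("C:") parse as a
    # one-letter scheme, so treat both as the local filesystem.
    if not scheme or len(scheme) == 1:
        return "file"
    return scheme
```

Only the `"file"` case needs kernel IPC; a store whose protocol is, say, `https` could in principle be read by the browser directly.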

manzt, Apr 01 '24 21:04