
Recursive exploration of remote datasets

Open MichaelBuessemeyer opened this issue 7 months ago • 0 comments

This PR adds recursive exploration to the already existing remote dataset exploration. This is supported for the local file system, GCS and S3.
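
As a rough illustration of what recursive exploration means on a local file system, here is a minimal Python sketch (not the actual backend code of this PR). The marker file names, the function name `explore_recursively` and the `max_depth` parameter are assumptions made for this example; the real explorers inspect format-specific metadata.

```python
from pathlib import Path

# Illustrative marker files that suggest a directory is a dataset root.
# This list is an assumption for the sketch, not the explorers' real logic.
DATASET_MARKERS = ("zarr.json", ".zarray", ".zgroup", "info")


def explore_recursively(root: Path, max_depth: int = 3) -> list[Path]:
    """Return directories under `root` that look like datasets.

    Directories are searched at most `max_depth` levels below `root`.
    """
    found: list[Path] = []

    def visit(directory: Path, depth: int) -> None:
        # A directory containing a marker file counts as a dataset.
        if any((directory / marker).is_file() for marker in DATASET_MARKERS):
            found.append(directory)
            return  # do not descend into a dataset that was already found
        if depth >= max_depth:
            return  # depth limit reached, stop descending
        for child in sorted(directory.iterdir()):
            if child.is_dir():
                visit(child, depth + 1)

    visit(root, 0)
    return found
```

The depth limit keeps the crawl bounded on large directory trees or buckets; the same idea would apply to GCS and S3 by listing prefixes instead of directories.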

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Test GCS via gs://neuroglancer-fafb-data/fafb_v14/. This should result in a successfully explored dataset.
  • Test S3 via s3://janelia-cosem-datasets/jrc_mus-nacc-4/jrc_mus-nacc-4.zarr/. This should result in a successfully explored dataset.
  • Test locally:
    • Create new local folders, e.g. <wk-root>/binaryData2/some_dir/more_dir and <wk-root>/binaryData2/other_dir/more_dir

    • Add <wk-root>/binaryData2/ to the whitelist in application.conf (line 197).

    • Enter file:///binaryData2/ into the add remote dataset form. The request should fail with only a short error message that does not leak any information about the server's underlying folder structure.

    • Add a new dataset (not wkw, as wkw exploration is not implemented), e.g. l4_sample_zarr3_sharded, to <wk-root>/binaryData2/other_dir/more_dir

    • Enter file:///binaryData2/ into the add remote dataset form. The request should successfully find the dataset.
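
The local test steps above rely on two behaviours: only whitelisted directories may be explored, and a rejected request returns a short message without path details. A minimal Python sketch of that idea follows. It is an assumption about how such a check could look, not the backend implementation; `ExplorationError`, `resolve_whitelisted` and the `/srv/wk` paths in the usage note are hypothetical.

```python
from pathlib import Path


class ExplorationError(Exception):
    """Error whose message is safe to show to the client."""


def resolve_whitelisted(requested: str, whitelist: list[Path]) -> Path:
    """Return the resolved path if it lies inside a whitelisted directory."""
    # Resolve ".." segments and symlinks before comparing against the whitelist.
    candidate = Path(requested).resolve()
    for allowed in whitelist:
        if candidate.is_relative_to(allowed.resolve()):  # Python 3.9+
            return candidate
    # Deliberately generic: no requested path, no whitelist entries.
    raise ExplorationError("Path is not allowed for exploration.")
```

For example, with a hypothetical whitelist of `[Path("/srv/wk/binaryData2")]`, a request for `/srv/wk/binaryData2/../secrets` would be rejected with the generic message only.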

TODOs:

  • [ ] Currently, if the exploration fails, the backend leaks the directory structure of the directories permitted by the whitelisting feature. This should not be exposed. Moreover, @normanrz argued that the information is not useful to users. -> I'd add a warning to the docs that whitelisted directories should only contain datasets in their subdirectories and no sensitive information such as SSH keys. Moreover, the debug log should not be exposed to end users, at least when a local file system is used. When wk crawls remote cloud storage, the person using wk already has the credentials (if any are required) to list that storage, so wk does not leak any information the user would not already have.
  • [ ] Should the mutable report be included in the client answer even for non-local datasets?
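
One possible way to handle the two TODOs above, sketched in Python under stated assumptions: the debug report is withheld for `file://` explorations and attached for remote ones. `ExplorationReport`, `client_answer` and the message texts are hypothetical names for this example, and whether remote answers should include the report at all is still the open question.

```python
from dataclasses import dataclass, field

GENERIC_FAILURE = "Could not explore the given location."


@dataclass
class ExplorationReport:
    """Collects debug lines during exploration (stand-in for the mutable report)."""

    lines: list[str] = field(default_factory=list)

    def add(self, line: str) -> None:
        self.lines.append(line)


def client_answer(report: ExplorationReport, uri: str, success: bool) -> str:
    """Build the message that is sent back to the client."""
    if success:
        return "Exploration succeeded."
    if uri.startswith("file://"):
        # Local file system: never expose the server's directory structure.
        return GENERIC_FAILURE
    # Remote storage: the user could list it with their own credentials anyway.
    return GENERIC_FAILURE + "\n" + "\n".join(report.lines)
```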

Issues:


(Please delete unneeded items, merge only when none are left open)

  • [x] Updated changelog
  • [x] Updated documentation if applicable
  • [ ] Removed dev-only changes like prints and application.conf edits
  • [ ] Considered common edge cases
  • [x] Needs datastore update after deployment

MichaelBuessemeyer · Jul 02 '24 15:07