bioformats2raw
Support for Google Cloud Storage
This is a small experiment for writing output directly to GCS.
This is done by including google-cloud-nio. outputOptions needs to be non-null for newFileSystem().
While functional, this is still incomplete:
- [ ] values in outputOptions need to be coerced into Integer/Boolean/etc to work with java-storage-nio.
- [X] some mechanism for specifying alternate credentials (link to google docs for credentials)
- [ ] Add test using com.google.cloud.storage.contrib.nio.testing
(As a side note, we tried using Google's S3 interface. It fails on a permissions check in JZarr before writing data.)
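To illustrate the note above that outputOptions needs to be non-null for newFileSystem(), here is a minimal sketch using only the standard java.nio.file API. It is not the code from this PR: the class name, the helper names, and the bucket gs://example-bucket are hypothetical, and the gs:// scheme only resolves when google-cloud-nio is on the classpath.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.ProviderNotFoundException;
import java.util.Collections;
import java.util.Map;

// Hypothetical helper class; not part of bioformats2raw.
public class GcsFileSystemSketch {

  /** Substitute an empty map when no options were supplied. */
  public static Map<String, ?> nonNullOptions(Map<String, ?> options) {
    if (options == null) {
      return Collections.<String, Object>emptyMap();
    }
    return options;
  }

  /** Open a file system for the given root URI, never passing a null map. */
  public static FileSystem open(URI root, Map<String, ?> options)
      throws IOException {
    return FileSystems.newFileSystem(root, nonNullOptions(options));
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical bucket name, used only for illustration.
    URI root = URI.create("gs://example-bucket");
    try {
      FileSystem fs = open(root, null);
      System.out.println("Opened file system: " + fs);
    } catch (ProviderNotFoundException e) {
      // Thrown when no installed provider handles the "gs" scheme.
      System.out.println("No provider for gs://; is google-cloud-nio on the classpath?");
    }
  }
}
```

The null check is kept in its own method so that the same guard can be reused wherever options are passed on to a file system provider.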
@perlman: did you have more work planned here?
I've been using this unmodified for a while now. I'll bring it up-to-date with main and see where we're at.
Thanks for the update, @perlman. I'm fine with taking this out of draft status, but adding a usage example to the README would be helpful for testing.
@perlman, is there a simple example of how to use this feature?
Whoops, I let this slip. I'll get to this today or tomorrow! (or Monday, sorry about that.)
@melissalinkert I'm wondering where the right place to put an example is. I had started to modify the --help text, but it seems that it may be a bit too verbose?
The usage is very straightforward, e.g.:
bioformats2raw-0.7.1-SNAPSHOT/bin/bioformats2raw --tile_width 2048 A_2202_20_ApoB.ndpi gs://jax-zarr-playpen/data/A_2202_20_ApoB.zarr
That's it. The access credentials will come from the environment, e.g., from gcloud auth login or inherited from a service account (Application Default Credentials).
The credentials must allow for read/write on the bucket. (Minimally, this can be Storage Object Creator, Storage Object Viewer and Storage Object Delete).
--output-options does not currently work. Google NIO does not seem happy with the Map<String, String> and throws a type-related exception. I've punted temporarily on digging into this, as it would probably require some special-case type conversion of the values.
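One possible shape for that type conversion, as a rough sketch only: convert each string value to a Boolean or Integer when it parses as one, and leave it as a String otherwise. The class name, method names, and option keys below are hypothetical, and whether google-cloud-nio accepts the resulting types for any given option has not been verified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of bioformats2raw or google-cloud-nio.
public class OutputOptionCoercion {

  /** Convert one value to Boolean or Integer if it parses, else keep the String. */
  public static Object coerce(String value) {
    if (value == null) {
      return null;
    }
    String v = value.trim();
    if (v.equalsIgnoreCase("true") || v.equalsIgnoreCase("false")) {
      return Boolean.valueOf(v);
    }
    try {
      return Integer.valueOf(v);
    } catch (NumberFormatException e) {
      // Not an integer; fall through and keep the original string.
    }
    return value;
  }

  /** Apply coerce() to every value, preserving key order. */
  public static Map<String, Object> coerceAll(Map<String, String> raw) {
    Map<String, Object> typed = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : raw.entrySet()) {
      typed.put(entry.getKey(), coerce(entry.getValue()));
    }
    return typed;
  }

  public static void main(String[] args) {
    // Hypothetical option keys, used only for illustration.
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("exampleFlag", "true");
    raw.put("exampleCount", "3");
    raw.put("exampleName", "abc");
    for (Map.Entry<String, Object> entry : coerceAll(raw).entrySet()) {
      System.out.println(entry.getKey() + " -> "
          + entry.getValue().getClass().getSimpleName());
    }
  }
}
```

Guessing types from the value alone is a shortcut. A stricter variant would keep a table of known option names and their expected types, which avoids turning a string option that happens to look numeric into an Integer.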
Sorry for dropping this - really was just thinking a few lines in the README.md with exactly what you've already noted in https://github.com/glencoesoftware/bioformats2raw/pull/176#issuecomment-1603427528 is sufficient documentation.
@perlman: that's great, thanks. Do you want to take this out of draft so we can consider it for 0.8.0? Or did you have more work planned before this is ready for review?
I think this meets MVP! I've been using it to convert a bunch of NDPI files to Zarr.
At minimum, I think we should add an example of using S3 to the README (and the suggested flags used for Cloudian deployments).
"Nice to have" would be working flags for GCS (which require correct value types) and a test using com.google.cloud.storage.contrib.nio.testing, which would show functional NIO2 integration.