bioformats2raw

Support for Google Cloud Storage

Open perlman opened this issue 3 years ago • 9 comments

This is a small experiment for writing output directly to GCS.

This is done by including google-cloud-nio. outputOptions needs to be non-null for newFileSystem().
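As a rough illustration of the non-null requirement, here is a minimal sketch using only the standard java.nio API. The class and method names (OutputLocation, envOrEmpty, bucketRoot, resolve) are hypothetical and are not taken from this PR. The sketch also assumes that the gs:// provider expects a bucket-root URI, which is an assumption about google-cloud-nio rather than something stated above.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the converter's actual code.
final class OutputLocation {

  // True when the output location looks like a GCS URI.
  static boolean isGcsUri(String location) {
    return location != null && location.startsWith("gs://");
  }

  // Never hand a null environment map to newFileSystem(); use an empty map.
  static Map<String, ?> envOrEmpty(Map<String, ?> outputOptions) {
    if (outputOptions == null) {
      return new HashMap<String, Object>();
    }
    return outputOptions;
  }

  // Reduce gs://bucket/some/path to gs://bucket
  // (assumption: the provider wants the bucket root here).
  static URI bucketRoot(String location) {
    return URI.create("gs://" + URI.create(location).getAuthority());
  }

  // Local paths use the default file system. gs:// locations need a "gs"
  // provider such as google-cloud-nio on the classpath; without one,
  // FileSystems.newFileSystem() throws ProviderNotFoundException.
  static Path resolve(String location, Map<String, ?> outputOptions)
      throws IOException {
    if (!isGcsUri(location)) {
      return Paths.get(location);
    }
    FileSystem fs = FileSystems.newFileSystem(
        bucketRoot(location), envOrEmpty(outputOptions));
    return fs.getPath(URI.create(location).getPath());
  }

  public static void main(String[] args) throws IOException {
    // Only the local branch is exercised here, so no GCS access is needed.
    System.out.println(resolve("output.zarr", null));
    System.out.println(bucketRoot("gs://example-bucket/data/output.zarr"));
  }
}
```

The bucket name example-bucket is a placeholder.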

While functional, this is still incomplete:

  • [ ] values in outputOptions need to be coerced into Integer/Boolean/etc to work with java-storage-nio.
  • [X] some mechanism for specifying alternate credentials (link to google docs for credentials)
  • [ ] Add test using com.google.cloud.storage.contrib.nio.testing

(As a side note, we tried using Google's S3 interface. It fails on a permissions check in JZarr before writing data.)

perlman avatar Nov 20 '22 23:11 perlman

@perlman: did you have more work planned here?

melissalinkert avatar Mar 31 '23 21:03 melissalinkert

I've been using this unmodified for a while now. I'll bring it up-to-date with main and see where we're at.

perlman avatar Apr 26 '23 13:04 perlman

Thanks for the update, @perlman. I'm fine with taking this out of draft status, but adding a usage example to the README would be helpful for testing.

melissalinkert avatar Apr 27 '23 17:04 melissalinkert

@perlman, is there a simple example of how to use this feature?

melissalinkert avatar Jun 14 '23 20:06 melissalinkert

Whoops, I let this slip. I'll get to this today or tomorrow! (or Monday, sorry about that.)

perlman avatar Jun 14 '23 22:06 perlman

@melissalinkert I'm wondering where the right place to put an example is. I had started to modify the --help text, but that may be a bit too verbose.

The usage is very straightforward, e.g.:

bioformats2raw-0.7.1-SNAPSHOT/bin/bioformats2raw --tile_width 2048 A_2202_20_ApoB.ndpi gs://jax-zarr-playpen/data/A_2202_20_ApoB.zarr

That's it. The access credentials come from the environment, e.g., from gcloud auth login or inherited from a service account (Application Default Credentials).

The credentials must allow read/write on the bucket. (Minimally, this can be Storage Object Creator, Storage Object Viewer and Storage Object Delete.)

--output-options does not currently work. Google NIO does not seem happy with the Map<String, String> and throws an exception related to the value type. I've punted temporarily on digging into this, as it would probably require some special-case type conversion of the values.
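For what it's worth, the type conversion could be sketched roughly as below. This is only an illustration, assuming the provider casts option values to Boolean or Integer. The class OutputOptionCoercion and the key names in main are hypothetical examples, not verified against google-cloud-nio.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of this PR.
final class OutputOptionCoercion {

  // Turn "true"/"false" into Boolean and integer-looking text into Integer.
  // Anything else is returned unchanged as a String.
  static Object coerce(String raw) {
    if (raw == null) {
      return null;
    }
    String value = raw.trim();
    if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
      return Boolean.valueOf(value);
    }
    try {
      return Integer.valueOf(value);
    } catch (NumberFormatException e) {
      return raw;
    }
  }

  // Build a Map<String, Object> suitable for passing as a file system env.
  static Map<String, Object> coerceAll(Map<String, String> options) {
    Map<String, Object> coerced = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : options.entrySet()) {
      coerced.put(entry.getKey(), coerce(entry.getValue()));
    }
    return coerced;
  }

  public static void main(String[] args) {
    // Key names are examples only (assumption, not a checked list).
    Map<String, String> options = new LinkedHashMap<>();
    options.put("blockSize", "2097152");
    options.put("usePseudoDirectories", "true");
    options.put("workingDirectory", "/data");
    for (Map.Entry<String, Object> e : coerceAll(options).entrySet()) {
      System.out.println(e.getKey() + " -> "
          + e.getValue().getClass().getSimpleName());
    }
  }
}
```

A blanket conversion like this would also turn a value such as a numeric project name into an Integer, so a per-key type table might be safer in practice.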

perlman avatar Jun 22 '23 23:06 perlman

Sorry for dropping this. I was really just thinking that a few lines in the README.md, with exactly what you've already noted in https://github.com/glencoesoftware/bioformats2raw/pull/176#issuecomment-1603427528, would be sufficient documentation.

melissalinkert avatar Oct 19 '23 15:10 melissalinkert

@perlman: that's great, thanks. Do you want to take this out of draft so we can consider it for 0.8.0? Or did you have more work planned before this is ready for review?

melissalinkert avatar Oct 23 '23 19:10 melissalinkert

I think this meets MVP! I've been using it to convert a bunch of NDPI files to Zarr.

At minimum, I think we should add an example of using s3 to the README (& the suggested flags used for Cloudian deployments).

"Nice to have" would be working flags for GCS (which require correct value types) and a test using com.google.cloud.storage.contrib.nio.testing, which would show functional NIO2 integration.

perlman avatar Oct 24 '23 17:10 perlman