Add new S3 credentials endpoint for Giovanni zarr store

Open asteiker opened this issue 2 years ago • 4 comments

A new s3 credentials endpoint for the GES DISC's Giovanni Zarr store is now available: https://api.giovanni.earthdata.nasa.gov/s3credentials

We should make sure this is discoverable through earthaccess. Initially, the associated data collection, GPM_3IMERGHH v6, will have a new RelatedURL that points to documentation on how to access the store. So we won't have any direct programmatic means of going from the collection's CMR record to the zarr store S3 URI until CMR is extended to support this. But it is great progress in the right direction for end-to-end zarr support.
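
For reference, here is a minimal sketch of how a response from the endpoint above could be turned into filesystem credentials. It assumes the Giovanni endpoint returns the same JSON shape as other NASA Earthdata S3 credential endpoints (`accessKeyId`, `secretAccessKey`, `sessionToken`), which this issue does not confirm. The helper name `to_s3fs_kwargs` and the placeholder payload are hypothetical.

```python
from typing import Dict, Mapping


def to_s3fs_kwargs(creds: Mapping[str, str]) -> Dict[str, str]:
    """Map an s3credentials-style JSON payload onto s3fs keyword arguments.

    Assumption: the payload uses the accessKeyId / secretAccessKey /
    sessionToken field names seen on other Earthdata credential endpoints.
    """
    required = ("accessKeyId", "secretAccessKey", "sessionToken")
    missing = [name for name in required if name not in creds]
    if missing:
        raise KeyError(f"credentials payload is missing: {', '.join(missing)}")
    # key / secret / token are the parameter names s3fs.S3FileSystem accepts.
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Placeholder payload for illustration only; real values come from the endpoint
# after an authenticated Earthdata Login request.
example_payload = {
    "accessKeyId": "EXAMPLE-KEY-ID",
    "secretAccessKey": "EXAMPLE-SECRET",
    "sessionToken": "EXAMPLE-TOKEN",
    "expiration": "2023-03-23 23:03:00+00:00",
}
fs_kwargs = to_s3fs_kwargs(example_payload)
```

The resulting dictionary could then be passed as `s3fs.S3FileSystem(**fs_kwargs)` to open the store, provided the code runs in the same AWS region as the bucket.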

asteiker avatar Mar 23 '23 22:03 asteiker

Currently we only have S3 credentials endpoints on a per-DAAC basis, which covers the typical use case (getting data from a DAAC). But with Giovanni, Harmony, and other NASA-wide services, maybe it's time to split that logic into getting credentials for data and getting credentials for services, something like:

import earthaccess

earthaccess.login()

services = earthaccess.search_services("SOME CRITERIA TO FIND GIOVANNI")
giovanni = services[0]
credentials = giovanni.get_s3_credentials()

In the meantime we can add Giovanni, but it would look like just another DAAC. Or we can wait until the service discovery methods are implemented and have something like the code above. What do you think, @asteiker?

betolink avatar Apr 20 '23 22:04 betolink

@betolink Great thoughts. What is a bit unique about this case is that Giovanni's zarr store just happens to be the location of the zarr store for this GPM_3IMERGHH collection. So the use case is really collection-based discovery/access, not service-based per se.

For now, a Related_URL was added to the collection metadata pointing to a "Product Usage" link: C1598621093-GES_DISC. But the Search & Discovery train will be analyzing how to incorporate this better into collection and variable-based discovery. So I'm not sure whether (or how) to solve this right now with this workaround, or to hold off until zarr is better supported within CMR.

asteiker avatar Apr 24 '23 14:04 asteiker

Interesting. In theory, if the collection belongs to GES_DISC, the S3 credentials for GES_DISC should work. I assume this is not the case because this Giovanni Zarr store is not using the Cumulus machinery to get the data ingested?

I think that after we implement services and variable discovery in earthaccess, we could use the response from CMR (if it contains specific S3 credentials) to override the DAAC-level credentials for the access part. I'd say let's hold off for now until these use cases have a more programmatic access pattern. What do you think?
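
As a rough sketch of the override idea, not an existing earthaccess function: prefer a collection-level credentials endpoint when the CMR record carries one, and fall back to the DAAC-level endpoint otherwise. This assumes the endpoint would appear under the UMM-C `DirectDistributionInformation.S3CredentialsAPIEndpoint` field. The function name, the `DAAC_ENDPOINTS` table, and the GES_DISC URL in it are illustrative assumptions.

```python
from typing import Optional

# Assumed DAAC-level fallback table; the URL is illustrative.
DAAC_ENDPOINTS = {
    "GES_DISC": "https://data.gesdisc.earthdata.nasa.gov/s3credentials",
}


def resolve_s3_credentials_endpoint(collection_umm: dict, daac: str) -> Optional[str]:
    """Return the collection-level endpoint if present, else the DAAC-level one."""
    direct = collection_umm.get("DirectDistributionInformation") or {}
    # A collection-specific endpoint, when present, overrides the DAAC default.
    return direct.get("S3CredentialsAPIEndpoint") or DAAC_ENDPOINTS.get(daac)


# Hypothetical record that advertises the Giovanni endpoint from this issue.
with_override = {
    "DirectDistributionInformation": {
        "S3CredentialsAPIEndpoint": "https://api.giovanni.earthdata.nasa.gov/s3credentials",
    }
}
# Hypothetical record with no collection-level endpoint.
without_override = {}

override_endpoint = resolve_s3_credentials_endpoint(with_override, "GES_DISC")
fallback_endpoint = resolve_s3_credentials_endpoint(without_override, "GES_DISC")
```

With this shape, the access code only needs one lookup and does not have to know whether the data sits behind a DAAC or a service.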

betolink avatar Apr 24 '23 16:04 betolink

> I think that after we implement services and variable discovery in earthaccess, we could use the response from CMR (if it contains specific S3 credentials) to override the DAAC-level credentials for the access part.

Just checking in here. @betolink did https://github.com/nsidc/earthaccess/pull/296 fix this issue?

jrbourbeau avatar Oct 20 '23 20:10 jrbourbeau