earthaccess should start using CMR metadata to obtain S3 credentials

Open betolink opened this issue 2 years ago • 4 comments

earthaccess uses a hard-coded list of DAAC endpoints to get temporary S3 credentials. However, more and more datasets are served from different endpoints and/or under a different category (e.g. as services). Having earthaccess use the DirectDistributionInformation/S3CredentialsAPIEndpoint field from the UMM metadata record would be a more robust solution. This could add a time penalty, so caching would also be nice to have.
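
A minimal sketch of that idea, assuming the UMM collection record is already available as a Python dict. The names `endpoint_from_umm`, `EndpointCache`, and the `fetch_umm` callable are hypothetical and not part of earthaccess; only the DirectDistributionInformation/S3CredentialsAPIEndpoint path comes from the UMM record itself.

```python
from typing import Callable, Dict, Optional


def endpoint_from_umm(umm: dict) -> Optional[str]:
    """Return the S3 credentials endpoint from a UMM collection record, or None."""
    # The section may be missing or null for collections without direct S3 access.
    info = umm.get("DirectDistributionInformation") or {}
    return info.get("S3CredentialsAPIEndpoint")


class EndpointCache:
    """Hypothetical in-memory cache keyed by collection concept ID."""

    def __init__(self, fetch_umm: Callable[[str], dict]) -> None:
        # fetch_umm is an assumed callable that returns the UMM record for a
        # concept ID (for example, a thin wrapper around a CMR query).
        self._fetch_umm = fetch_umm
        self._cache: Dict[str, Optional[str]] = {}

    def get(self, concept_id: str) -> Optional[str]:
        # Only hit the metadata source the first time a collection is seen.
        if concept_id not in self._cache:
            self._cache[concept_id] = endpoint_from_umm(self._fetch_umm(concept_id))
        return self._cache[concept_id]
```

Caching per concept ID keeps the extra metadata lookup to one request per collection, which is where the time penalty would otherwise show up.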

betolink, Aug 15 '23 20:08

I was going to make a discussion item about this, but I see this has already been opened.

I was trying to access this ASF product directly on S3: s3://asf-cumulus-prod-opera-products/OPERA_L2_CSLC-S1_T048-101101-IW3_20231102T232838Z_20231209T045008Z_S1A_VV_v1.0/OPERA_L2_CSLC-S1_T048-101101-IW3_20231102T232838Z_20231209T045008Z_S1A_VV_v1.0.h5. However, earthaccess.get_s3_credentials('ASF') uses the credentials URL for Sentinel-1 ASF data instead of the correct one, https://cumulus.asf.alaska.edu/s3credentials.

But while poking around, I see that you already added the ability to get the right credentials if you pass in the result:

In [9]: results = earthaccess.search_data(granule_name='OPERA_L2_CSLC-S1_T012-024520-IW1_20231007T124805Z_20231009T193746Z_S1A_VV_v1.0', concept_id="C2777443834-ASF")
Granules found: 1
In [10]: earthaccess.get_s3_credentials(results=results)
# this works!

I just wanted to check: is this the recommended way to get the credentials? Or is there some other way planned where you can request a different credentials URL before searching for results?
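
For reading the object directly, the temporary credentials then have to be handed to an S3 client. A minimal sketch, assuming the credentials dict uses the `accessKeyId`, `secretAccessKey`, and `sessionToken` keys (an assumption about the credentials endpoint's response format) and that s3fs is the client; `s3fs_kwargs` is a hypothetical helper, not part of earthaccess:

```python
def s3fs_kwargs(creds: dict) -> dict:
    """Map a temporary-credentials dict to s3fs.S3FileSystem keyword arguments."""
    # Key names on the right are assumed from the credentials endpoint response.
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Usage (requires s3fs, a prior earthaccess.login(), and running in us-west-2):
#   import s3fs
#   creds = earthaccess.get_s3_credentials(results=results)
#   fs = s3fs.S3FileSystem(**s3fs_kwargs(creds))
```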

scottstanie, Jan 16 '24 22:01

@scottstanie thanks! Yes, ASF has multiple endpoints now and the best place to determine which endpoint to use is the CMR metadata.

@betolink do you think we could eliminate the hard coding entirely?

jhkennedy, Jan 16 '24 22:01

Thank you! I also realized, right after posting, that I can use the search_datasets results to get credentials without picking a random product:

In [43]: rs = earthaccess.search_datasets(daac='ASF', keyword="OPERA_L2_CSLC-S1_V1")
Datasets found: 1

In [44]: rs[0].s3_bucket()
Out[44]:
{'Region': 'us-west-2',
 'S3BucketAndObjectPrefixNames': ['asf-cumulus-prod-opera-products/OPERA_L2_CSLC-S1/',
  'asf-cumulus-prod-opera-browse/OPERA_L2_CSLC-S1/'],
 'S3CredentialsAPIEndpoint': 'https://cumulus.asf.alaska.edu/s3credentials',
 'S3CredentialsAPIDocumentationURL': 'https://cumulus.asf.alaska.edu/s3credentialsREADME'}

but I noticed that I have to do

auth = earthaccess.login()
auth.get_s3_credentials(endpoint=...)

since the top-level earthaccess.get_s3_credentials doesn't have the endpoint argument. Still seems easy enough for me though 👍
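
Put together, the two steps above could be wrapped as follows. This is only a sketch: `credentials_for_dataset` is a hypothetical helper, and it relies solely on the `s3_bucket()` and `auth.get_s3_credentials(endpoint=...)` calls shown in this comment.

```python
def credentials_for_dataset(auth, dataset) -> dict:
    """Fetch temporary S3 credentials using the endpoint in a dataset's metadata.

    `auth` is the object returned by earthaccess.login(); `dataset` is one
    item from earthaccess.search_datasets(...).
    """
    info = dataset.s3_bucket()
    endpoint = info.get("S3CredentialsAPIEndpoint")
    if not endpoint:
        # Not every collection advertises direct S3 access in its metadata.
        raise ValueError("dataset metadata has no S3CredentialsAPIEndpoint")
    return auth.get_s3_credentials(endpoint=endpoint)


# Usage (requires network access and Earthdata Login credentials):
#   import earthaccess
#   auth = earthaccess.login()
#   rs = earthaccess.search_datasets(daac='ASF', keyword="OPERA_L2_CSLC-S1_V1")
#   creds = credentials_for_dataset(auth, rs[0])
```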

scottstanie, Jan 16 '24 22:01

I think this was resolved in v0.8.2.

I tested this and it worked as expected without needing to specify the endpoint (earthaccess uses the metadata from CMR):

import earthaccess
import xarray as xr

earthaccess.login()

results = earthaccess.search_data(granule_name='OPERA_L2_CSLC-S1_T012-024520-IW1_20231007T124805Z_20231009T193746Z_S1A_VV_v1.0', concept_id="C2777443834-ASF")
file = earthaccess.open(results)[0]

ds = xr.open_dataset(file, group="data")
ds

Downloading the file works the same way:

import earthaccess

earthaccess.login()

results = earthaccess.search_data(granule_name='OPERA_L2_CSLC-S1_T012-024520-IW1_20231007T124805Z_20231009T193746Z_S1A_VV_v1.0', concept_id="C2777443834-ASF")
files = earthaccess.download(results, "opera/")

betolink, Jan 16 '24 23:01