earthaccess
earthaccess should start using CMR metadata to obtain S3 credentials
earthaccess uses a hard-coded list of DAAC endpoints to get temporary S3 credentials. However, more and more datasets sit behind different endpoints and/or under a different category (e.g. as services). Having earthaccess use the DirectDistributionInformation/S3CredentialsAPIEndpoint field from the UMM metadata record would be a more robust solution. This could bring a time penalty, so caching would also be nice to have.
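A rough sketch of what the lookup plus caching could look like. This is not how earthaccess implements it: `endpoint_from_umm`, `make_cached_lookup`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real CMR request so the example runs offline. The only thing taken from the metadata is the `DirectDistributionInformation` / `S3CredentialsAPIEndpoint` field.

```python
from functools import lru_cache
from typing import Callable, Optional


def endpoint_from_umm(umm: dict) -> Optional[str]:
    """Return the S3 credentials endpoint from a UMM collection record, if any."""
    info = umm.get("DirectDistributionInformation") or {}
    return info.get("S3CredentialsAPIEndpoint")


def make_cached_lookup(fetch_umm: Callable[[str], dict], maxsize: int = 128):
    """Wrap a metadata fetcher so each concept ID is fetched at most once.

    `fetch_umm` is a hypothetical callable: concept ID in, UMM dict out.
    """

    @lru_cache(maxsize=maxsize)
    def lookup(concept_id: str) -> Optional[str]:
        return endpoint_from_umm(fetch_umm(concept_id))

    return lookup


# Stand-in for a CMR request; it records calls so the caching is visible.
calls = []


def fake_fetch(concept_id: str) -> dict:
    calls.append(concept_id)
    return {
        "DirectDistributionInformation": {
            "Region": "us-west-2",
            "S3CredentialsAPIEndpoint": "https://cumulus.asf.alaska.edu/s3credentials",
        }
    }


lookup = make_cached_lookup(fake_fetch)
first = lookup("C2777443834-ASF")
second = lookup("C2777443834-ASF")  # served from the cache, no second fetch
```

With a cache like this the time penalty would only be paid once per collection per session, and collections without the field return `None` so a caller could fall back to the existing hard-coded list.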
I was going to make a discussion item about this, but I see this has already been opened.
I was trying to access this ASF product directly on S3: s3://asf-cumulus-prod-opera-products/OPERA_L2_CSLC-S1_T048-101101-IW3_20231102T232838Z_20231209T045008Z_S1A_VV_v1.0/OPERA_L2_CSLC-S1_T048-101101-IW3_20231102T232838Z_20231209T045008Z_S1A_VV_v1.0.h5 but earthaccess.get_s3_credentials('ASF') uses the credentials URL for Sentinel-1 ASF data instead of the correct one, https://cumulus.asf.alaska.edu/s3credentials.
But while poking around, I see that you already added the ability to get the right credentials if you pass in the result:
In [9]: results = earthaccess.search_data(granule_name='OPERA_L2_CSLC-S1_T012-024520-IW1_20231007T124805Z_20231009T193746Z_S1A_VV_v1.0', concept_id="C2777443834-ASF")
Granules found: 1
In [10]: earthaccess.get_s3_credentials(results=results)
# this works!
I just wanted to check: is this the recommended way to get the credentials? Or is there some other way planned where you can request a different credentials URL before searching for results?
@scottstanie thanks! Yes, ASF has multiple endpoints now and the best place to determine which endpoint to use is the CMR metadata.
@betolink do you think we could eliminate the hard coding entirely?
Thank you!
Also, I realized right after I posted that I can use the search_datasets results to get credentials without picking a random product:
In [43]: rs = earthaccess.search_datasets(daac='ASF', keyword="OPERA_L2_CSLC-S1_V1")
Datasets found: 1
In [44]: rs[0].s3_bucket()
Out[44]:
{'Region': 'us-west-2',
 'S3BucketAndObjectPrefixNames': ['asf-cumulus-prod-opera-products/OPERA_L2_CSLC-S1/',
                                  'asf-cumulus-prod-opera-browse/OPERA_L2_CSLC-S1/'],
 'S3CredentialsAPIEndpoint': 'https://cumulus.asf.alaska.edu/s3credentials',
 'S3CredentialsAPIDocumentationURL': 'https://cumulus.asf.alaska.edu/s3credentialsREADME'}
but I noticed that I have to do
auth = earthaccess.login()
auth.get_s3_credentials(endpoint=...)
since the top-level earthaccess.get_s3_credentials doesn't have the endpoint argument. Still seems easy enough for me though 👍
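Putting those two steps together, something like the following might work as a small helper. It is only a sketch: `credentials_endpoint` and `get_credentials_for` are made-up names, and the earthaccess calls are the ones shown above (`search_datasets`, `s3_bucket()`, `auth.get_s3_credentials(endpoint=...)`). The import is kept inside the function so the pure helper can be tried without earthaccess installed.

```python
def credentials_endpoint(s3_info: dict) -> str:
    """Pull the credentials endpoint out of the dict returned by s3_bucket()."""
    endpoint = s3_info.get("S3CredentialsAPIEndpoint")
    if not endpoint:
        raise ValueError("collection metadata has no S3CredentialsAPIEndpoint")
    return endpoint


def get_credentials_for(keyword: str, daac: str) -> dict:
    """Hypothetical helper: search for a collection, then request credentials
    from the endpoint listed in its metadata. Needs network access and an
    Earthdata login, so it is not called in this example."""
    import earthaccess  # lazy import; only needed for the live lookup

    auth = earthaccess.login()
    datasets = earthaccess.search_datasets(daac=daac, keyword=keyword)
    if not datasets:
        raise LookupError(f"no dataset found for {keyword!r} at {daac}")
    endpoint = credentials_endpoint(datasets[0].s3_bucket())
    return auth.get_s3_credentials(endpoint=endpoint)


# Offline check using the metadata printed above.
s3_info = {
    "Region": "us-west-2",
    "S3CredentialsAPIEndpoint": "https://cumulus.asf.alaska.edu/s3credentials",
}
endpoint = credentials_endpoint(s3_info)
```

A call such as `get_credentials_for("OPERA_L2_CSLC-S1_V1", "ASF")` would then be the one-liner version of the session above.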
I think this was resolved in v0.8.2
I tested this and it worked as expected without the need to specify the endpoint (it will use the metadata from CMR):
import earthaccess
import xarray as xr
earthaccess.login()
results = earthaccess.search_data(granule_name='OPERA_L2_CSLC-S1_T012-024520-IW1_20231007T124805Z_20231009T193746Z_S1A_VV_v1.0', concept_id="C2777443834-ASF")
file = earthaccess.open(results)[0]
ds = xr.open_dataset(file, group="data")
ds
If we want to download the file, it would be the same:
import earthaccess
earthaccess.login()
results = earthaccess.search_data(granule_name='OPERA_L2_CSLC-S1_T012-024520-IW1_20231007T124805Z_20231009T193746Z_S1A_VV_v1.0', concept_id="C2777443834-ASF")
files = earthaccess.download(results, "opera/")