Enable Temporary Credentials for AWS S3 Access
When the HDF5 ROS3 driver is invoked, only the region, access key ID, and secret key are currently passed. However, to access NASA Earthdata S3 buckets, a session token is also required. While the HDF5 ROS3 driver does support a session token, it is not currently being utilized.
It would be beneficial to:
- Add support for passing a session token to the ROS3 driver.
- Parse all potential sources of credentials (e.g., environment variables and AWS config files) to determine the most appropriate set of credentials for accessing multiple buckets.
- Add support for S3 URIs such as: s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc
I’ve created a development branch to address this issue: 👉 https://github.com/alexandervladsemenov/netcdf-c/tree/temp_credentials
The code has been tested and is functional. However, I believe the current mechanism for parsing URLs/URIs in netCDF could benefit from a broader redesign. I'd like to get input from the netCDF development team before proceeding further.
I’ll submit a draft pull request to facilitate feedback and discussion.
Why did you not start from the existing URI/S3-URI code in the netcdf-c library?
Primarily because we need to read from one S3 bucket (s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc) and write to a different bucket, each potentially requiring a different set of credentials.
Since S3 URIs do not contain region information, and because each bucket may be accessible only with a specific credential set, the only reliable way to determine valid credentials for a given URI is to perform an HTTP request. I use curl with CURLOPT_AWS_SIGV4 to check whether a particular bucket can be accessed using a given set of AWS credentials.
If similar functionality already exists in the netcdf-c codebase, I’d greatly appreciate it if you could point me to the relevant parts of the source code.
Edit: I'm happy to look into modifying the existing source code (libdispatch/ds3util.c) if that's what you're referring to.
I am now working on this, but I really need a set of requirements for this proposal.
- What does a token look like?
- What is the lifetime of a token?
- Where is it reasonable for a token to appear?
- For example, would a "token" field in a .aws/credentials file make sense?
- I gather you want the token to appear in a URL. How do you think it should look? Would it be OK if it is in the URI fragment?
Additional issue: The example appliance URL you give, `s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc`, is difficult to parse. Which part of the URL specifies the bucket?
- A set of credentials looks like this:
{"accessKeyId": "ASIATCIYRXSX3TWXU7LB", "secretAccessKey": "f6cALqAR7TDmM8SLAUctlQIfK8Rp+z1q1Q78X8ZP", "sessionToken": "FwoGZXIvYXdzEJ7//////////wEaDNoW2Bbk2pvMlgKv5CLjAU5p3GQSfDtphDVcpL/0puRNHakcUcXfeg7VyFYpeyALy9ankWwWUtGvs2xgXLkbb1JW5Uc1v2Uc7907rha+4g0BLYPRacfgW9KBRcqgdZHMcTpNK/eYhNJMrfvun102zMNzQQHtjcULdtzC6EUfjsd3Bq+3DmKdlhnhrZIX4z0Lw9WFuTZ23CQPQkOp5jSuzh+BkCGz62TOKpoxdxpAeUOubnNAnrx1kiEn+diqugjZQRdjnvf7b2+/OrSMEVbu+ZGTM7y3nsmIblnybrAkMg3bxoyfsSEYw4kBT5fB/Gel8avbKMva3sQGMi2criaHBeOdjQImoIqK/MhFG6WIbuUdrHg3UKhhMwKkd9WmjM/U+O+FCIQrj3M=", "expiration": "2025-08-09 21:19:23+00:00"}
For the NASA S3 bucket "obdaac," credentials can be obtained from here: https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials
The HDF5 library sets the session token as follows:
status = H5Pset_fapl_ros3_token(fapl_id, sessionToken);
- The lifetime is 1 hour.
- We want to be able to access multiple buckets.
First, the code should look up the standard AWS environment variables:
export AWS_ACCESS_KEY_ID=ASIATCIYRXSX3TWXU7LB
export AWS_SECRET_ACCESS_KEY=f6cALqAR7TDmM8SLAUctlQIfK8Rp+z1q1Q78X8ZP
export AWS_DEFAULT_REGION=us-west-2
export AWS_SESSION_TOKEN=FwoGZXIvYXdzEJ7//////////wEaDNoW2Bbk2pvMlgKv5CLjAU5p3GQSfDtphDVcpL/0puRNHakcUcXfeg7VyFYpeyALy9ankWwWUtGvs2xgXLkbb1JW5Uc1v2Uc7907rha+4g0BLYPRacfgW9KBRcqgdZHMcTpNK/eYhNJMrfvun102zMNzQQHtjcULdtzC6EUfjsd3Bq+3DmKdlhnhrZIX4z0Lw9WFuTZ23CQPQkOp5jSuzh+BkCGz62TOKpoxdxpAeUOubnNAnrx1kiEn+diqugjZQRdjnvf7b2+/OrSMEVbu+ZGTM7y3nsmIblnybrAkMg3bxoyfsSEYw4kBT5fB/Gel8avbKMva3sQGMi2criaHBeOdjQImoIqK/MhFG6WIbuUdrHg3UKhhMwKkd9WmjM/U+O+FCIQrj3M=
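A minimal sketch of that environment lookup, using only the C standard library. The struct `aws_env_creds` and the function `aws_creds_from_env` are hypothetical names invented for illustration; they are not part of netcdf-c or HDF5. The sketch treats the session token and region as optional and the key pair as required, which is an assumption about how the caller wants to use them.

```c
#include <stdlib.h>

/* Hypothetical container for illustration; not a netcdf-c or HDF5 type.
 * The pointers refer to the process environment and must not be freed. */
typedef struct {
    const char *access_key_id;
    const char *secret_access_key;
    const char *session_token;  /* NULL when no temporary credentials are set */
    const char *region;         /* NULL when AWS_DEFAULT_REGION is unset */
} aws_env_creds;

/* Return the variable's value, or NULL if it is unset or empty. */
static const char *env_or_null(const char *name)
{
    const char *v = getenv(name);
    return (v != NULL && v[0] != '\0') ? v : NULL;
}

/* Fill `out` from the environment variables listed above.
 * Returns 1 if both the access key ID and the secret key are present, else 0. */
int aws_creds_from_env(aws_env_creds *out)
{
    out->access_key_id     = env_or_null("AWS_ACCESS_KEY_ID");
    out->secret_access_key = env_or_null("AWS_SECRET_ACCESS_KEY");
    out->session_token     = env_or_null("AWS_SESSION_TOKEN");
    out->region            = env_or_null("AWS_DEFAULT_REGION");
    return out->access_key_id != NULL && out->secret_access_key != NULL;
}
```

When `session_token` comes back non-NULL, it is the value that would be handed to the ROS3 driver alongside the key pair.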
It should also check the AWS credentials and config files:
.aws/credentials
[default]
aws_access_key_id=ASIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE
[user1]
aws_access_key_id=ASIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
aws_session_token = fcZib3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE
.aws/config
[default]
region=us-west-2
output=json
[profile user1]
region=us-east-1
output=text
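Reading one profile from a credentials file of this shape could look roughly like the following. This is a sketch under stated assumptions, not the parser netcdf-c uses: `aws_profile_creds`, `aws_read_profile`, and `AWS_VALUE_MAX` are hypothetical names, and the buffer sizes are arbitrary. It splits each line on the first `=` only, so base64 values that end in `=` are kept intact, and it tolerates spaces around `=` as in the `aws_session_token` lines above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define AWS_VALUE_MAX 2048  /* arbitrary limit chosen for this sketch */

/* Hypothetical container for illustration; not a netcdf-c or HDF5 type. */
typedef struct {
    char access_key_id[AWS_VALUE_MAX];
    char secret_access_key[AWS_VALUE_MAX];
    char session_token[AWS_VALUE_MAX];  /* empty string when absent */
} aws_profile_creds;

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s)) s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/* Read the keys of section [profile] from an INI-style credentials stream.
 * Returns 1 if both the key ID and the secret key were found, else 0. */
int aws_read_profile(FILE *fp, const char *profile, aws_profile_creds *out)
{
    char line[4096];
    int in_profile = 0;

    memset(out, 0, sizeof *out);
    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = trim(line);
        char *eq, *key, *val;

        if (*p == '\0' || *p == '#' || *p == ';') continue;  /* blank or comment */
        if (*p == '[') {                                      /* section header */
            char *close = strchr(p, ']');
            if (close == NULL) { in_profile = 0; continue; }
            *close = '\0';
            in_profile = (strcmp(trim(p + 1), profile) == 0);
            continue;
        }
        if (!in_profile) continue;
        eq = strchr(p, '=');          /* first '=' only: values may contain '=' */
        if (eq == NULL) continue;
        *eq = '\0';
        key = trim(p);
        val = trim(eq + 1);
        if (strcmp(key, "aws_access_key_id") == 0)
            snprintf(out->access_key_id, AWS_VALUE_MAX, "%s", val);
        else if (strcmp(key, "aws_secret_access_key") == 0)
            snprintf(out->secret_access_key, AWS_VALUE_MAX, "%s", val);
        else if (strcmp(key, "aws_session_token") == 0)
            snprintf(out->session_token, AWS_VALUE_MAX, "%s", val);
    }
    return out->access_key_id[0] != '\0' && out->secret_access_key[0] != '\0';
}
```

Note that the section names differ between the two files: the credentials file uses `[user1]`, while the config file uses `[profile user1]`, so a region lookup would need to match the prefixed form.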
Additional issue: The example appliance URL you give, `s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc`, is difficult to parse. Which part of the URL specifies the bucket?
ob-cumulus-prod-public is the bucket.
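In other words, the bucket is everything between `s3://` and the next `/`, and the remainder is the object key. A small sketch of that split follows; `s3_uri_split` is a hypothetical helper written for illustration and is not the existing URI code in netcdf-c.

```c
#include <stddef.h>
#include <string.h>

/* Split "s3://bucket/key" into its bucket and key parts.
 * Returns 0 on success, -1 if the URI is malformed or a buffer is too small.
 * A URI with no key ("s3://bucket") yields an empty key string. */
int s3_uri_split(const char *uri,
                 char *bucket, size_t bucket_len,
                 char *key, size_t key_len)
{
    static const char scheme[] = "s3://";
    const size_t scheme_len = sizeof scheme - 1;
    const char *rest, *slash, *key_src;
    size_t blen, klen;

    if (uri == NULL || strncmp(uri, scheme, scheme_len) != 0) return -1;

    rest = uri + scheme_len;
    slash = strchr(rest, '/');   /* first '/' ends the bucket name */
    blen = (slash != NULL) ? (size_t)(slash - rest) : strlen(rest);
    if (blen == 0 || blen >= bucket_len) return -1;
    memcpy(bucket, rest, blen);
    bucket[blen] = '\0';

    key_src = (slash != NULL) ? slash + 1 : "";
    klen = strlen(key_src);
    if (klen >= key_len) return -1;
    memcpy(key, key_src, klen);
    key[klen] = '\0';
    return 0;
}
```

For the URI in question, this gives the bucket `ob-cumulus-prod-public` and the key `PACE_OCI.20240411T000331.L1B.V3.nc`. The URI itself carries no region, so that still has to come from the environment or the config file.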
Thank you @alexandervladsemenov! That is very helpful, a great illustrative example.
As an aside, our system folks are on guard for anything that looks like a secret key being posted in a public forum; I'm sure it's fine, you posted it, but would you mind confirming that we aren't accidentally hosting a truly secret key/credentials in this discussion? Thanks a lot, and thanks for your help :).
Hi @WardF, the credentials I posted were temporary and have already expired, so no harm is done. :)