
Enable Temporary Credentials for AWS S3 Access

Open alexandervladsemenov opened this issue 6 months ago • 2 comments

When the HDF5 ROS3 driver is invoked, only the region, access key ID, and secret key are currently passed. However, to access NASA Earthdata S3 buckets, a session token is also required. While the HDF5 ROS3 driver does support a session token, it is not currently being utilized.

It would be beneficial to:

  • Add support for passing a session token to the ROS3 driver.

  • Parse all potential sources of credentials (e.g., environment variables and AWS config files) to determine the most appropriate set of credentials for accessing multiple buckets.

  • Add support for S3 URIs such as: s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc

I’ve created a development branch to address this issue: 👉 https://github.com/alexandervladsemenov/netcdf-c/tree/temp_credentials

The code has been tested and is functional. However, I believe the current mechanism for parsing URLs/URIs in netCDF could benefit from a broader redesign. I'd like to get input from the netCDF development team before proceeding further.

I’ll submit a draft pull request to facilitate feedback and discussion.

alexandervladsemenov avatar Jun 09 '25 01:06 alexandervladsemenov

Why did you not start from the existing URI/S3-URI code in netcdf-c library?

DennisHeimbigner avatar Jun 09 '25 01:06 DennisHeimbigner

> Why did you not start from the existing URI/S3-URI code in netcdf-c library?

Primarily because we need to read from one S3 bucket (s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc) and write to a different bucket, each potentially requiring a different set of credentials.

Since S3 URIs do not contain region information, and because each bucket may be accessible only with a specific credential set, the only reliable way to determine valid credentials for a given URI is to perform an HTTP request. I use curl with CURLOPT_AWS_SIGV4 to check whether a particular bucket can be accessed using a given set of AWS credentials.

If similar functionality already exists in the netcdf-c codebase, I’d greatly appreciate it if you could point me to the relevant parts of the source code.

Edit: I'm happy to look into modifying the existing source code (libdispatch/ds3util.c) if that's what you're referring to.

alexandervladsemenov avatar Jun 09 '25 01:06 alexandervladsemenov

I am now working on this. But I really need requirements for this proposal.

  1. What does a token look like?
  2. What is the lifetime of a token?
  3. Where is it reasonable for a token to appear?
    • For example, would a "token" field in a .aws/credentials file make sense?
    • I gather you want the token to appear in a URL. How do you think it should look? Would it be OK if it is in the URI fragment?

DennisHeimbigner avatar Jul 17 '25 19:07 DennisHeimbigner

Additional issue: The example appliance URL you give `s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc` is difficult to parse. Which part of the URL specifies the bucket?

DennisHeimbigner avatar Jul 18 '25 01:07 DennisHeimbigner

> I am now working on this. But I really need requirements for this proposal.
>
>   1. What does a token look like?
>   2. What is the lifetime of a token?
>   3. Where is it reasonable for a token to appear?
>     • For example, would a "token" field in a .aws/credentials file make sense?
>     • I gather you want the token to appear in a URL. How do you think it should look? Would it be OK if it is in the URI fragment?

  1. A set of credentials looks like this:
{"accessKeyId": "ASIATCIYRXSX3TWXU7LB", "secretAccessKey": "f6cALqAR7TDmM8SLAUctlQIfK8Rp+z1q1Q78X8ZP", "sessionToken": "FwoGZXIvYXdzEJ7//////////wEaDNoW2Bbk2pvMlgKv5CLjAU5p3GQSfDtphDVcpL/0puRNHakcUcXfeg7VyFYpeyALy9ankWwWUtGvs2xgXLkbb1JW5Uc1v2Uc7907rha+4g0BLYPRacfgW9KBRcqgdZHMcTpNK/eYhNJMrfvun102zMNzQQHtjcULdtzC6EUfjsd3Bq+3DmKdlhnhrZIX4z0Lw9WFuTZ23CQPQkOp5jSuzh+BkCGz62TOKpoxdxpAeUOubnNAnrx1kiEn+diqugjZQRdjnvf7b2+/OrSMEVbu+ZGTM7y3nsmIblnybrAkMg3bxoyfsSEYw4kBT5fB/Gel8avbKMva3sQGMi2criaHBeOdjQImoIqK/MhFG6WIbuUdrHg3UKhhMwKkd9WmjM/U+O+FCIQrj3M=", "expiration": "2025-08-09 21:19:23+00:00"}

For the NASA S3 bucket "obdaac," credentials can be obtained from here: https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials

The HDF5 library sets the session token as follows:

status = H5Pset_fapl_ros3_token(fapl_id, sessionToken);
  2. The lifetime is 1 hour.
  3. We want to be able to access multiple buckets.
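
Because the token lives for only about an hour, a client probably wants to check the "expiration" field before reusing cached credentials. The following is a minimal sketch, assuming the timestamp always has the `YYYY-MM-DD HH:MM:SS+HH:MM` shape shown in the JSON above. The function names `parse_expiration` and `token_expired` are hypothetical and are not part of netcdf-c or HDF5.

```c
#include <stdio.h>

/* Days since 1970-01-01 for a proleptic Gregorian date
 * (the well-known "days from civil" algorithm). */
static long long days_from_civil(int y, int m, int d)
{
    long long era, yoe, doy, doe;
    y -= (m <= 2);
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;
    doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: parse "YYYY-MM-DD HH:MM:SS+HH:MM" into Unix seconds.
 * Returns 0 on success, -1 if the string does not match that shape. */
int parse_expiration(const char *s, long long *epoch_out)
{
    int y, mo, d, h, mi, sec, oh, om;
    char sign;
    long long offset;

    if (sscanf(s, "%d-%d-%d %d:%d:%d%c%d:%d",
               &y, &mo, &d, &h, &mi, &sec, &sign, &oh, &om) != 9)
        return -1;
    if ((sign != '+' && sign != '-') || mo < 1 || mo > 12 || d < 1 || d > 31)
        return -1;

    offset = (long long)oh * 3600 + (long long)om * 60;
    if (sign == '-')
        offset = -offset;
    /* Local time minus the UTC offset gives UTC. */
    *epoch_out = days_from_civil(y, mo, d) * 86400
               + h * 3600LL + mi * 60LL + sec - offset;
    return 0;
}

/* Hypothetical helper: returns 1 if the token is expired, or will expire
 * within `margin` seconds of `now`. Unparsable input counts as expired. */
int token_expired(const char *expiration, long long now, long long margin)
{
    long long expires_at;
    if (parse_expiration(expiration, &expires_at) != 0)
        return 1;
    return now + margin >= expires_at;
}
```

A caller would typically pass `(long long)time(NULL)` as `now`, plus a safety margin of a minute or so, and fetch fresh credentials when the function returns 1.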

First, the code should look up the standard AWS environment variables:

export AWS_ACCESS_KEY_ID=ASIATCIYRXSX3TWXU7LB
export AWS_SECRET_ACCESS_KEY=f6cALqAR7TDmM8SLAUctlQIfK8Rp+z1q1Q78X8ZP
export AWS_DEFAULT_REGION=us-west-2
export AWS_SESSION_TOKEN=FwoGZXIvYXdzEJ7//////////wEaDNoW2Bbk2pvMlgKv5CLjAU5p3GQSfDtphDVcpL/0puRNHakcUcXfeg7VyFYpeyALy9ankWwWUtGvs2xgXLkbb1JW5Uc1v2Uc7907rha+4g0BLYPRacfgW9KBRcqgdZHMcTpNK/eYhNJMrfvun102zMNzQQHtjcULdtzC6EUfjsd3Bq+3DmKdlhnhrZIX4z0Lw9WFuTZ23CQPQkOp5jSuzh+BkCGz62TOKpoxdxpAeUOubnNAnrx1kiEn+diqugjZQRdjnvf7b2+/OrSMEVbu+ZGTM7y3nsmIblnybrAkMg3bxoyfsSEYw4kBT5fB/Gel8avbKMva3sQGMi2criaHBeOdjQImoIqK/MhFG6WIbuUdrHg3UKhhMwKkd9WmjM/U+O+FCIQrj3M=
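
As a rough illustration of the environment lookup, the sketch below reads the four variables shown above with `getenv`. The struct `aws_env_creds` and the function `aws_creds_from_env` are hypothetical names, not existing netcdf-c API. Treating an empty variable as unset is an assumption.

```c
#include <stdlib.h>

/* Hypothetical holder. The pointers refer to the process environment,
 * so they must not be freed or modified by the caller. */
typedef struct {
    const char *access_key_id;
    const char *secret_access_key;
    const char *session_token;   /* NULL when no temporary token is set */
    const char *region;          /* NULL when AWS_DEFAULT_REGION is unset */
} aws_env_creds;

/* Return the variable's value, or NULL if it is unset or empty. */
static const char *env_nonempty(const char *name)
{
    const char *v = getenv(name);
    return (v != NULL && v[0] != '\0') ? v : NULL;
}

/* Fill `out` from the environment.
 * Returns 1 if both the key ID and the secret key were found, else 0. */
int aws_creds_from_env(aws_env_creds *out)
{
    out->access_key_id     = env_nonempty("AWS_ACCESS_KEY_ID");
    out->secret_access_key = env_nonempty("AWS_SECRET_ACCESS_KEY");
    out->session_token     = env_nonempty("AWS_SESSION_TOKEN");
    out->region            = env_nonempty("AWS_DEFAULT_REGION");
    return out->access_key_id != NULL && out->secret_access_key != NULL;
}
```

When `session_token` comes back non-NULL, it is the value that would be handed to `H5Pset_fapl_ros3_token` alongside the region, key ID, and secret key.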

It should also check the credentials and config files:

.aws/credentials

[default]
aws_access_key_id=ASIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE

[user1]
aws_access_key_id=ASIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
aws_session_token = fcZib3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZ2luX2IQoJb3JpZVERYLONGSTRINGEXAMPLE

.aws/config

[default]
region=us-west-2
output=json

[profile user1]
region=us-east-1
output=text
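
To make the file lookup concrete, here is a small sketch of reading one key from one section of an INI-style file such as the two shown above. The function `ini_lookup` is a hypothetical helper written for this illustration. It is not the parser netcdf-c uses, and it assumes no line exceeds 4095 characters. It splits on the first `=` only, because session tokens often end in `=` padding.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;
    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Hypothetical helper: find `key` inside `[section]` of an INI-style stream.
 * Copies the value into `out` and returns 0; returns -1 if the key is
 * missing or the value does not fit in `out_len` bytes. */
int ini_lookup(FILE *fp, const char *section, const char *key,
               char *out, size_t out_len)
{
    char line[4096];
    int in_section = 0;

    rewind(fp);
    while (fgets(line, sizeof line, fp) != NULL) {
        char *s = trim(line);
        char *eq;

        if (*s == '\0' || *s == '#' || *s == ';')
            continue;                       /* blank line or comment */
        if (*s == '[') {                    /* section header */
            char *close = strchr(s, ']');
            if (close == NULL) {
                in_section = 0;
                continue;
            }
            *close = '\0';
            in_section = (strcmp(trim(s + 1), section) == 0);
            continue;
        }
        if (!in_section || (eq = strchr(s, '=')) == NULL)
            continue;
        *eq = '\0';                         /* split at the first '=' only */
        if (strcmp(trim(s), key) != 0)
            continue;
        s = trim(eq + 1);
        if (strlen(s) >= out_len)
            return -1;
        strcpy(out, s);
        return 0;
    }
    return -1;
}
```

Note the naming difference visible in the examples above: the credentials file uses `[user1]`, while the config file uses `[profile user1]`, so a caller would pass `"user1"` for the first and `"profile user1"` for the second.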

> Additional issue: The example appliance URL you give `s3://ob-cumulus-prod-public/PACE_OCI.20240411T000331.L1B.V3.nc` is difficult to parse. Which part of the URL specifies the bucket?

ob-cumulus-prod-public is the bucket.
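
In other words, for the `s3://` scheme the first path component after the scheme is the bucket and everything after the next slash is the object key. A minimal sketch of that split follows. The function `split_s3_uri` is a hypothetical name used for illustration only, not the existing code in libdispatch/ds3util.c.

```c
#include <string.h>

/* Hypothetical helper: split "s3://bucket/key" into bucket and key.
 * Returns 0 on success, -1 if the URI is malformed or a buffer is too small.
 * The key may be empty (e.g. "s3://bucket" or "s3://bucket/"). */
int split_s3_uri(const char *uri, char *bucket, size_t bucket_len,
                 char *key, size_t key_len)
{
    static const char scheme[] = "s3://";
    const size_t scheme_len = sizeof(scheme) - 1;
    const char *rest, *slash;
    size_t blen, klen;

    if (uri == NULL || strncmp(uri, scheme, scheme_len) != 0)
        return -1;

    rest = uri + scheme_len;
    slash = strchr(rest, '/');
    blen = slash ? (size_t)(slash - rest) : strlen(rest);
    if (blen == 0 || blen >= bucket_len)
        return -1;                  /* empty bucket or buffer too small */
    memcpy(bucket, rest, blen);
    bucket[blen] = '\0';

    klen = slash ? strlen(slash + 1) : 0;
    if (klen >= key_len)
        return -1;
    memcpy(key, slash ? slash + 1 : "", klen);
    key[klen] = '\0';
    return 0;
}
```

For the URI in question this yields the bucket `ob-cumulus-prod-public` and the key `PACE_OCI.20240411T000331.L1B.V3.nc`. The region is not encoded anywhere in the URI, so it still has to come from the environment, a config file, or a probe request.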

alexandervladsemenov avatar Aug 09 '25 20:08 alexandervladsemenov

Thank you @alexandervladsemenov! That is very helpful, a great illustrative example.

As an aside, our system folks are on guard for anything that looks like a secret key being posted in a public forum; I'm sure it's fine, you posted it, but would you mind confirming that we aren't accidentally hosting a truly secret key/credentials in this discussion? Thanks a lot, and thanks for your help :).

WardF avatar Aug 11 '25 18:08 WardF

> Thank you @alexandervladsemenov! That is very helpful, a great illustrative example.
>
> As an aside, our system folks are on guard for anything that looks like a secret key being posted in a public forum; I'm sure it's fine, you posted it, but would you mind confirming that we aren't accidentally hosting a truly secret key/credentials in this discussion? Thanks a lot, and thanks for your help :).

Hi @WardF , The credentials I posted were temporary and have already expired, so no harm is done. :)

alexandervladsemenov avatar Aug 12 '25 01:08 alexandervladsemenov