earthkit-data Implement Amazon S3 bucket source

Open sandorkertesz opened this issue 1 year ago • 1 comments

Still work in progress.

The new source for an S3 bucket can be used like this:

import earthkit.data

# endpoint="s3.amazonaws.com"
bucket_name = "ecmwf-forecasts"
key = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"

r = {
    "bucket": bucket_name,
    "objects": [
        {"object": key},
    ],
}

ds = earthkit.data.from_source("s3", r, stream=False, anon=True)
ds.ls()

More examples are available at: https://earthkit-data.readthedocs.io/en/feature-s3/examples/s3.html

Multiple buckets and objects can be used.
A single part of an object can be specified as:

"objects": [
    {"object": key, "start": 0, "range": 438714},
],
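To make the part specification more concrete, here is a minimal sketch of how such a part could map onto an HTTP byte-range request. It assumes that "range" is a byte count measured from "start"; the issue does not state this, and it may not match what the source does internally. The helper http_range_header is hypothetical and not part of earthkit-data.

```python
def http_range_header(start: int, length: int) -> str:
    """Return an HTTP Range header value covering `length` bytes from `start`.

    Assumption: "range" in the request is a byte count, not an end offset.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    return f"bytes={start}-{start + length - 1}"


# The part from the example above (hypothetical interpretation).
part = {
    "object": "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2",
    "start": 0,
    "range": 438714,
}
header = http_range_header(part["start"], part["range"])
```

Under that assumption, the part above would correspond to the header value bytes=0-438713.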

The default endpoint is s3.amazonaws.com. Other endpoints can be specified in the request as:

r = {"bucket": bucket_name,
     "endpoint": "my_endpoint",
     "objects": [
     ....
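The request layouts shown so far (plain objects, objects with a part, and an optional endpoint) can be assembled with a small helper. This is only a sketch: build_s3_request is a hypothetical function, not part of earthkit-data, and it only produces the dictionary keys that appear in this issue ("bucket", "endpoint", "objects", "object", "start", "range").

```python
def build_s3_request(bucket, objects, endpoint=None):
    """Build a request dict in the layout used in this issue.

    `objects` is a list whose items are either a key string (whole object)
    or a (key, start, range) tuple (a single part of the object).
    """
    request = {"bucket": bucket}
    if endpoint is not None:
        # Leave the key out to fall back to the default endpoint.
        request["endpoint"] = endpoint

    entries = []
    for obj in objects:
        if isinstance(obj, str):
            entries.append({"object": obj})
        else:
            key, start, rng = obj
            entries.append({"object": key, "start": start, "range": rng})
    request["objects"] = entries
    return request


key = "20240111/00z/0p4-beta/oper/20240111000000-0h-oper-fc.grib2"

# Whole object, default endpoint.
r_whole = build_s3_request("ecmwf-forecasts", [key])

# A single part of the same object, with a custom endpoint.
r_part = build_s3_request(
    "ecmwf-forecasts", [(key, 0, 438714)], endpoint="my_endpoint"
)
```

The resulting dictionaries could then be passed to earthkit.data.from_source("s3", ...) in the same way as r in the first example.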

The stream option controls whether the data is read as a stream or downloaded to a file. The default is stream=True.
The anon option controls whether access is anonymous or AWS credentials are used. The default is anon=True. Handling the credentials requires the aws-requests-auth and botocore packages.
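Since credential handling depends on two optional packages, it can be convenient to check for them before requesting authenticated access. The sketch below uses only the standard library; missing_modules is a hypothetical helper, and the module names are the import names of the aws-requests-auth and botocore packages.

```python
import importlib.util


def missing_modules(names=("aws_requests_auth", "botocore")):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules()
if missing:
    # anon=False would need these; anonymous access does not.
    print("Install before using anon=False:", ", ".join(missing))
```

With an empty result, a call such as earthkit.data.from_source("s3", r, anon=False) should at least have its credential dependencies available.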

Oct 12 '23 16:10 sandorkertesz
