podaacpy
Add functionality to push data products to cloud storage
Some functions for the associated services take a path='' parameter, meaning that the user can download the data to wherever they want on the local machine.
This issue proposes allowing s3 paths as well, so that the data can be sent to s3 for analysis.
Hey! Can I take up this issue?
Hi @swatisingh45, yes please. The idea would be to add a new parameter to both granule_subset(self, input_file_path, path='') and extract_l4_granule(self, dataset_id='', path='') which acts as a flag selecting whether the data is persisted in s3. The new function signatures would then look something like:
extract_l4_granule(self, dataset_id='', store='local', path='')
...
granule_subset(self, input_file_path, store='local', path='')
By default the storage device would be 'local' disk; the possible options would be 'local' and 's3'.
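To make the store parameter concrete, here is a minimal sketch of how the dispatch could work once a file has been downloaded. The names persist_file, VALID_STORES and uploader are hypothetical and not part of podaacpy; the uploader callable stands in for whatever s3 client code ends up being used.

```python
import os
import shutil

# Hypothetical constant: the two storage options proposed in this issue.
VALID_STORES = ('local', 's3')


def persist_file(local_file, store='local', path='', uploader=None):
    """Hypothetical helper: place an already-downloaded file according to `store`.

    For 'local' the file is moved into `path` (or left in place if `path` is empty).
    For 's3' the file is handed to `uploader(local_file, key)`, where `path` is
    treated as a key prefix. Returns the final local path or the s3 key.
    """
    if store not in VALID_STORES:
        raise ValueError("store must be one of %s, got %r" % (VALID_STORES, store))

    file_name = os.path.basename(local_file)

    if store == 'local':
        if not path:
            return local_file
        os.makedirs(path, exist_ok=True)
        return shutil.move(local_file, os.path.join(path, file_name))

    # store == 's3'
    if uploader is None:
        raise ValueError("an uploader callable is required when store='s3'")
    key = '/'.join(part for part in (path.strip('/'), file_name) if part)
    uploader(local_file, key)
    return key
```

Passing the uploader in as a callable is only one possible design; it keeps the sketch independent of any particular s3 library.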
When using s3 we should introduce a config.properties file which essentially contains key-value pairs representing the AWS configuration, e.g. the access key ID and secret access key. This file could be read when the user creates an instance of Podaac().
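As a rough sketch of reading such a file, the standard library's configparser could be used. This assumes the file is INI-style with an [aws] section; the section name, the key names and the load_aws_config helper are all hypothetical, since the file layout has not been decided in this issue.

```python
import configparser


def load_aws_config(text):
    """Hypothetical helper: parse INI-style text and return the AWS credentials.

    Returns a (access_key_id, secret_access_key) tuple read from the
    assumed [aws] section.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser['aws']
    return section['access_key_id'], section['secret_access_key']


# Placeholder contents standing in for config.properties.
example = """
[aws]
access_key_id = EXAMPLE_KEY_ID
secret_access_key = EXAMPLE_SECRET
"""

key_id, secret = load_aws_config(example)
```

For a real file on disk, parser.read(filename) would replace read_string.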
Regarding the code for uploading files to s3, you can base it on the following example:
import sys

import boto
import boto.s3.connection
from boto.s3.key import Key

# Fill in your AWS credentials.
AWS_ACCESS_KEY_ID = ''
AWS_SECRET_ACCESS_KEY = ''

bucket_name = AWS_ACCESS_KEY_ID.lower() + '-dump'
conn = boto.connect_s3(AWS_ACCESS_KEY_ID,
                       AWS_SECRET_ACCESS_KEY)
bucket = conn.create_bucket(bucket_name,
                            location=boto.s3.connection.Location.DEFAULT)

testfile = "replace this with an actual filename"
print('Uploading %s to Amazon S3 bucket %s' %
      (testfile, bucket_name))


def percent_cb(complete, total):
    # Progress callback: print a dot each time boto reports progress.
    sys.stdout.write('.')
    sys.stdout.flush()


k = Key(bucket)
k.key = 'my test file'
k.set_contents_from_filename(testfile,
                             cb=percent_cb, num_cb=10)
Thank you for taking this issue on. If you run into any problems, please let me know.
@swatisingh45 are you working on this? If not then I will do it, thank you.
Using Apache LibCloud's Python Object Storage API might be a good idea here.
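For illustration, an upload through Libcloud's storage API might look roughly like the sketch below. The upload_with_libcloud helper and its parameters are hypothetical, and it assumes the container (bucket) already exists; it has not been tried against podaacpy.

```python
def upload_with_libcloud(file_path, container_name, object_name, key, secret):
    """Hypothetical helper: upload a local file to an existing S3 container."""
    # Imported lazily so that libcloud is only required when s3 storage is used.
    from libcloud.storage.providers import get_driver
    from libcloud.storage.types import Provider

    driver = get_driver(Provider.S3)(key, secret)
    container = driver.get_container(container_name=container_name)
    return driver.upload_object(file_path=file_path,
                                container=container,
                                object_name=object_name)
```

One appeal of this route is that switching to another provider would mostly mean changing the Provider constant.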