Max upload size is 5GB
We currently upload files directly to AWS with a single PUT. Either incorporate multi-part uploads so that larger files can still be uploaded directly, or add an easier way to include download URLs in a manifest (manifest.add_url('https://..'))
from AWS docs http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html
The largest object that can be uploaded in a single PUT is 5 gigabytes.
https://aws.amazon.com/s3/faqs/ (from AWS)
Workaround until manifest.add_url(url) exists, something like:
import solvebio

filename = "gnomad.exomes.r2.0.1.sites.vcf.gz"
url = "https://storage.googleapis.com/gnomad-public/release-170228/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz"
manifest = solvebio.Manifest()
# Reference the file by URL instead of uploading it.
manifest.manifest['files'] = [{'name': filename, 'url': url}]
# d is an existing dataset to import into.
imp = solvebio.DatasetImport.create(dataset_id=d.id, manifest=manifest.manifest, auto_approve=True)
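For discussion, here is a rough sketch of what the proposed add_url could do. It works on a plain dict shaped like manifest.manifest in the snippet above rather than on the real solvebio.Manifest class, so add_url, its name argument, and the choice to default the name to the last path segment of the URL are all assumptions, not existing API.

```python
import posixpath
from urllib.parse import urlparse


def add_url(manifest, url, name=None):
    """Append a URL entry to a manifest dict (hypothetical helper)."""
    if name is None:
        # Assumption: default the file name to the last path segment of the URL.
        name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError("cannot derive a file name from %r" % url)
    manifest.setdefault('files', []).append({'name': name, 'url': url})
    return manifest


manifest = {'files': []}
add_url(
    manifest,
    "https://storage.googleapis.com/gnomad-public/release-170228/"
    "vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz",
)
```

After the call, manifest['files'] holds one entry with both 'name' and 'url' set, which is the same shape the manual workaround builds by hand.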
Maybe use the multi-part upload functions from https://github.com/requests/toolbelt
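Whichever library ends up sending the requests, an S3 multi-part upload first needs the file split into numbered parts. A minimal sketch of that planning step, assuming the S3 limits of at least 5 MiB per part (except the last) and at most 10,000 parts; plan_parts and the 64 MiB default are made up for illustration and are not part of solvebio or the toolbelt.

```python
MIN_PART_SIZE = 5 * 1024 ** 2  # S3 minimum for every part except the last
MAX_PARTS = 10000              # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=64 * 1024 ** 2):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    parts = []
    offset = 0
    while offset < total_size:
        # The last part may be shorter than part_size.
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


# Example: a 6 GiB file, which is over the single-PUT limit.
parts = plan_parts(6 * 1024 ** 3)
```

Each tuple says which byte range to read from the file and which part number to send it as; the upload itself (initiate, upload each part, complete) is left out here.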
FYI, this is still the limit.
Uploading a file above it fails with an error like this:
Traceback (most recent call last):
  File "/usr/local/bin/solvebio", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/solvebio/cli/main.py", line 309, in main
    return args.func(args)
  File "/usr/local/lib/python2.7/site-packages/solvebio/cli/data.py", line 216, in upload
    vault.full_path)
  File "/usr/local/lib/python2.7/site-packages/solvebio/resource/object.py", line 226, in upload_file
    headers=headers)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 126, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(32, 'Broken pipe'))
Look into ways to intercept this with a useful error message.
The current workaround is to send manifests with URLs to the files.
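One possible way to give a useful error is to check the file size before the PUT is attempted, instead of trying to interpret the broken pipe afterwards. A minimal sketch, assuming the limit is 5 * 1024**3 bytes (the AWS docs only say "5 gigabytes"); UploadTooLargeError and check_upload_size are hypothetical names, not existing solvebio code.

```python
import os

# Assumption: "5 gigabytes" is taken as 5 * 1024**3 bytes.
MAX_SINGLE_PUT_BYTES = 5 * 1024 ** 3


class UploadTooLargeError(Exception):
    """Hypothetical error raised before an upload that cannot succeed."""


def check_upload_size(path, limit=MAX_SINGLE_PUT_BYTES):
    """Return the file size, or raise if it exceeds the single-PUT limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise UploadTooLargeError(
            "%s is %d bytes, above the %d byte single-upload limit. "
            "Import it by URL through a manifest instead."
            % (path, size, limit)
        )
    return size
```

Calling something like this at the top of the upload path would let the CLI fail fast and point users at the manifest workaround.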