solvebio-python

Max upload size is 5GB

Open jsh2134 opened this issue 8 years ago • 4 comments

We currently upload files directly to AWS. Incorporate multi-part uploads to upload directly, or add an easier way to include download URLs in a manifest (manifest.add_url('https://..')).

From the AWS docs (http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingRESTAPImpUpload.html):

"The largest object that can be uploaded in a single PUT is 5 gigabytes." (https://aws.amazon.com/s3/faqs/)
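
For reference, a rough sketch of how a file could be split for an S3 multipart upload. It only plans the byte ranges; actually sending the parts would need support on the API side, which is not shown. The function name `plan_parts` and the 64 MiB default are made up for illustration. The 5 MiB minimum part size and the 10,000-part maximum are S3's documented multipart limits.

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10000         # S3 maximum number of parts per upload


def plan_parts(file_size, part_size=64 * MIB):
    """Return (part_number, offset, length) tuples covering file_size bytes.

    Hypothetical helper: it only computes byte ranges, it uploads nothing.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, MIN_PART_SIZE,
                    int(math.ceil(file_size / float(MAX_PARTS))))
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts
```

Each tuple could then drive one ranged read of the local file, so the whole file never has to be held in memory.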

jsh2134, Apr 27 '17 19:04

Workaround in place of manifest.add_url(url), something like:

import solvebio

# `d` is assumed to be an existing solvebio Dataset.
filename = "gnomad.exomes.r2.0.1.sites.vcf.gz"
url = "https://storage.googleapis.com/gnomad-public/release-170228/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz"
manifest = solvebio.Manifest()
# Point the manifest at the remote URL instead of uploading the file.
manifest.manifest['files'] = [{'name': filename, 'url': url}]
imp = solvebio.DatasetImport.create(dataset_id=d.id, manifest=manifest.manifest, auto_approve=True)
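
A possible shape for the helper, sketched without touching the client itself. `manifest_file_entry` is a hypothetical name, not part of solvebio; it just builds the same `{'name': ..., 'url': ...}` dict used above and rejects strings that are not http(s) URLs.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def manifest_file_entry(url, name=None):
    """Build one manifest 'files' entry from a download URL (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("expected an http(s) URL, got %r" % (url,))
    # Default the name to the last path segment of the URL.
    name = name or posixpath.basename(parsed.path)
    if not name:
        raise ValueError("could not derive a file name from %r" % (url,))
    return {"name": name, "url": url}
```

The result can be assigned the same way as in the snippet above: manifest.manifest['files'] = [manifest_file_entry(url)].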

jsh2134, Apr 27 '17 22:04

Maybe use the multi-part upload functions from https://github.com/requests/toolbelt

davecap, Jun 06 '17 01:06

This is still the limit, FYI.

Uploading a larger file fails with an error like this:

Traceback (most recent call last):
  File "/usr/local/bin/solvebio", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/solvebio/cli/main.py", line 309, in main
    return args.func(args)
  File "/usr/local/lib/python2.7/site-packages/solvebio/cli/data.py", line 216, in upload
    vault.full_path)
  File "/usr/local/lib/python2.7/site-packages/solvebio/resource/object.py", line 226, in upload_file
    headers=headers)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 126, in put
    return request('put', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(32, 'Broken pipe'))

Look into ways to intercept this with a useful error message.
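
One option is to check the file size before the PUT is attempted, rather than trying to interpret the broken pipe afterwards. A minimal sketch, assuming the limit is 5 GiB (the AWS text says "5 gigabytes"); `check_upload_size` and `FileTooLargeError` are hypothetical names, not existing solvebio code.

```python
import os

# Assumption: the single-PUT limit quoted from AWS, taken here as 5 GiB.
MAX_SINGLE_PUT_BYTES = 5 * 1024 ** 3


class FileTooLargeError(Exception):
    """Hypothetical exception for files over the single-PUT limit."""


def check_upload_size(path, limit=MAX_SINGLE_PUT_BYTES):
    """Return the file size, or raise FileTooLargeError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        gib = float(1024 ** 3)
        raise FileTooLargeError(
            "%s is %.2f GiB, which exceeds the %.2f GiB single-upload limit. "
            "Consider importing it through a manifest URL instead."
            % (path, size / gib, limit / gib)
        )
    return size
```

Calling something like this at the start of the upload path would turn the traceback above into a single readable message.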

jsh2134, Mar 30 '18 15:03

The current workaround is to send manifests with URLs to the files.

davecap, Feb 19 '19 15:02