
Investigate concurrency support

Open pjbull opened this issue 5 years ago • 2 comments

We probably want to support operations like downloading many files in parallel. Async may help (#28), but some backends may also be able to run multipart uploads/downloads in parallel, and work could be spread across processes as well as threads.
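As a rough illustration of the thread-based variant, here is a minimal sketch using only the standard library. `download_many` and the `download_one` callable are hypothetical names, not part of cloudpathlib; `download_one` stands in for whatever per-file download a backend provides.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(keys, download_one, max_workers=8):
    """Run download_one(key) for every key on a thread pool.

    download_one is a hypothetical callable supplied by the caller.
    Returns a dict mapping each key to whatever download_one returned.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download up front, remembering which key each future belongs to.
        futures = {pool.submit(download_one, key): key for key in keys}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker thread.
            results[futures[future]] = future.result()
    return results
```

Threads suit this case because the work is I/O bound; whether the speedup justifies the added complexity is exactly what the tests would need to show.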

Like #28, we'll want tests to make sure the gains are worth the complexity.

pjbull avatar Aug 19 '20 16:08 pjbull

Two options with Azure:

  • max_concurrency parameter - https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobclient?view=azure-python#download-blob-offset-none--length-none----kwargs-
  • New aio interface: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.aio.blobclient?view=azure-python
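A minimal sketch of the first option, assuming an already constructed `azure.storage.blob.BlobClient`. The helper name `download_blob_parallel` and the default of 4 are made up for illustration; `download_blob(max_concurrency=...)` and `readinto` are the SDK calls being exercised.

```python
def download_blob_parallel(blob_client, dest_path, max_concurrency=4):
    """Download one blob to dest_path, letting the SDK fetch chunks in parallel.

    blob_client is assumed to be an azure.storage.blob.BlobClient, e.g. one
    created with BlobClient.from_connection_string(conn_str, container, blob).
    Returns the number of bytes read, as reported by readinto().
    """
    with open(dest_path, "wb") as stream:
        # max_concurrency controls how many parallel connections the SDK uses.
        downloader = blob_client.download_blob(max_concurrency=max_concurrency)
        return downloader.readinto(stream)
```

This keeps the concurrency inside a single blob transfer; parallelism across many blobs would still need something on top.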

For S3, it appears to be a matter of setting a transfer config, with something like:

from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    ...,
    max_concurrency=10,
    use_threads=True
)
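For context, a hedged sketch of how such a config might be passed to a download. The helper name `download_with_transfer_config` is hypothetical, and the S3 client is assumed to come from `boto3.client("s3")`; only the two `TransferConfig` settings shown above are set, everything else is left at boto3's defaults.

```python
def download_with_transfer_config(s3_client, bucket, key, dest_path, max_concurrency=10):
    """Download one object using a TransferConfig with threaded transfers.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the TransferConfig that was used.
    """
    # Imported here so the sketch can be loaded without boto3 installed.
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(max_concurrency=max_concurrency, use_threads=True)
    # download_file is a managed transfer, so it honours the Config argument.
    s3_client.download_file(bucket, key, dest_path, Config=config)
    return config
```

As with Azure, this parallelizes the parts of a single large object rather than many objects at once.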

pjbull avatar Sep 12 '20 17:09 pjbull

For S3, aioboto3 is another option.

karolzlot avatar Nov 08 '21 08:11 karolzlot