aws-sdk-java

Slow download using transferManager.downloadDirectory()

Open snj07 opened this issue 3 years ago • 6 comments

Download from S3 using transferManager.downloadDirectory() is slower than s3Client.getObject()

Describe the bug

I am trying to download a folder from an S3 bucket that contains more than 200 files. When I download it using the following snippet, it takes around 161317 ms.

TransferManager transferManager = TransferManagerBuilder.standard()
        .withS3Client(s3Client)
        .build();
MultipleFileDownload multipleFileDownload = transferManager.downloadDirectory(
        bucketName, key, new File(destinationFolder));
multipleFileDownload.waitForCompletion();

But when I use the following code with a single thread, it takes 95369 ms:

ObjectListing listResponse = s3Client.listObjects(bucketName, key);
for (S3ObjectSummary x : listResponse.getObjectSummaries()) {
    String targetFileName = getTargetFileNameFromKey(x.getKey());
    GetObjectRequest getObjectRequest = new GetObjectRequest(bucketName, x.getKey());
    s3Client.getObject(getObjectRequest, new File(targetFileName));
}

When I wrap this in an ExecutorService with 4 threads, it takes 27246 ms:

executors.submit(() -> {
	s3Client.getObject(getObjectRequest, new File(targetFileName));
});
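
For readers who want to reproduce the 4-thread variant, here is a minimal sketch of the fixed-pool pattern the snippet above implies. It is not the reporter's actual code: `ParallelDownloadSketch`, `downloadAll`, and the `downloadOne` callback are hypothetical names, and the download step is passed in as a callback so the sketch depends only on the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper, not part of the AWS SDK.
class ParallelDownloadSketch {

    // Runs downloadOne once per key on a fixed-size pool and waits for all tasks.
    // Returns the number of completed keys; throws on the first failed task it sees.
    static int downloadAll(List<String> keys, int threads, Consumer<String> downloadOne) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String key : keys) {
                pending.add(executor.submit(() -> downloadOne.accept(key)));
            }
            int completed = 0;
            for (Future<?> future : pending) {
                future.get(); // blocks until this task has finished
                completed++;
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for downloads", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A download failed", e.getCause());
        } finally {
            executor.shutdown(); // stop accepting new tasks
        }
    }
}
```

With the names from this issue, the callback would presumably be something like `key -> s3Client.getObject(new GetObjectRequest(bucketName, key), new File(getTargetFileNameFromKey(key)))`, called as `downloadAll(keys, 4, ...)`.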

Expected Behavior

downloadDirectory() should not be slower than single-threaded s3Client.getObject().

Current Behavior

I have tried downloading fewer files, but I could not get downloadDirectory() to be faster than s3Client.getObject().

Steps to Reproduce

Try to download the folder using the above code snippets and log the time.
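
A minimal sketch of how the timings could be logged, assuming a small helper built on `System.nanoTime()`. `DownloadTimer` and `timeMillis` are hypothetical names, not part of the SDK or of the reporter's code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for measuring wall-clock time of a download variant.
class DownloadTimer {

    // Runs the action once and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

Because downloadDirectory() returns before the transfer finishes, the timed action for that variant should include the waitForCompletion() call; otherwise only the time to start the transfer is measured.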

Context

This affects the overall performance of our processes. I think it supports parallel download of large files using multipart, but does not download multiple files in parallel. Please let me know if I am wrong here.

Your Environment

  • AWS Java SDK version used: 1.12.7 (I also tried with 1.11.*)
  • JDK version used: 1.8
  • Operating System and version: tried on both Windows 10 and CentOS 7

snj07 avatar Jun 16 '21 13:06 snj07

Hi @snj07, we would also expect TransferManager to be faster than S3 getObject, so I'm guessing that the performance you're seeing depends on factors like file size and multipart download.

I'll try to reproduce the issue. Can you tell us the size of the files? Are all 200 files exactly the same?

debora-ito avatar Jun 28 '21 16:06 debora-ito

Hi @debora-ito, all files are around 1 MB.

snj07 avatar Jun 28 '21 16:06 snj07

I'm having a similar experience with the TransferManager. @debora-ito did you manage to reproduce the error?

saso5 avatar Mar 04 '22 11:03 saso5

Similar issue: I have a bucket with 200+ files with a total size of ~500 MB, and TransferManager.downloadDirectory() is very slow to start downloading: the time until the first progress update is twice as long as the actual download, which is super weird.

ghost avatar Mar 22 '22 15:03 ghost

Switching to SDK 2 'solved' this problem for me.

saso5 avatar Mar 23 '22 05:03 saso5

Is downloadDirectory supported in the AWS SDK for Node.js? I am not able to find anything like it there. My use case is to download all keys inside a directory in S3. Any help would be appreciated. For now, unfortunately, I have to download each file one by one and then zip them into a single zip file.

tasdruva avatar May 23 '22 05:05 tasdruva

TransferManager is now generally available in the AWS SDK for Java 2.x. Our benchmarks showed a performance improvement compared to v1; you can find more information in the announcement blog post: https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/

The 2.x TransferManager supports downloadDirectory. Please check it out and let us know if you still see the difference.

debora-ito avatar Mar 17 '23 02:03 debora-ito

It looks like this issue has not been active for more than five days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.

github-actions[bot] avatar Mar 22 '23 03:03 github-actions[bot]