aws-sdk-java
Slow download using transferManager.downloadDirectory()
Download from S3 using transferManager.downloadDirectory() is slower than s3Client.getObject()
Describe the bug
I am trying to download a folder that contains more than 200 files from an S3 bucket. Downloading it with the following snippet takes around 161317 ms.
TransferManager transferManager = TransferManagerBuilder.standard()
        .withS3Client(s3Client)
        .build();
MultipleFileDownload multipleFileDownload = transferManager.downloadDirectory(
        bucketName, key, new File(destinationFolder));
multipleFileDownload.waitForCompletion();
But when I use the following single-threaded code, it takes 95369 ms.
ObjectListing listResponse = s3Client.listObjects(bucketName, key);
for (S3ObjectSummary x : listResponse.getObjectSummaries()) {
    String targetFileName = getTargetFileNameFromKey(x.getKey());
    GetObjectRequest getObjectRequest = new GetObjectRequest(bucketName, x.getKey());
    s3Client.getObject(getObjectRequest, new File(targetFileName));
}
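The helper getTargetFileNameFromKey() is not shown in the report. A minimal sketch of what such a helper might look like is below, assuming each object key is mapped to a path under the destination folder. The class name KeyPaths and the method name targetFileForKey are hypothetical, and the check against keys that resolve outside the destination is an extra precaution, not something the original code is known to do.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: not part of the AWS SDK and not the reporter's actual code.
final class KeyPaths {
    private KeyPaths() {}

    /**
     * Maps an S3 object key such as "reports/2021/a.csv" to a file under
     * destinationDir, rejecting keys that would resolve outside of it.
     */
    static File targetFileForKey(String destinationDir, String key) {
        Path base = Paths.get(destinationDir).toAbsolutePath().normalize();
        Path target = base.resolve(key).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Key escapes destination directory: " + key);
        }
        return target.toFile();
    }
}
```

The caller would still need to create the parent directories (for example with getParentFile().mkdirs()) before writing to the returned file.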
When I wrap this in an ExecutorService with 4 threads, it takes 27246 ms.
executors.submit(() -> {
    s3Client.getObject(getObjectRequest, new File(targetFileName));
});
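The snippet above shows only the submit call. A fuller sketch of the fan-out pattern follows, using only java.util.concurrent so it can be run without the SDK. The class ParallelFetch and the fetchOne callback are hypothetical names; in this issue's setup, fetchOne would presumably wrap the s3Client.getObject(getObjectRequest, file) call for one key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper illustrating the pattern; not part of the AWS SDK.
final class ParallelFetch {
    private ParallelFetch() {}

    /**
     * Runs fetchOne for every key on a fixed-size pool and blocks until all
     * tasks finish. The first task failure is rethrown as IllegalStateException.
     */
    static void fetchAll(List<String> keys, int threads, Consumer<String> fetchOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String key : keys) {
                Runnable task = () -> fetchOne.accept(key);
                pending.add(pool.submit(task));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for completion and surfaces task exceptions
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for downloads", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A download failed", e.getCause());
        } finally {
            pool.shutdownNow(); // release worker threads even on failure
        }
    }
}
```

Waiting on each Future matters for timing comparisons: without it, the measured time covers only task submission, not the downloads themselves.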
Expected Behavior
downloadDirectory() should not be slower than single-threaded s3Client.getObject().
Current Behavior
I also tried downloading fewer files, but I could not get downloadDirectory() to be faster than s3Client.getObject().
Steps to Reproduce
Try to download the folder using the above code snippets and log the time.
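For anyone reproducing this, a small timing wrapper like the one below may help keep the three measurements comparable. This is only a sketch: DownloadTimer is a hypothetical name, and the Runnable passed in is assumed to block until the download has fully finished (for example, by calling waitForCompletion() or by waiting on all submitted tasks).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper; not part of the AWS SDK.
final class DownloadTimer {
    private DownloadTimer() {}

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime(); // monotonic clock, unaffected by system time changes
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

A single run can be skewed by JVM warm-up and network variance, so repeating each variant a few times and comparing the spread is probably more informative than one number per approach.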
Context
This affects the overall performance of our processes. I think TransferManager downloads large files in parallel using multipart, but does not download multiple files in parallel. Please let me know if I am wrong here.
Your Environment
- AWS Java SDK version used: 1.12.7 (I also tried 1.11.*)
- JDK version used: 1.8
- Operating System and version: tried on both Windows 10 and CentOS 7
Hi @snj07, we would also expect TransferManager to be faster than s3Client.getObject(), so I'm guessing that the performance you're seeing depends on factors like file size and multipart download.
I'll try to repro the issue. Can you tell us the size of the files? Are all 200 files exactly the same size?
Hi @debora-ito, all files are around 1 MB.
I'm having a similar experience with the TransferManager. @debora-ito did you manage to reproduce the error?
Similar issue: I have a bucket with 200+ files with a total size of ~500 MB, and TransferManager.downloadDirectory() starts downloading very slowly: the time until the first progress update is twice as long as the actual download, which is super weird.
Switching to SDK 2.x 'solved' this problem for me.
Is downloadDirectory supported in the AWS SDK for Node.js? I am not able to find anything like that there. My use case is to download all keys inside a directory in S3. Any help would be appreciated. For now, unfortunately, I have to download each file one by one and then zip them into a single zip file.
TransferManager is now generally available in the AWS SDK for Java 2.x. Our benchmarks showed a performance improvement compared to v1; you can find more info in the announcement blog post: https://aws.amazon.com/blogs/developer/introducing-crt-based-s3-client-and-the-s3-transfer-manager-in-the-aws-sdk-for-java-2-x/
The 2.x TransferManager supports downloadDirectory. Please check it out and let us know if you still see the difference.
It looks like this issue has not been active for more than five days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.