aws-sdk-java-v2
Transfer Manager
Review the inherited state of the V1 transfer manager and determine which changes are necessary for V2.
(Feel free to comment on this issue with desired changes).
Upload and download return a Transfer, which has getProgress(). That is a simple way to get the percent complete, but it requires busy loops to update things like UIs.
On the other hand, ProgressListener only reports the bytes transferred, so you have to compute the percentage yourself, which is not easy because the total size is not exposed.
Better feature parity on the ProgressListener would be nice, since that plays better with async.
+1 on parity with progress listener. There should probably just be one system between transfer manager and the rest of the SDK for monitoring progress.
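To make the request above concrete, here is a minimal sketch of a listener that turns byte-count callbacks into a percentage without polling. PercentProgressListener and its methods are hypothetical names used only for illustration, not SDK types, and the sketch assumes the total size is known up front, which is exactly what the comments above say is not exposed today.

```java
import java.util.function.DoubleConsumer;

/** Hypothetical helper, not part of the SDK: converts byte deltas into a percentage. */
final class PercentProgressListener {
    private final long totalBytes;
    private final DoubleConsumer onPercent;
    private long transferred;

    PercentProgressListener(long totalBytes, DoubleConsumer onPercent) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        this.totalBytes = totalBytes;
        this.onPercent = onPercent;
    }

    /** Call this from whatever callback reports the number of bytes just transferred. */
    synchronized void bytesTransferred(long delta) {
        // Clamp so that retried or over-reported bytes never push the value past 100%.
        transferred = Math.min(totalBytes, transferred + delta);
        onPercent.accept(percent());
    }

    /** Current percent complete, in the range 0 to 100. */
    synchronized double percent() {
        return 100.0 * transferred / totalBytes;
    }
}
```

A UI would pass its own callback as onPercent and be notified on every update instead of looping on getProgress().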
Use DirectoryStream for loading files from a directory to avoid having to load all file names into memory. See https://github.com/aws/aws-sdk-java/issues/1271.
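As a rough sketch of the suggestion above, the JDK's Files.newDirectoryStream iterates directory entries lazily, so each file can be handed to an upload as it is discovered. DirectoryWalker and forEachFile are hypothetical names, and the Consumer stands in for whatever would start the transfer.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

/** Hypothetical helper: visits files one at a time instead of listing them all first. */
final class DirectoryWalker {
    /** Calls action for each regular file directly under dir and returns how many were visited. */
    static int forEachFile(Path dir, Consumer<Path> action) throws IOException {
        int count = 0;
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    action.accept(entry);
                    count++;
                }
            }
        }
        return count;
    }
}
```

This version is not recursive; subdirectories are skipped rather than descended into.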
It is unreasonable to expect that the thread pool will be unbounded in order to avoid deadlocks. See https://github.com/aws/aws-sdk-java/issues/939.
TransferManager feature requests from v1:
- https://github.com/aws/aws-sdk-java/issues/117
- https://github.com/aws/aws-sdk-java/issues/284
- https://github.com/aws/aws-sdk-java/issues/474
- https://github.com/aws/aws-sdk-java/issues/645
- https://github.com/aws/aws-sdk-java/issues/893
- https://github.com/aws/aws-sdk-java/issues/964
- https://github.com/aws/aws-sdk-java/issues/988
- https://github.com/aws/aws-sdk-java/issues/1215
- https://github.com/aws/aws-sdk-java/issues/1207
- https://github.com/aws/aws-sdk-java/issues/1103
- https://github.com/aws/aws-sdk-java/issues/1321
Allow using a finite number of threads for background processing. Currently, 1.11.x's TransferManager is reported to require an unbounded thread pool to prevent deadlocks.
+1 for aws/aws-sdk-java#1103, a request for the ability to limit bandwidth for S3 uploads/downloads. See also the recently closed issue from the aws-cli repo: https://github.com/aws/aws-cli/issues/1090
This same feature would be similarly useful in the Java SDK to help avoid fees from ISPs for excessive bandwidth usage, or to prevent a single application from overwhelming a network's capacity.
+1 for aws/aws-sdk-java#1103. Downloading data too fast will saturate the network.
+1 for aws/aws-sdk-java#474.
It is very inefficient to write to the file system and then upload from the file when I have an object in memory that I can serialize directly to a stream. It seems counterintuitive to have to provide the total content length up front when providing a stream as input - I have to work around it instead of just using it.
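One way to picture uploading a stream of unknown length is to cut it into fixed-size parts as it is read, so that only one part is buffered at a time. The sketch below shows just that splitting step; StreamChunker is a hypothetical name, and the Consumer is a stand-in for a call that would upload one part. It is an illustration of the idea, not how the SDK or any existing library does it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical helper: splits a stream of unknown length into parts of at most partSize bytes. */
final class StreamChunker {
    /** Hands each part to partHandler and returns the number of parts produced. */
    static int forEachPart(InputStream in, int partSize, Consumer<byte[]> partHandler)
            throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        byte[] buffer = new byte[partSize];
        int parts = 0;
        while (true) {
            // Fill the buffer completely unless the stream ends first.
            int filled = 0;
            while (filled < partSize) {
                int n = in.read(buffer, filled, partSize - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            if (filled == 0) {
                break; // nothing left to send
            }
            partHandler.accept(Arrays.copyOf(buffer, filled));
            parts++;
            if (filled < partSize) {
                break; // short part means end of stream
            }
        }
        return parts;
    }
}
```

With S3 multipart uploads, every part except the last must be at least 5 MiB, so a real partSize would be chosen with that limit in mind.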
+1 for https://github.com/aws/aws-sdk-java/issues/893
The primitive AmazonS3 client is capable of uploading and downloading to and from a stream, as well as from a file. The TransferManager can also upload from either a stream or a file, but can only download to a file - not a stream. Symmetry in the interface would be nice. For large files - the kind for which multi-part uploads are most valuable - I can understand that attempting to buffer contents in memory is unwise. However, for small files, I question the value of having to write and read from disk. The download/upload interface is pleasingly abstract, relative to the interface of the primitive client, and I'd like to favor it no matter the size of my files.
+1 for aws/aws-sdk-java#1207. I need to be able to upload a lot of files all at once while specifying the ACL. Our customers will be using the CLI to upload files in parallel and I need to closely match the performance in simulating file uploads. If I don't specify the ACL flag, our service cannot read those files and my tests are useless.
+1 for aws/aws-sdk-java#893
My current use-case for this is that I have 100MB+ compressed (GZIP) files on S3 that I need to download and perform some further conversion on.
It would be great to take advantage of multi-part download and have that stream through Java's GZIPInputStream so that I don't need to download and then uncompress separately.
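The consuming side of that use case can be sketched with the JDK alone. In the example below, the InputStream parameter is a stand-in for whatever a streaming download would return; GzipStreaming is a hypothetical name, and the method only counts decompressed bytes where a real application would do its conversion.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: decompresses on the fly instead of downloading first and unzipping later. */
final class GzipStreaming {
    /** Reads the compressed stream to the end and returns the number of decompressed bytes. */
    static long countDecompressedBytes(InputStream compressed) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (GZIPInputStream gz = new GZIPInputStream(compressed)) {
            int n;
            while ((n = gz.read(buf)) != -1) {
                total += n; // a real consumer would process buf[0..n) here
            }
        }
        return total;
    }
}
```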
Is it possible to support the AWS CLI style of "sync", where TransferManager decides which files are different and only uploads the different ones?
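The decision step of such a sync could look roughly like the sketch below, which compares local files against remote objects by size only. SyncPlanner and keysToUpload are hypothetical names, and the size-only comparison is a simplifying assumption; a real implementation would likely also compare timestamps or checksums.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decides which keys need uploading. Size-only comparison is an assumption. */
final class SyncPlanner {
    /** Returns, in sorted order, the keys that are missing remotely or whose size differs. */
    static List<String> keysToUpload(Map<String, Long> localSizes, Map<String, Long> remoteSizes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> local : localSizes.entrySet()) {
            Long remoteSize = remoteSizes.get(local.getKey());
            if (remoteSize == null || !remoteSize.equals(local.getValue())) {
                result.add(local.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```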
Reminder that for https://github.com/aws/aws-sdk-java/issues/474, I have written a library using the SDK v1 which allows streaming data to S3 without knowing the size beforehand and without keeping it all in memory or writing to disk. You may find the source code helpful for implementing the feature in v2. I am not planning on porting the library to use v2. Implementing the feature in v2 may have advantages over my library, e.g. by using async non-blocking I/O instead of many threads.
@sql4bucks see the library if you haven't already, you may find it useful.
Thanks @alexmojaki. We will keep this in mind when investigating how to address https://github.com/aws/aws-sdk-java/issues/474.
Hi all,
For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!
The README will be updated soon to go into depth on the current prototype.
When can this be expected for use? And which version?
Feature request from v1:
Provide getSubTransfers() method for MultipleFileDownload - https://github.com/aws/aws-sdk-java/issues/785
Feature requests from v1:
- Support for custom collections of transfers in TransferManager - https://github.com/aws/aws-sdk-java/issues/1541
- Make TransferManager compatible with AWS X-Ray - https://github.com/aws/aws-sdk-java/issues/1572
- TransferManager doesn't handle constraint failures gracefully - https://github.com/aws/aws-sdk-java/issues/1644
From V1:
- Support sync: https://github.com/aws/aws-sdk-java/issues/2131
Feature request from V1:
- Replace S3 downloadInParallel by using Content-Range requests instead of undocumented S3 part requests - https://github.com/aws/aws-sdk-java/issues/1303
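For illustration, splitting an object into byte ranges for parallel GET requests could be planned as in the sketch below. Note that the HTTP request header is Range (Content-Range is what the response carries), and that both ends of a byte range are inclusive. RangePlanner and rangeHeaders are hypothetical names, and the sketch assumes the object size is already known, for example from a prior HEAD request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes Range header values covering an object of known size. */
final class RangePlanner {
    /** Returns values such as "bytes=0-3", one per part, covering [0, objectSize). */
    static List<String> rangeHeaders(long objectSize, long partSize) {
        if (objectSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and partSize > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += partSize) {
            // HTTP byte ranges are inclusive at both ends, hence the -1.
            long end = Math.min(start + partSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }
}
```

Each returned value would be sent as the Range header of one GET request, and the responses written to their offsets in the destination.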
Any update on this? Any idea when it could be ready?
Will there be a separate high-level GlacierTransferManager for Glacier vaults/archives, or will Glacier operations be absorbed into S3?
A backend rarely creates files on the server, so using a directory as the source for uploads causes unnecessary memory and performance problems. Ideally, the backend sends the MultipartFile received from the frontend without first having to save it on the server and then upload it. Having to do so is frustrating and unnecessary. Cloud computing is expensive, so creating unnecessary directories is not acceptable.
Proposal: In TransferManager, create a new method that takes InputStream instead of File: MultipleFileUpload uploadFileList(String bucketName, List<InputStream> streams). It is very important to support InputStream because we often manipulate the files sent by users, for example by shrinking images.
Hi all, we have released the Developer Preview of Transfer Manager. Currently it supports single file upload and download, and we are actively working on adding more features.
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3-transfer-manager</artifactId>
    <version>2.17.16-PREVIEW</version>
</dependency>
You can find sample code here: https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/s3-transfer-manager
Give it a try and let us know what you think! 🙂
@zoewangg Hi. Is there any plan to support reactive input such as Flux? Currently, it seems that only a file is supported as upload input.
https://github.com/aws/aws-sdk-java-v2/issues/2731
downloadDirectory is missing.