Transfer Manager

Open millems opened this issue 8 years ago • 41 comments

Review the inherited state of the V1 transfer manager and determine which changes are necessary for V2.

(Feel free to comment on this issue with desired changes).

millems avatar Jul 03 '17 17:07 millems

Upload and download return a Transfer, which has getProgress(). That gives a simple way to get the percent complete, but it requires busy loops to update things like UIs.

On the other hand, ProgressListener only reports the bytes transferred, so you have to compute the percentage yourself, which is not easy because the total size is not exposed.

Better feature parity on the ProgressListener would be nice, since that plays better with async.
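
To illustrate the kind of parity meant here, below is a minimal sketch of a listener-side helper that turns byte-count events into percent-complete callbacks once the total size is known. PercentProgress and its methods are hypothetical names for illustration, not part of any SDK version.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.DoubleConsumer;

/** Hypothetical helper (not an SDK class): converts byte-count events into percent-complete callbacks. */
final class PercentProgress {
    private final long totalBytes;
    private final DoubleConsumer onPercent;
    private final AtomicLong transferred = new AtomicLong();

    PercentProgress(long totalBytes, DoubleConsumer onPercent) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        this.totalBytes = totalBytes;
        this.onPercent = onPercent;
    }

    /** Call once per progress event with the bytes moved since the previous event. */
    void bytesTransferred(long delta) {
        long soFar = transferred.addAndGet(delta);
        // Clamp at 100 in case retries cause some bytes to be reported twice.
        onPercent.accept(Math.min(100.0, 100.0 * soFar / totalBytes));
    }
}
```

A UI could pass something like `percent -> progressBar.setValue((int) percent)` as the callback and never poll.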

abrooksv avatar Jul 18 '17 22:07 abrooksv

+1 on parity with progress listener. There should probably just be one system between transfer manager and the rest of the SDK for monitoring progress.

millems avatar Jul 21 '17 20:07 millems

Use DirectoryStream for loading files from a directory to avoid having to load all file names into memory. See https://github.com/aws/aws-sdk-java/issues/1271.
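
A minimal sketch of that approach, using only java.nio.file. LazyDirectoryWalk and forEachFile are hypothetical names; the point is that entries are visited one at a time instead of being collected into a list first.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

/** Hypothetical helper: visits files in a directory lazily instead of listing them all up front. */
final class LazyDirectoryWalk {
    /** Applies action to each regular file directly under dir and returns how many were visited. */
    static int forEachFile(Path dir, Consumer<Path> action) throws IOException {
        int count = 0;
        // DirectoryStream iterates entries on demand, so memory use stays flat
        // no matter how many files the directory holds.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    action.accept(entry);
                    count++;
                }
            }
        }
        return count;
    }
}
```

This only covers a single directory level; a recursive upload would need to descend into subdirectories as well.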

millems avatar Aug 10 '17 23:08 millems

It is unreasonable to expect the thread pool to be unbounded in order to avoid deadlocks. See https://github.com/aws/aws-sdk-java/issues/939

kiiadi avatar Aug 16 '17 15:08 kiiadi

TransferManager feature requests from v1:

  • https://github.com/aws/aws-sdk-java/issues/117
  • https://github.com/aws/aws-sdk-java/issues/284
  • https://github.com/aws/aws-sdk-java/issues/474
  • https://github.com/aws/aws-sdk-java/issues/645
  • https://github.com/aws/aws-sdk-java/issues/893
  • https://github.com/aws/aws-sdk-java/issues/964
  • https://github.com/aws/aws-sdk-java/issues/988
  • https://github.com/aws/aws-sdk-java/issues/1215
  • https://github.com/aws/aws-sdk-java/issues/1207
  • https://github.com/aws/aws-sdk-java/issues/1103

spfink avatar Sep 13 '17 22:09 spfink

https://github.com/aws/aws-sdk-java/issues/1321

millems avatar Sep 29 '17 18:09 millems

Allow using a finite number of threads for background processing. Currently, 1.11.x's TransferManager is reported to require an unbounded thread pool to prevent deadlocks.
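
One common way a bounded pool deadlocks is when a task running on the pool blocks while waiting for sub-tasks queued on that same pool. Whether that is the exact cause in 1.11.x is an assumption on my part. The sketch below shows the usual way around it: combine the part futures without blocking any pool thread. NonBlockingParts and runParts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.IntFunction;

/** Hypothetical helper: fans out part tasks and combines them without blocking a pool thread. */
final class NonBlockingParts {
    /** Runs partCount tasks (numbered from 1) on executor and completes with their results in order. */
    static <T> CompletableFuture<List<T>> runParts(int partCount, IntFunction<T> partTask, Executor executor) {
        List<CompletableFuture<T>> parts = new ArrayList<>();
        for (int i = 0; i < partCount; i++) {
            final int partNumber = i + 1;
            parts.add(CompletableFuture.supplyAsync(() -> partTask.apply(partNumber), executor));
        }
        // allOf registers a callback instead of parking a thread, so this works
        // even when the executor has a single worker.
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<T> results = new ArrayList<>();
                    for (CompletableFuture<T> part : parts) {
                        results.add(part.join()); // every part is already complete here
                    }
                    return results;
                });
    }
}
```

With this shape, `Executors.newFixedThreadPool(1)` is enough to finish any number of parts, just more slowly.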

millems avatar Jan 02 '18 21:01 millems

+1 for aws/aws-sdk-java#1103, a request for the ability to limit bandwidth for S3 uploads/downloads. See also the recently closed issue from the aws-cli repo: https://github.com/aws/aws-cli/issues/1090

This same feature would be similarly useful in the Java SDK to help avoid fees from ISPs for excessive bandwidth usage, or to prevent a single application from overwhelming a network's capacity.
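
Until something like this exists in the SDK, a rough workaround is to wrap the stream being uploaded or downloaded in a throttling stream. The sketch below is an assumption about how that could look, not an SDK feature; ThrottledInputStream and delayNanos are hypothetical names, and the pacing is a simple average-rate cap with no burst allowance.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

/** Hypothetical wrapper: caps the average read rate of the underlying stream. */
final class ThrottledInputStream extends FilterInputStream {
    private final long bytesPerSecond;
    private final long startNanos = System.nanoTime();
    private long totalBytes;

    ThrottledInputStream(InputStream in, long bytesPerSecond) {
        super(in);
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
    }

    /** Nanoseconds to wait so that totalBytes over elapsedNanos stays within the rate. */
    static long delayNanos(long totalBytes, long bytesPerSecond, long elapsedNanos) {
        long earliestNanos = (long) (totalBytes * (1e9 / bytesPerSecond));
        return Math.max(0L, earliestNanos - elapsedNanos);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            pace(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            pace(n);
        }
        return n;
    }

    /** Sleeps after a read if the stream is running ahead of the allowed rate. */
    private void pace(int justRead) throws IOException {
        totalBytes += justRead;
        long nanos = delayNanos(totalBytes, bytesPerSecond, System.nanoTime() - startNanos);
        if (nanos > 0) {
            try {
                Thread.sleep(nanos / 1_000_000L, (int) (nanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }
}
```

This limits one stream at a time; a process-wide cap across parallel transfers would need a shared limiter.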

erikedlund avatar Jan 02 '18 21:01 erikedlund

+1 for aws/aws-sdk-java#1103. Downloading data too fast saturates the network.

zhiqiangZHAO avatar Jan 04 '18 01:01 zhiqiangZHAO

+1 for aws/aws-sdk-java#474.

It is very inefficient to write to the file system and then upload from the file when I have an object in memory that I can serialize directly to a stream. It seems counterintuitive to provide the total content length up front when providing a stream as input - I have to work around it instead of just using it.
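
For what it's worth, the core of a length-agnostic upload can be sketched as cutting the stream into fixed-size parts as it is read, so only one part is buffered at a time. StreamChunker and forEachPart are hypothetical names; a real implementation would hand each part to a multipart upload call, where S3 requires every part except the last to be at least 5 MiB.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical helper: splits a stream of unknown length into fixed-size parts. */
final class StreamChunker {
    /** Reads in until EOF, passing each part to partSink, and returns the number of parts. */
    static int forEachPart(InputStream in, int partSize, Consumer<byte[]> partSink) throws IOException {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        byte[] buffer = new byte[partSize];
        int parts = 0;
        while (true) {
            // readNBytes keeps reading until the buffer is full or the stream ends.
            int filled = in.readNBytes(buffer, 0, partSize);
            if (filled == 0) {
                break; // stream ended exactly on a part boundary
            }
            partSink.accept(Arrays.copyOf(buffer, filled));
            parts++;
            if (filled < partSize) {
                break; // short read means this was the final part
            }
        }
        return parts;
    }
}
```

The total length is never needed; it falls out as the sum of the part sizes once the stream ends.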

sql4bucks avatar Sep 12 '18 21:09 sql4bucks

+1 for https://github.com/aws/aws-sdk-java/issues/893

The primitive AmazonS3 client is capable of uploading and downloading to and from a stream, as well as from a file. The TransferManager can also upload from either a stream or a file, but can only download to a file - not a stream. Symmetry in the interface would be nice. For large files - the kind for which multi-part uploads are most valuable - I can understand that attempting to buffer contents in memory is unwise. However, for small files, I question the value of having to write and read from disk. The download/upload interface is pleasingly abstract, relative to the interface of the primitive client, and I'd like to favor it no matter the size of my files.

josephsmithiv avatar Sep 18 '18 02:09 josephsmithiv

+1 for aws/aws-sdk-java#1207. I need to be able to upload a lot of files all at once while specifying the ACL. Our customers will be using the CLI to upload files in parallel and I need to closely match the performance in simulating file uploads. If I don't specify the ACL flag, our service cannot read those files and my tests are useless.

TeresaP avatar Nov 29 '18 17:11 TeresaP

+1 for aws/aws-sdk-java#893

My current use-case for this is that I have 100MB+ compressed (GZIP) files on S3 that I need to download and perform some further conversion on.

It would be great to take advantage of multi-part download and have that stream through Java's GZIPInputStream so that I don't need to download and then uncompress separately.
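
The consuming side of this would look roughly like the sketch below: the compressed bytes are decompressed as they arrive, with no intermediate file. GzipStreaming and countLines are hypothetical names, and a plain InputStream stands in for whatever stream a multi-part download would expose.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Hypothetical example: processes gzip data while it is still being read. */
final class GzipStreaming {
    /** Counts the lines of text in a gzip-compressed stream, decompressing on the fly. */
    static long countLines(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```

Counting lines is only a placeholder for the "further conversion" step; any reader-based processing would slot in the same way.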

bisoldi avatar Dec 18 '18 19:12 bisoldi

Is it possible to support the aws cli style of "sync" where TransferManager decides which files are different and only uploads the different ones?
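
To make the request concrete, here is a rough sketch of the decision step, assuming the same rule the CLI documents for uploads: a file is sent if it is missing remotely, its size differs, or the local copy is newer. SyncPlan, FileInfo, and keysToUpload are hypothetical names; how the two maps get populated (directory walk, object listing) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decides which local files a sync-style upload would send. */
final class SyncPlan {
    /** Size and last-modified time of one file, local or remote. */
    static final class FileInfo {
        final long size;
        final long lastModifiedMillis;

        FileInfo(long size, long lastModifiedMillis) {
            this.size = size;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Returns the keys from local that are missing remotely, differ in size, or are newer locally. */
    static List<String> keysToUpload(Map<String, FileInfo> local, Map<String, FileInfo> remote) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, FileInfo> entry : local.entrySet()) {
            FileInfo mine = entry.getValue();
            FileInfo theirs = remote.get(entry.getKey());
            if (theirs == null
                    || theirs.size != mine.size
                    || mine.lastModifiedMillis > theirs.lastModifiedMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Size and timestamp checks are cheap but can miss same-size edits with an older timestamp; a checksum comparison would be stricter.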

chrisvire avatar Mar 14 '19 21:03 chrisvire

Reminder that for https://github.com/aws/aws-sdk-java/issues/474, I have written a library using the SDK v1 which allows streaming data to S3 without knowing the size beforehand and without keeping it all in memory or writing to disk. You may find the source code helpful for implementing the feature in v2. I am not planning on porting the library to use v2. Implementing the feature in v2 may have advantages over my library, e.g. by using async non-blocking I/O instead of many threads.

@sql4bucks see the library if you haven't already, you may find it useful.

alexmojaki avatar Apr 19 '19 13:04 alexmojaki

Thanks @alexmojaki. We will keep this in mind when investigating how to address https://github.com/aws/aws-sdk-java/issues/474.

dagnir avatar Apr 19 '19 20:04 dagnir

Hi all,

For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!

The README will be updated soon to go into depth on the current prototype.

dagnir avatar Apr 23 '19 20:04 dagnir

> Hi all,
>
> For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!
>
> The README will be updated soon to go into depth on current prototype.

When can this be expected for use? And which version?

abhimanyu4211 avatar May 01 '19 18:05 abhimanyu4211

Feature request from v1:

Provide getSubTransfers() method for MultipleFileDownload - https://github.com/aws/aws-sdk-java/issues/785

debora-ito avatar Jul 18 '19 22:07 debora-ito

Feature requests from v1:

  • Support for custom collections of transfers in TransferManager - https://github.com/aws/aws-sdk-java/issues/1541
  • Make TransferManager compatible with AWS X-Ray - https://github.com/aws/aws-sdk-java/issues/1572
  • TransferManager doesn't handle constraint failures gracefully - https://github.com/aws/aws-sdk-java/issues/1644

debora-ito avatar Oct 10 '19 01:10 debora-ito

From V1:

  • Support sync: https://github.com/aws/aws-sdk-java/issues/2131

dagnir avatar Oct 22 '19 21:10 dagnir

Feature request from V1:

  • Replace S3 downloadInParallel by using Content-Range requests instead of undocumented S3 part requests - https://github.com/aws/aws-sdk-java/issues/1303
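
As a sketch of what the range-based approach involves: the object size is split into byte ranges, and each range becomes the value of an HTTP Range request header (Content-Range is the header the response carries back). ByteRanges and split are hypothetical names, and the object size is assumed to be known in advance, for example from a HEAD request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes HTTP Range header values for a parallel download. */
final class ByteRanges {
    /** Splits [0, objectSize) into inclusive "bytes=start-end" ranges of at most partSize bytes. */
    static List<String> split(long objectSize, long partSize) {
        if (objectSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and partSize > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += partSize) {
            // Range end offsets are inclusive, hence the -1.
            long end = Math.min(start + partSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }
}
```

For a 10-byte object and a part size of 4, this yields bytes=0-3, bytes=4-7, and bytes=8-9. Each range can be fetched independently and written at its start offset in the target file.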

debora-ito avatar Jan 03 '20 01:01 debora-ito

> Hi all,
>
> For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!
>
> The README will be updated soon to go into depth on current prototype.

Any update on this? Any idea when it could be ready?

fleiber avatar Apr 10 '20 11:04 fleiber

> Hi all,
>
> For anyone interested, here is the current design for TransferManager: https://github.com/aws/aws-sdk-java-v2/tree/master/docs/design/services/s3/transfermanager. Feel free to leave feedback and comments!
>
> The README will be updated soon to go into depth on current prototype.

Will there be a separate high-level GlacierTransferManager for Glacier vaults/archives, or will Glacier operations be absorbed into S3?

agkeahan avatar Apr 12 '20 23:04 agkeahan

Any update on this? Any idea when it could be ready?

vincent-dm avatar Jan 28 '21 15:01 vincent-dm

Backends rarely create files on the server, so using a directory as the source for uploads causes unnecessary memory and performance problems. Ideally, the backend forwards the MultipartFile it receives from the frontend without first saving it on the server and then uploading it; that extra step is frustrating and unnecessary. Cloud computing is expensive, so creating unnecessary directories is not acceptable.

Proposal: in TransferManager, add a new method that takes InputStream instead of File:

MultipleFileUpload uploadFileList(String bucketName, List<InputStream> streams)

Using InputStream is very important because we often manipulate the files users send, for example by shrinking images.

#2572

jereztech avatar May 18 '21 01:05 jereztech

Hi all, we have released the Developer Preview of Transfer Manager. Currently it supports single file upload and download, and we are actively working on adding more features.

<dependency>
  <groupId>software.amazon.awssdk</groupId>
  <artifactId>s3-transfer-manager</artifactId>
  <version>2.17.16-PREVIEW</version>
</dependency>

You can find sample code here: https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/s3-transfer-manager

Give it a try and let us know what you think! 🙂

zoewangg avatar Aug 10 '21 22:08 zoewangg

@zoewangg Hi. Is there any plan to support reactive input such as Flux? Currently, it seems that only a file is supported as upload input.

pkgonan avatar Sep 13 '21 06:09 pkgonan

https://github.com/aws/aws-sdk-java-v2/issues/2731

ashishdhingra avatar Sep 27 '21 20:09 ashishdhingra

downloadDirectory is missing.

exoego avatar Nov 07 '21 12:11 exoego