aws-sdk-java-v2

Downloading a directory using DownloadDirectoryRequest results in FileSystemException

Open rickyeng127 opened this issue 1 year ago • 9 comments

Describe the bug

When I attempt to download all files under a prefix from S3 using a DownloadDirectoryRequest, I receive an exception:

2023-07-07T11:05:34.948-04:00 Exception in thread "Thread-9" java.io.UncheckedIOException: java.nio.file.FileSystemException: /tmp/pipeline: Is a directory

Note: /tmp/pipeline is the destination path that I am setting in the DownloadDirectoryRequest object.

Expected Behavior

I expect all the files to be downloaded to the specified local directory.

Current Behavior

The following is a log trace of the exception:

2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [tls-handler] - id=0x7f758080ef10: Bytes read 530
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [channel] - id=0x7f75809075a0: sending read message of size 530, from slot 0x7f75809141d0 to slot 0x7f7580801150 with handler 0x7f758091ed88.
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-connection] - id=0x7f758091ed80: Incoming message of size 530.
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming response status: 200 (OK).
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: x-amz-id-2: h1WxnF/FOsoHZLasbZ2DQeuEJMpBdrDAMdePtscfB6358d9Vioir2EBi56xRXBrn6PeF+iskWdA=
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: x-amz-request-id: TQFJC4T3EEDG4B3Y
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: Date: Fri, 07 Jul 2023 15:05:35 GMT
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: Last-Modified: Fri, 30 Jun 2023 15:33:24 GMT
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: ETag: "550d2b501e6eaf93d22136d5ab191341"
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: x-amz-server-side-encryption: aws:kms
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: x-amz-server-side-encryption-aws-kms-key-id: arn:aws:kms:us-east-1:460800218528:key/89b265d2-bec0-4160-b439-5a6876a74b1c
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: Accept-Ranges: bytes
2023-07-07T11:05:34.941-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: Content-Type: application/octet-stream
2023-07-07T11:05:34.948-04:00	Exception in thread "Thread-9" java.io.UncheckedIOException: java.nio.file.FileSystemException: /tmp/pipeline: Is a directory
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.utils.FunctionalUtils.asRuntimeException(FunctionalUtils.java:180)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.utils.FunctionalUtils.lambda$safeSupplier$4(FunctionalUtils.java:110)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.utils.FunctionalUtils.invokeSafely(FunctionalUtils.java:136)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.async.FileAsyncResponseTransformer.onStream(FileAsyncResponseTransformer.java:125)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.async.listener.AsyncResponseTransformerListener$NotifyingAsyncResponseTransformer.onStream(AsyncResponseTransformerListener.java:93)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.async.listener.AsyncResponseTransformerListener$NotifyingAsyncResponseTransformer.onStream(AsyncResponseTransformerListener.java:93)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.http.async.AsyncStreamingResponseHandler.onStream(AsyncStreamingResponseHandler.java:63)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.http.IdempotentAsyncResponseHandler.onStream(IdempotentAsyncResponseHandler.java:108)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.http.async.CombinedResponseAsyncHttpResponseHandler.onStream(CombinedResponseAsyncHttpResponseHandler.java:85)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.http.async.AsyncAfterTransmissionInterceptorCallingResponseHandler.onStream(AsyncAfterTransmissionInterceptorCallingResponseHandler.java:86)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onResponseHeaders(S3CrtResponseHandlerAdapter.java:62)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onResponseHeaders(S3MetaRequestResponseHandlerNativeAdapter.java:28)
2023-07-07T11:05:34.948-04:00	Caused by: java.nio.file.FileSystemException: /tmp/pipeline: Is a directory
2023-07-07T11:05:34.948-04:00	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
2023-07-07T11:05:34.948-04:00	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
2023-07-07T11:05:34.948-04:00	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
2023-07-07T11:05:34.948-04:00	at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
2023-07-07T11:05:34.948-04:00	at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.async.FileAsyncResponseTransformer.createChannel(FileAsyncResponseTransformer.java:103)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.core.internal.async.FileAsyncResponseTransformer.lambda$onStream$2(FileAsyncResponseTransformer.java:125)
2023-07-07T11:05:34.948-04:00	at software.amazon.awssdk.utils.FunctionalUtils.lambda$safeSupplier$4(FunctionalUtils.java:108)
2023-07-07T11:05:34.949-04:00	... 10 more
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: Server: AmazonS3
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Incoming header: Content-Length: 0
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Main header block done.
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Client request complete, response status: 200 (OK).
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [S3MetaRequest] - id=0x7f754002a210: Request 0x7f7580907350 finished with error code 0 (aws-c-common: AWS_ERROR_SUCCESS, Success.) and response status 200
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Stream refcount released, 1 remaining.
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [standard-retry-strategy] - token_id=0x7f7538116920: partition=pipeline-devl-sfdl-us-east-1.s3.us-east-1.amazonaws.com: recording successful operation and adding 1 units of capacity back to the bucket.
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [standard-retry-strategy] - bucket_id=0x7f7538116920: partition=pipeline-devl-sfdl-us-east-1.s3.us-east-1.amazonaws.com : new capacity is 500.
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [standard-retry-strategy] - id=0x7f7538116920: releasing token
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [event-loop] - id=0x7f75502a64d0: scheduling task 0x7f75503e4ab8 in-thread for timestamp 0
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [task-scheduler] - id=0x7f75503e4ab8: Scheduling s3_client_process_work_task task for immediate execution
2023-07-07T11:05:34.949-04:00	[TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [S3MetaRequest] - id=0x7f754002a210 Etag received for the meta request. value is: "550d2b501e6eaf93d22136d5ab191341"
2023-07-07T11:05:34.949-04:00   [ERROR] [2023-07-07T15:05:34Z] [00007f752df35700] [S3MetaRequest] - id=0x7f754002a210: Exception thrown from S3MetaRequest.onResponseHeaders callback
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [S3MetaRequest] - id=0x7f754002a210 Head object completed.
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [connection-manager] - id=0x7f75400297a0: User releasing connection (id=0x7f758091ed80)
2023-07-07T11:05:34.949-04:00	[DEBUG] [2023-07-07T15:05:34Z] [00007f752df35700] [connection-manager] - id=0x7f75400297a0: snapshot - state=1, idle_connection_count=1, pending_acquire_count=0, pending_settings_count=0, pending_connect_count=0, vended_connection_count=8, open_connection_count=9, ref_count=1
2023-07-07T11:05:34.949-04:00   [TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-stream] - id=0x7f7580912680: Final stream refcount released.
2023-07-07T11:05:34.949-04:00   [TRACE] [2023-07-07T15:05:34Z] [00007f752df35700] [http-connection] - id=0x7f758091ed80: Connection refcount released, 1 remaining.

Reproduction Steps

The following is the code that I am using to perform this download:

String s3BucketName = "MyBucket";
String prefix = "MyPrefix";
String destinationPath = "/tmp/pipeline";

S3AsyncClient s3AsyncClient =
        S3AsyncClient.crtBuilder()
          .targetThroughputInGbps(20.0)
          .minimumPartSizeInBytes(1000000L)
          .build();

S3TransferManager transferManager = S3TransferManager.builder()
	.s3Client(s3AsyncClient)
	.build();

DownloadDirectoryRequest downloadDirectoryRequest = DownloadDirectoryRequest.builder()
  .destination(Paths.get(destinationPath))
  .bucket(s3BucketName)
  .listObjectsV2RequestTransformer(transformer -> transformer.prefix(prefix))
  .build();

DirectoryDownload directoryDownload = transferManager.downloadDirectory(downloadDirectoryRequest);

CompletedDirectoryDownload completedDirectoryDownload = directoryDownload.completionFuture().join();

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.20.94

JDK version used

Java 1.8

Operating System and version

amzlnx:5.8.1

rickyeng127 avatar Jul 07 '23 16:07 rickyeng127

I have a guess this is related to empty "folder" objects, but I can't reproduce the error locally.

@rickyeng127 Can you show the result of ListObjectsV2 for that bucket "MyBucket"? I'm trying to recreate the same folder structure you have so I can reproduce.

Moving this to the aws-sdk-java-v2 repository.

debora-ito avatar Aug 01 '23 22:08 debora-ito

Hi @debora-ito,

A ListObjectsV2 call on the sample bucket returns:

MyPrefix/
MyPrefix/temp.txt

The prefix has a single file.

The error indicates that there is an issue with the destination directory (/tmp/pipeline) as opposed to the source files in S3. I can confirm that the directory exists. We are using a Fargate ECS container to execute this code.

We are using the Java AWS SDK version 2.20.94 and the crt library version 0.22.2.

What is interesting is that the file successfully downloads, but the exception is thrown afterwards.

It appears that the issue occurs when an attempt is made to open an asynchronous file channel using the UnixFileSystemProvider.
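
For readers following along, here is a minimal sketch of that failure mode using only the JDK, with no SDK involved. It assumes the transformer ends up calling AsynchronousFileChannel.open on the destination directory itself, which is what the stack trace suggests; the class name, helper name, and the open options are illustrative assumptions, not the SDK's actual code.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectoryChannelDemo {

    /**
     * Tries to open a write channel on the given path.
     * Returns the IOException if the open fails, or null if it succeeds.
     * The open options here are an assumption chosen for illustration.
     */
    static IOException tryOpenForWrite(Path target) {
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                target, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the /tmp/pipeline destination from the report.
        Path dir = Files.createTempDirectory("pipeline");

        // Opening the directory itself as if it were a file is expected to fail.
        System.out.println("directory: " + tryOpenForWrite(dir));

        // Opening a regular file inside the directory is expected to succeed.
        Path file = dir.resolve("temp.txt");
        System.out.println("file: " + tryOpenForWrite(file));

        Files.deleteIfExists(file);
        Files.delete(dir);
    }
}
```

If this is what happens in the SDK, it would be consistent with the regular file downloading fine while a separate request, one that resolves to the destination directory, fails.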

rickyeng127 avatar Aug 02 '23 14:08 rickyeng127

Hello, I believe that I am hitting the same issue. This is the result of my ListObjectsV2 request:

ListObjectsV2Response(IsTruncated=false, Contents=[S3Object(Key=ardes-testutils-s3utils-test/, LastModified=2021-03-31T13:52:19Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD), S3Object(Key=ardes-testutils-s3utils-test/testfile.txt, LastModified=2021-03-31T13:55:03Z, ETag="3908ff52ab1b04a33fd5da65c1d39352", Size=109, StorageClass=STANDARD)], Name=ardes.test, Prefix=ardes-testutils-s3utils-test, MaxKeys=1000, KeyCount=2)

Querying with the prefix ardes-testutils-s3utils-test.
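
The listing above contains a zero-byte key that ends in "/", which matches the empty "folder" object guess made earlier in the thread. As a rough sketch of how such keys could be told apart from real files, here is a plain-Java check on key and size. The class and method names are hypothetical, and treating "size 0 and trailing slash" as a folder marker is an assumption about how the objects were created. If the SDK version in use offers a filter option on DownloadDirectoryRequest, a predicate along these lines might be usable there, but that should be checked against the Transfer Manager documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FolderMarkerFilter {

    /** Heuristic: a zero-byte object whose key ends with "/" is treated as a folder marker. */
    static boolean isFolderMarker(String key, long size) {
        return size == 0 && key.endsWith("/");
    }

    /** Returns the keys from a key-to-size listing that are not folder markers, in listing order. */
    static List<String> downloadableKeys(Map<String, Long> listing) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> entry : listing.entrySet()) {
            if (!isFolderMarker(entry.getKey(), entry.getValue())) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Keys and sizes copied from the ListObjectsV2Response above.
        Map<String, Long> listing = new LinkedHashMap<>();
        listing.put("ardes-testutils-s3utils-test/", 0L);
        listing.put("ardes-testutils-s3utils-test/testfile.txt", 109L);

        System.out.println(downloadableKeys(listing));
    }
}
```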

euclio avatar Nov 03 '23 12:11 euclio

Hello, we are running into the same issue as well.

We are using the Java AWS SDK version 2.20.68 and the crt library version 0.21.16. We are downloading a directory and we are using the code supplied in the documentation. We are also seeing the files being created and the exception is thrown for every sub-directory after a successful download.

Is there any way we can work around this error, or is there a version of the Java AWS SDK / crt library that we can use that does not have this error?

RRajdev avatar Nov 30 '23 23:11 RRajdev

I just tested the SDK version 2.21.34 and I don't see the exception anymore. Can you test using version 2.21.34 or later? We recently upgraded the version of the crt client.

@rickyeng127 @euclio @RRajdev

debora-ito avatar Dec 08 '23 23:12 debora-ito

We are still getting the error with SDK v. 2.21.34 and the latest version of the crt client.

RRajdev avatar Dec 11 '23 17:12 RRajdev

@RRajdev can you share the stacktrace with the error?

debora-ito avatar Dec 14 '23 23:12 debora-ito

@RRajdev I had a similar issue with the SDK v 2.21.7. Not sure if it helps you or not, but for me it worked after adding delimiter("/"):

 DownloadDirectoryRequest ddReq = DownloadDirectoryRequest.builder()
                .destination(pathToDownload)
                .bucket(bucket)
                .listObjectsV2RequestTransformer(l -> l.prefix(key).delimiter("/"))
                .downloadFileRequestTransformer(request -> request.addTransferListener(
                    listener))
                .build();

matanasoaei avatar Dec 15 '23 08:12 matanasoaei

@matanasoaei Thanks. Adding "/" helped resolve this issue.
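
For anyone wondering what the delimiter changes on the listing side, here is a small in-memory sketch of S3's documented grouping rule: keys whose remainder after the prefix contains the delimiter are rolled up into a common prefix instead of being returned as contents. It does not call S3, the class and method names are hypothetical, and how Transfer Manager goes on to handle the common prefixes is not something this thread establishes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DelimiterListingSketch {

    /** Result of one simulated listing call. */
    static final class Listing {
        final List<String> contents = new ArrayList<>();
        final Set<String> commonPrefixes = new LinkedHashSet<>();
    }

    /** Simulates prefix/delimiter grouping over a fixed list of keys. A null delimiter disables grouping. */
    static Listing list(List<String> keys, String prefix, String delimiter) {
        Listing result = new Listing();
        for (String key : keys) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            String rest = key.substring(prefix.length());
            int idx = (delimiter == null || delimiter.isEmpty()) ? -1 : rest.indexOf(delimiter);
            if (idx >= 0) {
                // Roll the key up to the prefix plus everything through the first delimiter.
                result.commonPrefixes.add(prefix + rest.substring(0, idx + delimiter.length()));
            } else {
                result.contents.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys from the original report's bucket listing.
        List<String> keys = Arrays.asList("MyPrefix/", "MyPrefix/temp.txt");

        Listing flat = list(keys, "MyPrefix", null);
        System.out.println("no delimiter: contents=" + flat.contents
                + " commonPrefixes=" + flat.commonPrefixes);

        Listing grouped = list(keys, "MyPrefix", "/");
        System.out.println("delimiter '/': contents=" + grouped.contents
                + " commonPrefixes=" + grouped.commonPrefixes);
    }
}
```

With the prefix "MyPrefix" and no delimiter, the "MyPrefix/" marker key comes back as ordinary content; with the delimiter, both keys are grouped under the common prefix "MyPrefix/".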

asifsmohammed avatar Jan 08 '24 21:01 asifsmohammed