nextflow
nextflow copied to clipboard
Upgrade to AWS Java SDK v2
This PR contains the changes to port the Amazon plugin to AWS SDK version 2. Find below the most relevant changes:
- S3 Global Region flag: In v1, it was activated with
S3Client.withForceGlobalBucketAcessEnabled(flag). In v2, it set the following flagsS3Client.Builder.crossRegionAccessEnabled(flag)andS3Configuration.multiRegionAccessEnabled(flag). - Two S3Clients are created: the async client is used for operations performed through the S3TransferManager, and the sync client is used for other actions.
- AmazonS3Client.getS3AccountOwner() is not available in SDK v2. It was providing an ID used for checking the file access. In V2, the only way to retrieve the same ID is from a bucket owned by the user. To do it we need to list the buckets and get the owner field in the GetBucketACLResponse. If it is not possible to retrieve the ID because the user does not own any bucket, we perform the following fallback. In the case of READ access, it tries to retrieve the head of the object, It will fail if there isn't read access. In the case of writting, a warning is printed. It is the same as AWS NIO is doing to check the file access.
-
The
setEndpointandsetRegionmethods in the S3Client wrapper are removed as it is not available in the v2 clients. They were only used in tests. -
CannedAccessControlListis split in two classes one for objects and another for buckets. In most of the code it has been substituted by ObjectCannedACL. -
ContentTypeandContentLengthare part of the request instead of theObjectMetadata, and they can be obtained invoking the S3client.headObject method in the SDK v2 -
S3ClientConfigurationdoesn't exist in SDK v2. Two new classes have been created to emulate the same behaviour. They convert the properties to the SDK v2 sync and async client configurations. -
SsoCredentialsProviderV1class is not needed anymore as SDK v2 already manages the SSO credentials. The custom provider chain created in theS3FileSystemProvider.getCredetialsProvider0to include theSsoCredentialsProviderV1ihas been replace by theDefaultCredentialProviderin v2. -
Credentials and config are automatically merged by SDK v2. No option for NXF_DISABLE_AWS_CONFIG_MERGE.
-
In V2, clients and requests are immutable and must be generated with a builder class. Some helper methods have been modified to pass builder classes instead of requests, such as
makeJobDefRequest,configJobRefRequest,addVolumeMountsToContainer, etc. -
S3 Parallel Download was deprecated and S3CopyStream was not used. They have been removed.
-
In v1, the upload directory was performed by walking through the different directory files and uploading them one by one. In v2, it has been substituted by the uploadDirectory method in the SDK.
Deploy Preview for nextflow-docs-staging canceled.
| Name | Link |
|---|---|
| Latest commit | fd86ab41314ce1ec951f8e8cec77fddbe8e863c2 |
| Latest deploy log | https://app.netlify.com/projects/nextflow-docs-staging/deploys/686a33cecd232c0008122e95 |
It is ready for review
I have found an issue with multi-part uploads when uploading large files. I move to draft until I fix it.
In one of the changes that I did to support the Signer override and UserAgent is breaking the multipart upload. I changed the way about how the transfer manager's S3AsyncClient is created and the multipart upload with large files is having issues. It is creating too many parts and it can exhaust the heap memory or a time out when acquaring the connections. It is not happening if we use the default S3CrtAsyncClient but it does not allow to define the UserAgent and Signer as a clientoverride. I am still looking for a way to define them but I am wondering how relevant are these options. Is there a possibility to deprecate them?
@jorgee can you make a quick summary about this effort? any blocking issues?
@pditommaso, this is the summary that I breifly explained to @bentsherman, yesterday.
I have been working on these two issues:
- Custom Signer: Java SDK v2 only support sAWS V4 signer either in netty and CRT. This is also supported by Minio and Ceph. I think a custom signer could be requires if a user uses a non up-to-date version of these tools. In Netty, there is the possibility to pass a custom signer, but there aren't other implementations in the SDK than V4. So the user should provide the implementation. In CRT, there is no possibility to override any HTTP advanced options, so neither Signer or user agent can be specified.
The solution that I have pushed in the latest commits is the following. In the case that a custom signer or user agent is specified, Nextflow will configure a Netty async client of the S3 Transfer Manager. A custom signer can be used including the implementation in the classpath and adding the FQDN of the custom signer class in the signerOverride option.
In this issue, we have an old implementation that could be used in case we need to support V2 Signer.
- The multipart upload problems: They were mainly due to using the old default uploadPartSize of 100MB with async clients. The transfer manager threads are not limited by the executor threads like in the past, they are managed by the async client
maxConcurrencythat is much more larger than threads. So, more parts are managed in parallel creating OOM problems when using small instances such as 1-2GB. With the default async multipart block size (8MB), it is not happening. So, I have kept theuploadPartSizeoption and the otherupload*options for the case that is still using the old programmatic multipart upload (FileSystemProvider.newOutputStreamcase). For the transfers managed by the new S3 transfer manager, I have added new options to configure the async client.
Now, I am running the benchmarks to see if there are differences with the changes, but the code is again ready for review.
What's CRT? 😄
CRT is the AWS Common Runtime. It is a C implementations of the AWS Client. https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html
OK, let's wait the result of the benchmark. However my advice is to focus on merging this PR with the core migration ignoring corner cases such Custom signer and user agent problem, and addressing them if needed in a separate PRs.
Regarding the multipart upload problem, is it related with a specific config or a general one?
Note that with the custom signer, if we don't support it in this PR then we have to deprecate it as it would no longer do anything
I have pushed the changes to use only CRT version deprecating signerOverride and userAgent options.
It's also not clear to me how to select the async client vs sync client, as I don't see any explicit config option for this
The sync client is used in all api call except the S3 transfer manager actions ( S3 copies, uploads and downloads). For transfer manager there is no option to use the sync client. I didn't changed all calls to async because it implied a lot of changes in the code
I have added a subsection in the aws part of the documentation mentioning the most important changes in the v1 to v2 migration
End to end tests fail badly. I've got this message
ERROR ~ Type com.amazonaws.services.batch.model.KeyValuePair not present
unfortunately I'm not able to retried the full log
End to end tests fail badly. I've got this message
ERROR ~ Type com.amazonaws.services.batch.model.KeyValuePair not presentunfortunately I'm not able to retried the full log
The package is from v1. I am going to check if there is some place where I forgot someplace where the package is used
I have realized the gradle tasks implementation inside nexflow that still uses v1. I am going to change it, but I think it shouldn't affect end-to-end tests.
Yep, should be unrelated
I've realised some method like AwsBatchTaskHandle.configJobDefRequest were returning a Builder object, that in SDK v2 is an immutable object, making hard for sub-classes (like nf-xpack) to extend the handler behaviour.
To address this, I've introduced a custom class to model the Job definition request https://github.com/nextflow-io/nextflow/blob/fd86ab41314ce1ec951f8e8cec77fddbe8e863c2/plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/model/RegisterJobDefinitionModel.groovy#L34-L34. This allows to better bridge the new builder API model with the existing behaviour and minimise the changes.
Merging this seems com.amazonaws.services.batch.model.KeyValuePair seems related to the xpack that requires to be updated.
@jorgee let's discuss the plan for supporting the option signerOverride if needed.
I saw an integration test into another PR failing with this error
Jul-06 09:44:25.772 [main] DEBUG nextflow.Session - Session aborted -- Cause: The request content has fewer bytes than the specified content-length: 68 bytes.
Jul-06 09:44:25.793 [main] ERROR nextflow.cli.Launcher - @unknown
java.lang.IllegalStateException: The request content has fewer bytes than the specified content-length: 68 bytes.
at software.amazon.awssdk.core.internal.io.SdkLengthAwareInputStream.read(SdkLengthAwareInputStream.java:82)
at software.amazon.awssdk.core.io.SdkFilterInputStream.read(SdkFilterInputStream.java:66)
at software.amazon.awssdk.http.auth.aws.internal.signer.io.ChecksumInputStream.read(ChecksumInputStream.java:52)
at software.amazon.awssdk.http.auth.aws.internal.signer.chunkedencoding.ChunkedEncodedInputStream.read(ChunkedEncodedInputStream.java:136)
at software.amazon.awssdk.http.auth.aws.internal.signer.chunkedencoding.ChunkedEncodedInputStream.getChunk(ChunkedEncodedInputStream.java:112)
at software.amazon.awssdk.http.auth.aws.internal.signer.chunkedencoding.ChunkedEncodedInputStream.currentChunk(ChunkedEncodedInputStream.java:95)
at software.amazon.awssdk.http.auth.aws.internal.signer.chunkedencoding.ChunkedEncodedInputStream.read(ChunkedEncodedInputStream.java:90)
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:132)
at software.amazon.awssdk.core.internal.io.SdkLengthAwareInputStream.read(SdkLengthAwareInputStream.java:75)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:140)
at software.amazon.awssdk.http.apache.internal.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:157)
at org.apache.http.impl.execchain.RequestEntityProxy.writeTo(RequestEntityProxy.java:121)
at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:152)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72)
at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:259)
at software.amazon.awssdk.http.apache.ApacheHttpClient.access$600(ApacheHttpClient.java:104)
at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:236)
at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:233)
at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:102)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:88)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:64)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:46)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:74)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:43)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:79)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:41)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.executeRequest(RetryableStage.java:93)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:56)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:11228)
at software.amazon.awssdk.services.s3.DelegatingS3Client.lambda$putObject$86(DelegatingS3Client.java:9145)
at software.amazon.awssdk.services.s3.internal.crossregion.S3CrossRegionSyncClient.invokeOperation(S3CrossRegionSyncClient.java:67)
at software.amazon.awssdk.services.s3.DelegatingS3Client.putObject(DelegatingS3Client.java:9145)
at nextflow.cloud.aws.nio.S3OutputStream.putObject(S3OutputStream.java:626)
at nextflow.cloud.aws.nio.S3OutputStream.putObject(S3OutputStream.java:581)
at nextflow.cloud.aws.nio.S3OutputStream.close(S3OutputStream.java:384)
at java.base/java.io.FilterOutputStream.close(FilterOutputStream.java:188)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at nextflow.extension.FilesEx.closeQuietly(FilesEx.groovy:689)
at nextflow.cache.CloudCacheStore.close(CloudCacheStore.groovy:93)
at nextflow.cache.CacheDB.close(CacheDB.groovy:266)
at nextflow.Session.destroy(Session.groovy:724)
at nextflow.script.ScriptRunner.shutdown(ScriptRunner.groovy:263)
at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:147)
at nextflow.cli.CmdRun.run(CmdRun.groovy:379)
at nextflow.cli.Launcher.run(Launcher.groovy:513)
at nextflow.cli.Launcher.main(Launcher.groovy:673)
Worth double checking
It could be caused because in v2 content-length is now part of the request instead of metadata. The strange part is that it is not always happening. I am going to try to find when it is reproduced.