aws-lambda-dotnet
Initial Request Performance Regression on Amazon Linux 2023
Describe the bug
When using the AWS SDK for .NET on Lambda with a runtime of "provided.al2023", the first request to an AWS service is significantly slower than it is under "provided.al2".
I've checked this with the Step Functions and S3 SDKs in particular.
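For context, the "Request metrics" lines below come from the SDK's built-in metrics logging. The following is a minimal sketch of the kind of code used to capture them; the class name, method name, and logging setup here are illustrative rather than the actual repro:

```csharp
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;

public static class MetricsProbe
{
    static MetricsProbe()
    {
        // Ask the SDK to emit the per-request metrics lines quoted below
        // (CredentialsRequestTime, RequestSigningTime, HttpRequestTime, ...).
        AWSConfigs.LoggingConfig.LogTo = LoggingOptions.Console;
        AWSConfigs.LoggingConfig.LogMetrics = true;
    }

    // Shared client: the first call made through it after a cold start is the
    // "Initial" column below; later calls are the "Subsequent" column.
    private static readonly AmazonS3Client S3 = new AmazonS3Client();

    public static async Task<int> ListOnceAsync()
    {
        var response = await S3.ListObjectsV2Async(new ListObjectsV2Request
        {
            // Bucket name taken from the logged requests below; substitute your own.
            BucketName = "fen-object-share"
        });

        return response.S3Objects.Count;
    }
}
```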
Expected Behavior
Let's look at Amazon Linux 2 first. Here's an example set of initial-request metrics from the S3 SDK:
Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T162723Z\n20231116/eu-west-1/s3/aws4_request\n0eadbea89d6ac50207072872ff834e0c56ecb2d325b4b8e74b20322a13e955a1; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = j9+NJQJ+0MGzhi2RWbHJowscyJVqh8kd7rJ8xidbA2CZWfQdnzEyhI880oYmqMF0MudxZdEi/Yc=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = QN3QHSD7PNK16JE2; CredentialsRequestTime = 00:00:00.0000659; RequestSigningTime = 00:00:00.0141405; HttpRequestTime = 00:00:00.1191039; ResponseUnmarshallTime = 00:00:00.0022337; ResponseProcessingTime = 00:00:00.0024265; ClientExecuteTime = 00:00:00.1375685;
…and from a subsequent request:
Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T162729Z\n20231116/eu-west-1/s3/aws4_request\nf8f6569b2501b42700322a3f2cc6e9d30d4676388cd0911f13ff2534568aa87a; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = c6DRj8nFKTIYU91M/iPPrOK2RB3ZoXpm/u1/2ZvBLnhU5g1MK0xrrUYvsmx1kj0rh9RiuBku0uA=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = K2VKBKNNP5G1244J; CredentialsRequestTime = 00:00:00.0000823; RequestSigningTime = 00:00:00.0001613; HttpRequestTime = 00:00:00.0562662; ResponseUnmarshallTime = 00:00:00.0012703; ResponseProcessingTime = 00:00:00.0014287; ClientExecuteTime = 00:00:00.0581422;
Let's clean that up:
| Metric | Initial | Subsequent |
|---|---|---|
| CredentialsRequestTime | 00:00:00.0000659 | 00:00:00.0000823 |
| RequestSigningTime | 00:00:00.0141405 | 00:00:00.0001613 |
| HttpRequestTime | 00:00:00.1191039 | 00:00:00.0562662 |
| ResponseUnmarshallTime | 00:00:00.0022337 | 00:00:00.0012703 |
| ResponseProcessingTime | 00:00:00.0024265 | 00:00:00.0014287 |
| ClientExecuteTime | 00:00:00.1375685 | 00:00:00.0581422 |
In short, the total time for an initial request is a little under 140 ms, with subsequent requests under 60 ms. Around 120 of those 140 milliseconds were spent in HttpRequestTime. A totally reasonable place to spend milliseconds.
Current Behavior
Doing the same for Amazon Linux 2023, first:
Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T163811Z\n20231116/eu-west-1/s3/aws4_request\n80c4f0ccf97f8a6a98732829089b00568838935cb9a820541ad204d27df447a9; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = V5p/kJVqud9IIy9+KZLbK/AqdjfvxSp6f4mREQiEooUHtcpqTxkNfCyG+VI2y1bH3gK22JUCcmk=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = 20552BMSN97GSH3T; CredentialsRequestTime = 00:00:00.0000497; RequestSigningTime = 00:00:00.0152391; HttpRequestTime = 00:00:00.3099302; ResponseUnmarshallTime = 00:00:00.0054981; ResponseProcessingTime = 00:00:00.0056516; ClientExecuteTime = 00:00:00.3326032;
…and subsequent:
Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T163819Z\n20231116/eu-west-1/s3/aws4_request\nf3ceeb3c74e79254fc97210de4475dae052f7cf7a2564f4456f7000aefc60786; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = VnPpOSZCVArdTrl5tOhRHZAEDl3IYCbSUpB8ddHuDklLENmeCUvrj/sAYh6F7qVrv5fT4eJmUlM=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = CT6NQGKCR4DCFGR8; CredentialsRequestTime = 00:00:00.0000801; RequestSigningTime = 00:00:00.0002212; HttpRequestTime = 00:00:00.0450372; ResponseUnmarshallTime = 00:00:00.0016921; ResponseProcessingTime = 00:00:00.0018422; ClientExecuteTime = 00:00:00.0474034;
Cleaning that up as before, we get:
| Metric | Initial | Subsequent |
|---|---|---|
| CredentialsRequestTime | 00:00:00.0000497 | 00:00:00.0000801 |
| RequestSigningTime | 00:00:00.0152391 | 00:00:00.0002212 |
| HttpRequestTime | 00:00:00.3099302 | 00:00:00.0450372 |
| ResponseUnmarshallTime | 00:00:00.0054981 | 00:00:00.0016921 |
| ResponseProcessingTime | 00:00:00.0056516 | 00:00:00.0018422 |
| ClientExecuteTime | 00:00:00.3326032 | 00:00:00.0474034 |
The overall ClientExecuteTime for the initial request has more than doubled. Around 310 of the total 330 milliseconds were spent in HttpRequestTime. This and the above are only a single sample – I've sampled this many times and I first saw it manifest in a running application:
(Chart: duration in milliseconds of the first request, binned by day. The periods before and after the bump are "provided.al2"; the bump itself is "provided.al2023".)
Reproduction Steps
Repro is committed here. Publishing with CI set to true will perform AOT compilation. Please do let me know if there's more I can do.
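The linked repository is the authoritative setup. Purely as a sketch of the usual shape of a native AOT, custom-runtime function, the relevant project settings typically look like the following (standard .NET 8 / Lambda knobs, not the repro's exact csproj):

```xml
<!-- Sketch only: typical csproj settings for a provided.al2023 native AOT function. -->
<PropertyGroup>
  <OutputType>Exe</OutputType>
  <TargetFramework>net8.0</TargetFramework>
  <PublishAot>true</PublishAot>
  <!-- Custom runtimes execute a file named "bootstrap" from the deployment bundle. -->
  <AssemblyName>bootstrap</AssemblyName>
</PropertyGroup>
<ItemGroup>
  <!-- Hosts the Lambda runtime client inside the executable (version illustrative). -->
  <PackageReference Include="Amazon.Lambda.RuntimeSupport" Version="1.10.0" />
</ItemGroup>
<!-- Published in CI with something like: dotnet publish -c Release -r linux-x64 -->
```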
Possible Solution
No response
Additional Information/Context
No response
AWS .NET SDK and/or Package version used
AWSSDK.S3: 3.7.300.3
AWSSDK.StepFunctions: 3.7.301.1
Targeted .NET Platform
.NET 8
Operating System and version
Amazon Linux 2, Amazon Linux 2023
Thanks for all of this data @chrisoverzero. I'll do some investigation to see what I can find.
This issue was most likely due to Amazon Linux 2023 using OpenSSL 3.0. We found a performance regression with OpenSSL 3.0 and .NET which caused the cert store to be loaded twice. For the dotnet8 runtime we made a special tweak to compensate for this issue. We also pushed a fix to the .NET runtime, but Microsoft won't ship that fix until .NET 9. Since provided.al2023 does not use the bash script I linked before, it will be affected by the regression. I would advise using the dotnet8 runtime over the provided.al2023 runtime even if you aren't using the .NET runtime installed in the managed runtime.
@normj Thanks for all the work that you and others put into this issue – I was following along on other GitHub projects.
Does the equivalent advice apply to Image-style Lambda Functions, as well? That is, even for a NativeAOT application, would one be better off using the dotnet8 base image rather than the provided.al2023 image?
You can see that the tweak we ended up doing was just to set SSL_CERT_FILE to a non-existent file outside of the usual cert directory to avoid the double cert load. So you could set that environment variable yourself on provided.al2023. Something to keep in mind with provided.al2023 is that libicu is not preinstalled, which .NET needs for internationalization. So unless you deploy with your function configured for invariant mode, you have to include libicu in the deployment bundle. You can include it in .NET by adding the following to your project, but it does increase the deployment bundle quite a bit.
```xml
<ItemGroup>
  <RuntimeHostConfigurationOption Include="System.Globalization.AppLocalIcu" Value="68.2.0.9" />
  <PackageReference Include="Microsoft.ICU.ICU4C.Runtime" Version="68.2.0.9" />
</ItemGroup>
```
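As an alternative to bundling ICU, the invariant mode mentioned above is a single project property. A sketch, only suitable if the function doesn't need culture-specific formatting or collation:

```xml
<PropertyGroup>
  <!-- Globalization-invariant mode: .NET no longer needs libicu at runtime. -->
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```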
The lack of libicu is another strong reason I recommend deploying to the dotnet8 runtime, to avoid having to include it.
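For completeness, a sketch of applying the SSL_CERT_FILE workaround described above to a provided.al2023 function via the AWS CLI. The function name and file path are placeholders; the path just has to point at a file that doesn't exist, outside the usual cert directory. Note that `--environment` replaces the function's entire variable map, so merge in any variables the function already has:

```bash
# Placeholder function name and path; mirrors the tweak the managed dotnet8 runtime applies.
aws lambda update-function-configuration \
  --function-name my-al2023-function \
  --environment "Variables={SSL_CERT_FILE=/tmp/noop}"
```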