aws-sam-cli
aws-sam-cli copied to clipboard
Bug: High init duration when deployed through SAM.
Description:
When deployed through sam the init duration is very high(5s) but when we deploy manually it takes less than 1s. Functionally its working fine in sam deployment. Only issue is the high init duration. We changed only layers in sam with the layer we had in manual deployment then it was working as expected(low init duration).
Steps to reproduce:
- Deploy an AWS Lambda function using AWS SAM.
- Measure the initialization duration of the Lambda function.
- Deploy the same Lambda function manually using the AWS CLI.
- Measure the initialization duration of the manually deployed Lambda function.
- Compare the initialization durations between the two deployment methods.
sam deploy commands: 1 sam build --use-container -t ./template.yaml 2 sam deploy --config-env prod
with and without docker(--use-container) we tried We have 3 more custom util py files in layer.
template.yaml
Transform: AWS::Serverless-2016-10-31
Description: stack
Parameters:
environment:
Type: String
Default: dev
Description: The environment to deploy to
AllowedValues:
- dev
- prod
ConstraintDescription: must specify dev or prod
Globals:
Function:
Environment:
Variables:
environment: !Sub ${environment}
region_name: !Ref AWS::Region
EphemeralStorage:
Size: 512
Timeout: 5
Handler: lambda_function.lambda_handler
Runtime: python3.11
Architectures:
- arm64
Layers:
- !Ref LayerBase
MemorySize: 256
Resources:
LayerBase:
Type: AWS::Serverless::LayerVersion
Properties:
LayerName: -base
Description: base layer
ContentUri: './utils'
CompatibleRuntimes:
- python3.11
CompatibleArchitectures:
- arm64
Metadata:
PackageType: Zip
BuildMethod: python3.11
BuildArchitecture: arm64
# BuildPlatform: Amazon Linux 2
CustomPolicyForSecrets:
Type: AWS::IAM::Policy
Properties:
PolicyName: CustomPolicyForSecrets
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- secretsmanager:GetSecretValue
Resource: '*'
Roles:
- !Ref BasicLambdaRole
- !Ref PriviligedLambdaRole
LogPolicyForLambda:
Type: AWS::IAM::Policy
Properties:
PolicyName: LogPolicyForLambda
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Resource: !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-*
Roles:
- !Ref BasicLambdaRole
- !Ref PriviligedLambdaRole
BasicLambdaRole:
Type: AWS::IAM::Role
Description: Role to Read secret managers
# refer LogPolicyForLambda to this role
Properties:
RoleName: MyEmptyRole
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
PriviligedLambdaRole:
Type: AWS::IAM::Role
Description: Role for SQS access + secretsmanager access
Properties:
Policies:
- PolicyName: CustomPolicyForSQS
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- sqs:*
Resource: '*'
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
PostChats:
Type: AWS::Serverless::Function
Properties:
CodeUri: ./post__chats
Description: "Main function which handles user's chats with AI"
MemorySize: 512
Timeout: 90
PackageType: Zip
Handler: lambda_function.lambda_handler
Runtime: python3.11
EphemeralStorage:
Size: 512
EventInvokeConfig:
MaximumEventAgeInSeconds: 21600
MaximumRetryAttempts: 2
AutoPublishAlias: live
DeploymentPreference:
Type: AllAtOnce
ProvisionedConcurrencyConfig:
ProvisionedConcurrentExecutions: 2
Role: !GetAtt PriviligedLambdaRole.Arn
Events:
PostApiEvent:
Type: Api
Properties:
Path: /chats-sam/post
Method: POST
FetchChats:
Type: AWS::Serverless::Function
Properties:
CodeUri: ./fetch__chats
Description: "Function to handle fetch chats with AI"
EventInvokeConfig:
MaximumEventAgeInSeconds: 21600
MaximumRetryAttempts: 2
Role: !GetAtt BasicLambdaRole.Arn
Events:
FetchApiEvent:
Type: Api
Properties:
Path: /chats-sam/fetch
Method: POST
Timeout: 10
PackageType: Zip
We tried both arm and x86 architectures and different python versions.
requirementes.txt
numpy==1.24.4
openai==1.3.8
pgvector==0.2.4
psycopg==3.1.6
psycopg-binary==3.1.6
requests==2.31.0
pytz
phonenumbers
Observed result:
Test Event Name test
Response { "statusCode": 200, "status": "success", "headers": { "Access-Control-Allow-Origin": "*" }, "messages": "[]" }
Function Logs START RequestId: 6ae5c841-833f-4d0a-abd1-dcf4c19ec834 Version: $LATEST [INFO] 2024-02-20T06:51:59.375Z 6ae5c841-833f-4d0a-abd1-dcf4c19ec834 Received event: { "user_id": "9fcb1fec0d522dcfcfaeb64da8d30b4a", "organization_id": 11 } END RequestId: 6ae5c841-833f-4d0a-abd1-dcf4c19ec832 REPORT RequestId: 6ae5c841-833f-4d0a-abd1-dcf4c19ec834 Duration: 15.53 ms Billed Duration: 16 ms Memory Size: 256 MB Max Memory Used: 256 MB Init Duration: 4970.89 ms
Request ID 6ae5c841-833f-4d0a-abd1-dcf4c19ec832
Expected result:
Init Duration should be less than 1000ms
Additional environment details (Ex: Windows, Mac, Amazon Linux etc)
- OS: tried Ubuntu, Mac
-
sam --version
: SAM CLI, version 1.109.0 - AWS region: US-east-2
{
"version": "1.109.0",
"system": {
"python": "3.11.3",
"os": "Linux-6.5.0-18-generic-x86_64-with-glibc2.35"
},
"additional_dependencies": {
"docker_engine": "Not available",
"aws_cdk": "Not available",
"terraform": "Not available"
},
"available_beta_feature_env_vars": [
"SAM_CLI_BETA_FEATURES",
"SAM_CLI_BETA_BUILD_PERFORMANCE",
"SAM_CLI_BETA_TERRAFORM_SUPPORT",
"SAM_CLI_BETA_RUST_CARGO_LAMBDA"
]
}
Hi, I was not able to reproduce your slow cold start with a project deployed via SAM CLI:
SAM CLI
-
sam init
a Python 3.11 hello world template - Modified the template to add the following resource:
HelloWorldLayer:
Type: AWS::Serverless::LayerVersion
Properties:
ContentUri: ./layer
CompatibleRuntimes:
- python3.11
CompatibleArchitectures:
- x86_64
Metadata:
BuildMethod: python3.11
The only thing that is inside of ./layer
is a requirements.txt
file with the same dependencies noted in the original issue.
- Modified the hello world function to add the following new properties:
EphemeralStorage:
Size: 512
Layers:
- !Ref HelloWorldLayer
- Update the handler to import a package that is defined from the layer (just import, not using any methods)
- Ran
sam build -u
- Ran
sam deploy
- Ran
sam remote invoke
$ sam remote invoke --stack-name sam-app-6715-2
Invoking Lambda Function HelloWorldFunction
START RequestId: ee5bfe82-1b21-4c20-af40-fc9eaea477c3 Version: $LATEST
END RequestId: ee5bfe82-1b21-4c20-af40-fc9eaea477c3
REPORT RequestId: ee5bfe82-1b21-4c20-af40-fc9eaea477c3 Duration: 1.43 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 76 MB Init Duration: 679.37 ms
{"statusCode": 200, "body": "{\"message\": \"hello world\"}"}%
AWS CLI
When creating a Lambda manually via the AWS CLI:
- Zipped the function and layer built directory from previous
sam build
- Create a Lambda layer:
aws lambda publish-layer-version --layer-name manual-layer3 --zip-file fileb://hellolayer.zip --compatible-runtimes python3.11 --compatible-architectures x86_64
- Create a Lambda function:
aws lambda create-function --function-name manual3 --role <some role that is able to `InvokeFunction`> --layers <layer ARN from previous command> --zip-file fileb://hellofunction.zip --handler app.lambda_handler --runtime python3.11 --ephemeral-storage Size=512
- Invoke the Lambda function
$ aws lambda invoke --function-name manual3 /dev/stdout
{"statusCode": 200, "body": "{\"message\": \"hello world\"}"}%
Inspecting the Cloudwatch logs for the function, we can see that the init time is roughly the same as one the one deployed with SAM CLI:
REPORT RequestId: 44daa145-cf6f-4ccb-a86c-379107756e76 Duration: 1.54 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 76 MB Init Duration: 665.80 ms
Is there a reproducible project that you can share somewhere where we can reproduce this? Something interesting to point out, is that in your function runtime logs it looks like the memory is being maxed out. This may have something to do with the slow init times?
Found the issue. It was because of same module importing twice in a python file which was in the layer.
import phonenumbers
from phonenumbers import geocoder, carrier, timezone
removed second line. Then init duration during warm up reduced from 5s to 0.5s. Also Max Memory Used gone down from 250MB to 70MB.
Hi @AmruthjithSunil, glad you found the issue. Does that address your concerns when running sam commands now?
Closing as the issue seems to be resolved.
⚠️COMMENT VISIBILITY WARNING⚠️
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.