aws-sam-cli icon indicating copy to clipboard operation
aws-sam-cli copied to clipboard

Bug: High init duration when deployed through SAM.

Open AmruthjithSunil opened this issue 1 year ago • 1 comments

Description:

When deployed through sam the init duration is very high(5s) but when we deploy manually it takes less than 1s. Functionally its working fine in sam deployment. Only issue is the high init duration. We changed only layers in sam with the layer we had in manual deployment then it was working as expected(low init duration).

Steps to reproduce:

  • Deploy an AWS Lambda function using AWS SAM.
  • Measure the initialization duration of the Lambda function.
  • Deploy the same Lambda function manually using the AWS CLI.
  • Measure the initialization duration of the manually deployed Lambda function.
  • Compare the initialization durations between the two deployment methods.

sam deploy commands: 1 sam build --use-container -t ./template.yaml 2 sam deploy --config-env prod

with and without docker(--use-container) we tried We have 3 more custom util py files in layer.

template.yaml

Transform: AWS::Serverless-2016-10-31
Description:  stack

Parameters:
  environment:
    Type: String
    Default: dev
    Description: The environment to deploy to
    AllowedValues:
      - dev
      - prod
    ConstraintDescription: must specify dev or prod

Globals:
  Function:
    Environment:
      Variables:
        environment: !Sub ${environment}
        region_name: !Ref AWS::Region
    EphemeralStorage:
      Size: 512
    Timeout: 5
    Handler: lambda_function.lambda_handler
    Runtime: python3.11
    Architectures:
      - arm64
    Layers:
      - !Ref LayerBase
    MemorySize: 256

Resources:

  LayerBase:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: -base
      Description:  base layer
      ContentUri: './utils'
      CompatibleRuntimes:
        - python3.11
      CompatibleArchitectures:
        - arm64
    Metadata:
      PackageType: Zip
      BuildMethod: python3.11
      BuildArchitecture: arm64
    #  BuildPlatform: Amazon Linux 2
  
  
  CustomPolicyForSecrets:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: CustomPolicyForSecrets
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - secretsmanager:GetSecretValue
            Resource: '*'
      Roles:
        - !Ref BasicLambdaRole
        - !Ref PriviligedLambdaRole
    
  LogPolicyForLambda:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: LogPolicyForLambda
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
            Resource: !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-*
      Roles:
        - !Ref BasicLambdaRole
        - !Ref PriviligedLambdaRole
  
  BasicLambdaRole:
    Type: AWS::IAM::Role
    Description: Role to Read secret managers
    # refer LogPolicyForLambda to this role
    Properties:
      RoleName: MyEmptyRole
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole

  PriviligedLambdaRole:
    Type: AWS::IAM::Role
    Description: Role for SQS access + secretsmanager access
    Properties:
      Policies:
        - PolicyName: CustomPolicyForSQS
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - sqs:*
                Resource: '*'
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole

    
  PostChats:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./post__chats
      Description: "Main function which handles user's chats with  AI"
      MemorySize: 512
      Timeout: 90
      PackageType: Zip
      Handler: lambda_function.lambda_handler
      Runtime: python3.11
      EphemeralStorage:
        Size: 512
      EventInvokeConfig:
        MaximumEventAgeInSeconds: 21600
        MaximumRetryAttempts: 2
      AutoPublishAlias: live
      DeploymentPreference:
        Type: AllAtOnce
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 2
      Role: !GetAtt PriviligedLambdaRole.Arn
      Events:
        PostApiEvent:
          Type: Api
          Properties:
            Path: /chats-sam/post
            Method: POST

  FetchChats:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./fetch__chats
      Description: "Function to handle fetch chats with  AI"
      EventInvokeConfig:
        MaximumEventAgeInSeconds: 21600
        MaximumRetryAttempts: 2
      Role: !GetAtt BasicLambdaRole.Arn
      Events:
        FetchApiEvent:
          Type: Api
          Properties:
            Path: /chats-sam/fetch
            Method: POST
      Timeout: 10
      PackageType: Zip

We tried both arm and x86 architectures and different python versions.

requirementes.txt

numpy==1.24.4
openai==1.3.8
pgvector==0.2.4
psycopg==3.1.6
psycopg-binary==3.1.6
requests==2.31.0
pytz
phonenumbers

Observed result:

Test Event Name test

Response { "statusCode": 200, "status": "success", "headers": { "Access-Control-Allow-Origin": "*" }, "messages": "[]" }

Function Logs START RequestId: 6ae5c841-833f-4d0a-abd1-dcf4c19ec834 Version: $LATEST [INFO] 2024-02-20T06:51:59.375Z 6ae5c841-833f-4d0a-abd1-dcf4c19ec834 Received event: { "user_id": "9fcb1fec0d522dcfcfaeb64da8d30b4a", "organization_id": 11 } END RequestId: 6ae5c841-833f-4d0a-abd1-dcf4c19ec832 REPORT RequestId: 6ae5c841-833f-4d0a-abd1-dcf4c19ec834 Duration: 15.53 ms Billed Duration: 16 ms Memory Size: 256 MB Max Memory Used: 256 MB Init Duration: 4970.89 ms

Request ID 6ae5c841-833f-4d0a-abd1-dcf4c19ec832

Expected result:

Init Duration should be less than 1000ms

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)

  1. OS: tried Ubuntu, Mac
  2. sam --version: SAM CLI, version 1.109.0
  3. AWS region: US-east-2
{
  "version": "1.109.0",
  "system": {
    "python": "3.11.3",
    "os": "Linux-6.5.0-18-generic-x86_64-with-glibc2.35"
  },
  "additional_dependencies": {
    "docker_engine": "Not available",
    "aws_cdk": "Not available",
    "terraform": "Not available"
  },
  "available_beta_feature_env_vars": [
    "SAM_CLI_BETA_FEATURES",
    "SAM_CLI_BETA_BUILD_PERFORMANCE",
    "SAM_CLI_BETA_TERRAFORM_SUPPORT",
    "SAM_CLI_BETA_RUST_CARGO_LAMBDA"
  ]
}

AmruthjithSunil avatar Feb 20 '24 07:02 AmruthjithSunil

Hi, I was not able to reproduce your slow cold start with a project deployed via SAM CLI:

SAM CLI

  1. sam init a Python 3.11 hello world template
  2. Modified the template to add the following resource:
  HelloWorldLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      ContentUri: ./layer
      CompatibleRuntimes:
        - python3.11
      CompatibleArchitectures:
        - x86_64
    Metadata:
      BuildMethod: python3.11

The only thing that is inside of ./layer is a requirements.txt file with the same dependencies noted in the original issue.

  1. Modified the hello world function to add the following new properties:
EphemeralStorage:
  Size: 512
Layers:
  - !Ref HelloWorldLayer
  1. Update the handler to import a package that is defined from the layer (just import, not using any methods)
  2. Ran sam build -u
  3. Ran sam deploy
  4. Ran sam remote invoke
$ sam remote invoke --stack-name sam-app-6715-2
Invoking Lambda Function HelloWorldFunction                                                                                                                         
START RequestId: ee5bfe82-1b21-4c20-af40-fc9eaea477c3 Version: $LATEST
END RequestId: ee5bfe82-1b21-4c20-af40-fc9eaea477c3
REPORT RequestId: ee5bfe82-1b21-4c20-af40-fc9eaea477c3  Duration: 1.43 ms       Billed Duration: 2 ms   Memory Size: 128 MB     Max Memory Used: 76 MB  Init Duration: 679.37 ms
{"statusCode": 200, "body": "{\"message\": \"hello world\"}"}%

AWS CLI

When creating a Lambda manually via the AWS CLI:

  1. Zipped the function and layer built directory from previous sam build
  2. Create a Lambda layer:
aws lambda publish-layer-version --layer-name manual-layer3 --zip-file fileb://hellolayer.zip --compatible-runtimes python3.11 --compatible-architectures x86_64
  1. Create a Lambda function:
aws lambda create-function --function-name manual3 --role <some role that is able to `InvokeFunction`> --layers <layer ARN from previous command> --zip-file fileb://hellofunction.zip --handler app.lambda_handler --runtime python3.11 --ephemeral-storage Size=512
  1. Invoke the Lambda function
$ aws lambda invoke --function-name manual3 /dev/stdout       
{"statusCode": 200, "body": "{\"message\": \"hello world\"}"}%

Inspecting the Cloudwatch logs for the function, we can see that the init time is roughly the same as one the one deployed with SAM CLI:

REPORT RequestId: 44daa145-cf6f-4ccb-a86c-379107756e76	Duration: 1.54 ms	Billed Duration: 2 ms	Memory Size: 128 MB	Max Memory Used: 76 MB	Init Duration: 665.80 ms

Is there a reproducible project that you can share somewhere where we can reproduce this? Something interesting to point out, is that in your function runtime logs it looks like the memory is being maxed out. This may have something to do with the slow init times?

lucashuy avatar Feb 21 '24 21:02 lucashuy

Found the issue. It was because of same module importing twice in a python file which was in the layer.

import phonenumbers
from phonenumbers import geocoder, carrier, timezone

removed second line. Then init duration during warm up reduced from 5s to 0.5s. Also Max Memory Used gone down from 250MB to 70MB.

AmruthjithSunil avatar Mar 05 '24 09:03 AmruthjithSunil

Hi @AmruthjithSunil, glad you found the issue. Does that address your concerns when running sam commands now?

hnnasit avatar Mar 06 '24 17:03 hnnasit

Closing as the issue seems to be resolved.

mildaniel avatar Mar 21 '24 00:03 mildaniel

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Mar 21 '24 00:03 github-actions[bot]