aws-sam-cli

Running sam build from a container gets stuck

Open imduchy opened this issue 3 years ago • 19 comments

Description:

Hello. I'm trying to Dockerize a tool (essentially a wrapper script) that uses aws-sam-cli. I'm unable to execute sam build from inside my "wrapper" container because it gets stuck at this point:

Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container

From that point, the build doesn't move forward (I waited for over an hour). I've also tried different versions of aws-sam-cli, with the same result. Running other containers from inside my container setup works as expected.

Steps to reproduce:

  1. Build a Docker image with aws-sam-cli==1.36.0 installed.
  2. Run the build inside a container: docker run -it --rm -v /source/my-application:/my-application -v /var/run/docker.sock:/var/run/docker.sock wrapper build

wrapper build executes the following:

sam build \
    --region ${AWS_REGION_CODE} \
    --profile ${AWS_PROFILE} \
    --template "$function_src_dir"/template.yaml \
    --build-dir "${AWS_SAM_BUILD_DIR}" \
    --use-container --parallel --debug

The Docker image is based on ubuntu:20.04 with python3 installed.
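When reproducing a hang like this, it can help to fail fast instead of waiting an hour. A minimal sketch, assuming the wrapper is (or could be) a Python script: it assembles the same sam build arguments from the environment variables used above and runs them under subprocess.run with a timeout. The helper names build_sam_command and run_with_timeout are hypothetical, not part of the original wrapper.

```python
import os
import subprocess
from typing import List, Optional, Sequence


def build_sam_command(function_src_dir: str) -> List[str]:
    """Assemble the sam build invocation above (hypothetical helper).

    The environment variable names mirror the wrapper script's shell variables.
    """
    return [
        "sam", "build",
        "--region", os.environ["AWS_REGION_CODE"],
        "--profile", os.environ["AWS_PROFILE"],
        "--template", os.path.join(function_src_dir, "template.yaml"),
        "--build-dir", os.environ["AWS_SAM_BUILD_DIR"],
        "--use-container", "--parallel", "--debug",
    ]


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it had to be killed after timeout_s."""
    try:
        return subprocess.run(list(cmd), timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired.
        return None
```

For example, run_with_timeout(build_sam_command(function_src_dir), 15 * 60) returns None instead of blocking indefinitely. Killing sam this way may leave the build container behind, since that container is run by the Docker daemon rather than by the sam process, so checking docker ps afterwards can be worthwhile.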

Observed result:

2021-12-02 15:19:31,364 | Building codeuri: /source/my-application runtime: python3.9 metadata: {} architecture: x86_64 functions: ['MyFunction']
2021-12-02 15:19:31,365 | Building to following folder /root/aws-sam-generated-artifacts/123456789012/ireland/my-application/MyFunction
2021-12-02 15:19:31,365 | Waiting for async results

Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container

From this point on, the build gets stuck.

Expected result:

sam build should work when run from inside a container.

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)

  1. OS: MacOS 12
  2. sam --version: 1.36.0
  3. AWS region: eu-west-1

imduchy, Dec 02 '21 15:12

So sorry you encountered an issue! Thanks for reporting it; we will be investigating it further.

ssenchenko, Dec 09 '21 19:12

I'm seeing similar issues when running sam build inside a local CodeBuild container via:

codebuild_build.sh -c -b ./buildspec-sam.yml -i public.ecr.aws/codebuild/amazonlinux2-x86_64-standard:3.0 -a ../out/

The buildspec-sam.yml just runs:

build:
  commands:
    - echo Build started on $(date) in $(pwd)
    - cd sam
    - sam build
    - sam deploy --config-env $BUILD_ENV --no-confirm-changeset --no-fail-on-empty-changeset

This is on macOS. It works fine on a non-Docker host OS.

oliverahul, Jan 15 '22 23:01

Hello. Any updates on this? @ssenchenko

imduchy, Feb 25 '22 08:02

Hi there,

Any updates on this? It would be really good to have this as a capability in AWS SAM.

Thank you

piersf, Apr 07 '22 13:04

Hey, any updates on this one?

I am using the generated .github/workflows/pipeline.yaml from sam pipeline init.


Thanks.

hillbillydev, Apr 26 '22 02:04

~~After upgrading to aws-sam-cli==1.51.0, everything works as expected~~

imduchy, Jun 05 '22 21:06

I'm re-opening the issue because the problem still persists. My previous build only worked because the CF stack didn't contain any Lambda functions. When I ran sam build from a container for a stack that does contain a Lambda function, I hit the same problem: the build gets stuck.

2022-06-13 12:46:48,791 | Async execution started
2022-06-13 12:46:48,791 | Invoking function functools.partial(<bound method DefaultBuildStrategy.build_single_function_definition of <samcli.lib.build.build_strategy.DefaultBuildStrategy object at 0x7fdf18178d00>>, <samcli.lib.build.build_graph.FunctionBuildDefinition object at 0x7fdf18178b20>)
2022-06-13 12:46:48,792 | Building codeuri: /src/my_application runtime: python3.7 metadata: {} architecture: x86_64 functions: ['ClzSecHubIncidentsCloseFunction']
2022-06-13 12:46:48,792 | Building to following folder /root/aws-sam-generated-artifacts/633965134003/ireland/my_application/MyApplicationFunction
2022-06-13 12:46:48,792 | Waiting for async results

Fetching public.ecr.aws/sam/build-python3.7:latest-x86_64 Docker container image......
2022-06-13 12:47:40,626 | Mounting /src/my_application as /tmp/samcli/source:ro,delegated inside runtime container

From this point, the build doesn't progress. I also tried upgrading the runtime to python3.9, but that didn't work either. My current sam version is aws-sam-cli==1.51.0.

Can we get any update on this, please?

imduchy, Jun 13 '22 12:06

I did some investigation, and it looks like sam hangs when the Docker engine is unable to create the container from within the container environment (for various reasons). sam appears to hang at this call: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/local/docker/container.py#L284

That call is wrapped in a try/finally statement with no exception handling here: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/lib/build/app_builder.py#L844

I suspect that because Docker couldn't create the container, sam hangs when attempting to stop it. After some debugging in an environment that reliably reproduced this error, I could see that during the create step the Docker engine returns status code 500, and then it runs through a series of other API calls before hanging completely. None of the calls after the 500 returned an error code. I was unable to track down where in sam or Docker these subsequent API calls are made after the create fails.

So the initial issue appears to be caused by a bug in the Python Docker library in certain environments when run from a container, and there also appears to be a bug in how sam handles this particular type of failure.
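The failure mode described above, a cleanup call inside a finally block that never returns, can be illustrated in isolation. This is not SAM's code: start_container and stop_container are hypothetical stand-ins for the Docker calls, and call_with_deadline is an illustrative helper that runs a call in a daemon thread and stops waiting after a timeout, so the caller can report the problem instead of hanging.

```python
import threading
from typing import Callable, Optional


def call_with_deadline(fn: Callable[[], object], timeout_s: float) -> bool:
    """Run fn in a daemon thread; return True if it finished within timeout_s.

    Exceptions raised by fn are re-raised in the caller. A call that never
    returns is abandoned; as a daemon thread it will not block interpreter exit.
    """
    error: list = []  # holds an exception raised inside the worker, if any
    done = threading.Event()

    def worker() -> None:
        try:
            fn()
        except BaseException as exc:  # hand the failure back to the caller
            error.append(exc)
        finally:
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout_s)
    if error:
        raise error[0]
    return finished


def run_build(start_container: Callable[[], None],
              stop_container: Callable[[], None],
              timeout_s: float = 30.0) -> Optional[str]:
    """Hypothetical build step: start, then always try to stop, but never wait forever."""
    try:
        start_container()
        return None
    except Exception as exc:
        # Surface the create/start failure (e.g. a 500 from the engine) explicitly.
        return f"container start failed: {exc}"
    finally:
        if not call_with_deadline(stop_container, timeout_s):
            print(f"warning: stopping the container did not return within {timeout_s}s")
```

The design choice here is to bound the wait rather than to cancel the stuck call: Python threads cannot be forcibly stopped, so the blocked call is simply left behind in a daemon thread.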

I wasn't able to confirm this, but most of the issues that report this problem are run from either Windows or a MacBook. I haven't seen a Linux Docker-in-Docker setup fail, so there could be something in the way the socket is shared from the host to the container. Part of the reason I suspect this issue is OS-related is that the error didn't occur in a Docker-in-Docker environment built on Linux, but it did happen on my MacBook.

For anyone else who runs into this issue, try setting up the Docker-in-Docker environment on a Linux host instead of Windows or macOS.

alacy-alteryx, Oct 12 '22 01:10

@alacy-alteryx thanks for the follow-up. I was able to reproduce the same error on Linux (Red Hat and Ubuntu) VMs.

imduchy, Dec 14 '22 12:12

I am seeing the same thing with SAM CLI 1.76.0 in my WSL2 (Ubuntu 22 on Windows) environment:

Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
Mounting /home/brad/dev/rds/CsRdsAutoStartStop/lambda/rds_start_week_end.py as /tmp/samcli/source:ro,delegated inside runtime container
[hangs at this step forever]

If I open Docker Desktop, I can see that the container has been created but not started. If I try to start it, Docker Desktop reports "Failed to start 1 item", and the sam build command immediately returns:

2023-03-16 11:33:25,082 | Builder crashed:
2023-03-16 11:33:25,793 | Exception raised during the execution
2023-03-16 11:33:25,797 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics
2023-03-16 11:33:25,804 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics
2023-03-16 11:33:25,804 | Sending Telemetry: {'metrics': [{'commandRun': {'requestId': '4027fdfe-934b-4fef-827f-71ef9578f13d', 'installationId': '13b475ca-9dcc-4c98-8c7c-eaeff20129c1', 'sessionId': 'acb13c8f-e787-4e2e-8026-f15633617b82', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.7.10', 'samcliVersion': '1.76.0', 'awsProfileProvided': False, 'debugFlagProvided': True, 'region': '', 'commandName': 'sam build', 'metricSpecificAttributes': {'projectType': 'CFN', 'gitOrigin': None, 'projectName': '0f0a1aefe0e208082d599fd79707ae5e807d6f98c9e8a53cd65e88338c1737b1', 'initialCommit': None}, 'duration': 311488, 'exitReason': 'JSONDecodeError', 'exitCode': 255}}]}
2023-03-16 11:33:25,804 | Unable to find Click Context for getting session_id.
2023-03-16 11:33:25,806 | Sending Telemetry: {'metrics': [{'events': {'requestId': '693aaad1-98f1-4148-9ac9-1391f6a4f140', 'installationId': '13b475ca-9dcc-4c98-8c7c-eaeff20129c1', 'sessionId': 'acb13c8f-e787-4e2e-8026-f15633617b82', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.7.10', 'samcliVersion': '1.76.0', 'metricSpecificAttributes': {'events': [{'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.543'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.545'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.575'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.579'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.584'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.584'}]}}}]}
2023-03-16 11:33:26,134 | Telemetry response: 200
2023-03-16 11:33:26,145 | Telemetry response: 200

Error: Expecting value: line 1 column 1 (char 0)
Traceback:
  File "click/core.py", line 1055, in main
  File "click/core.py", line 1657, in invoke
  File "click/core.py", line 1404, in invoke
  File "click/core.py", line 760, in invoke
  File "click/decorators.py", line 84, in new_func
  File "click/core.py", line 760, in invoke
  File "samcli/lib/telemetry/metric.py", line 183, in wrapped
  File "samcli/lib/telemetry/metric.py", line 148, in wrapped
  File "samcli/lib/utils/version_checker.py", line 42, in wrapped
  File "samcli/cli/main.py", line 92, in wrapper
  File "samcli/commands/build/command.py", line 207, in cli
  File "samcli/commands/build/command.py", line 279, in do_cli
  File "samcli/commands/build/build_context.py", line 263, in run
  File "samcli/lib/build/app_builder.py", line 214, in build
  File "samcli/lib/build/build_strategy.py", line 393, in build
  File "samcli/lib/build/build_strategy.py", line 80, in build
  File "samcli/lib/build/build_strategy.py", line 400, in _build_functions
  File "samcli/lib/build/build_strategy.py", line 415, in _run_builds_async
  File "samcli/lib/utils/async_utils.py", line 131, in run_async
  File "samcli/lib/utils/async_utils.py", line 90, in run_given_tasks_async
  File "asyncio/base_events.py", line 587, in run_until_complete
  File "samcli/lib/utils/async_utils.py", line 58, in _run_given_tasks_async
  File "concurrent/futures/thread.py", line 57, in run
  File "samcli/lib/build/build_strategy.py", line 426, in build_single_function_definition
  File "samcli/lib/build/build_strategy.py", line 572, in build_single_function_definition
  File "samcli/lib/build/build_strategy.py", line 281, in build_single_function_definition
  File "samcli/lib/build/build_strategy.py", line 174, in build_single_function_definition
  File "samcli/lib/build/app_builder.py", line 695, in _build_function
  File "samcli/lib/build/app_builder.py", line 945, in _build_function_on_container
  File "samcli/lib/build/app_builder.py", line 963, in _parse_builder_response
  File "json/__init__.py", line 348, in loads
  File "json/decoder.py", line 337, in decode
  File "json/decoder.py", line 355, in raw_decode

An unexpected error was encountered while executing "sam build".
Search for an existing issue:
https://github.com/aws/aws-sam-cli/issues?q=is%3Aissue+is%3Aopen+Bug%3A%20sam%20build%20-%20JSONDecodeError
Or create a bug report:
https://github.com/aws/aws-sam-cli/issues/new?template=Bug_report.md&title=Bug%3A%20sam%20build%20-%20JSONDecodeError
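The final "Expecting value: line 1 column 1 (char 0)" is the message Python's json.loads raises when the text it is given is empty or does not start with JSON, which suggests the builder container produced no usable output for _parse_builder_response to read. A minimal sketch of a more descriptive parse step follows; parse_builder_output is a hypothetical name and is not SAM's implementation.

```python
import json
from typing import Any


def parse_builder_output(stdout: str) -> Any:
    """Parse the JSON a builder process printed, with clearer errors (hypothetical helper)."""
    if not stdout.strip():
        # json.loads("") would raise "Expecting value: line 1 column 1 (char 0)".
        raise ValueError("builder produced no output; the container may not have started")
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        # Show the start of the unexpected output instead of only the decoder position.
        raise ValueError(f"builder output is not valid JSON: {stdout[:200]!r}") from exc
```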

I am not sure if this is useful information or just noise.

As another data point: I have a GitHub Action that builds on GitHub's ubuntu-latest image. It also hangs forever at sam build --use-container.

bradthurber, Mar 16 '23 15:03

sam build --use-container is working now for me. The root cause for me was the CodeUri property value in the function definition in template.yaml:

My folder structure is as follows:

/template.yaml
/lambda/auto_start_rds_instance.py

In template.yaml, the "failing" definition was as follows:

Resources:
  AutoStartRDS:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./lambda/auto_start_rds_instance.py
      Handler: auto_start_rds_instance.lambda_handler
      Runtime: python3.9
      ...

After correcting the CodeUri:

Resources:
  AutoStartRDS:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambda
      Handler: auto_start_rds_instance.lambda_handler
      Runtime: python3.9
      ...

The odd thing is that I could do a manual sam deploy to my AWS account with the "failing" definition and it worked just fine. I suspect sam build may use the CodeUri property differently.
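The earlier log in this thread shows SAM mounting a single .py file as /tmp/samcli/source, which is consistent with sam build --use-container treating CodeUri as a directory to mount. A small pre-flight check along those lines is sketched below; check_code_uri is a hypothetical helper, and it only handles local paths, not S3 URIs.

```python
import os


def check_code_uri(template_dir: str, code_uri: str) -> str:
    """Resolve a local CodeUri against the template's folder and require a directory."""
    path = os.path.normpath(os.path.join(template_dir, code_uri))
    if os.path.isfile(path):
        folder = os.path.dirname(code_uri) or "."
        # Point CodeUri at the folder; the module name belongs in Handler instead.
        raise ValueError(f"CodeUri {code_uri!r} is a file; use its folder {folder!r} instead")
    if not os.path.isdir(path):
        raise ValueError(f"CodeUri {code_uri!r} not found under {template_dir!r}")
    return path
```

With the layout above, check_code_uri(".", "lambda") passes, while "./lambda/auto_start_rds_instance.py" is rejected.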

bradthurber, Mar 16 '23 17:03

I've run into this too, on an Arm Linux machine where SAM tries to use an amd64 image for sam build --use-container.

The problem is with docker-py's Container.attach, which was reported ~5 years ago but still has no solution. The reporter of that issue says using Container.logs works around the problem; however, that API can only return stdout and stderr mashed together, which isn't a drop-in fix for the way SAM works.
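For context on what keeping stdout and stderr apart involves: when a container runs without a TTY, the Docker Engine API multiplexes both streams over one connection and prefixes each chunk with an 8-byte header (a stream-type byte, where 1 is stdout and 2 is stderr, three zero bytes, then a big-endian 32-bit payload length). A minimal demultiplexer over such a raw byte buffer is sketched below; it is illustrative only and is not docker-py's implementation.

```python
import struct
from typing import Tuple

HEADER = struct.Struct(">BxxxL")  # stream type, 3 padding bytes, big-endian uint32 length
STDOUT, STDERR = 1, 2


def demux(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a multiplexed (non-TTY) Docker stream into (stdout, stderr) payloads."""
    out, err = bytearray(), bytearray()
    offset = 0
    while offset + HEADER.size <= len(raw):
        stream_type, length = HEADER.unpack_from(raw, offset)
        offset += HEADER.size
        payload = raw[offset:offset + length]
        offset += length
        if stream_type == STDOUT:
            out += payload
        elif stream_type == STDERR:
            err += payload
        # Other stream types (e.g. 0 for stdin) are ignored here.
    return bytes(out), bytes(err)
```

Once the frames are concatenated without their headers, as in a combined log stream, there is no way to tell which bytes came from stdout, which matters if the caller expects pure JSON on stdout.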

wash-amzn, May 03 '23 21:05

Hey there! Any update on this issue? I'm also facing the same situation.

Nishit-Dalwadi, May 16 '23 13:05

I experienced this when using arm64 Node.js runtimes. Switching back to x86_64 solved the issue.

If you are using CodeBuild, make sure that you are using a newer image, such as aws/codebuild/amazonlinux2-x86_64-standard:5.0. With this newer version, I had no problems building arm64.

ivancsicsmarkus, Jun 06 '23 15:06

I'm experiencing the same problem while trying to build python3.11 Lambda functions for the arm64 architecture.

I got the build to work using the following Docker image: https://github.com/aws/aws-sam-cli/issues/3331#issuecomment-932353379. That Docker container is a little scary to use; it would be better if SAM could somehow build cross-platform on, e.g., GitHub without needing that hack.

And even though the build seems to pass, I'm still getting runtime dependency issues afterwards, e.g. with the cryptography dependency:

Unable to import module 'app': /opt/python/cryptography/hazmat/bindings/_rust.abi3.so: cannot open shared object file: No such file or directory

michal-sa, Aug 29 '23 14:08

Hi,

Is there any update on this? It would be great to have this capability for AWS SAM.

Nosredzzz, Sep 13 '23 01:09

Is there any update on this? I am facing the same issue on Ubuntu 22.04.

psambit9791, Oct 29 '23 20:10

I am still getting the same issue with the latest version of SAM CLI. Is there any update on this?

GeovannyLopez, Jan 17 '24 18:01

Hi there,

There are a couple of different use cases here; please provide your specific example below or create a new issue so that we can check them separately.

For the most common use case (or issue): to execute any cross-arch Docker image on Linux, you have to run multiarch/qemu-user-static once for each run in order to enable emulation. See https://github.com/multiarch/qemu-user-static. This is due to a limitation of Docker on Linux and does not apply to macOS or Windows instances.

I've created the following GitHub repository for this example: https://github.com/mndeveci/example-arm-lambda-build

If I just use sam build -u without the step above, the build process gets stuck since the host can't execute cross-arch instructions. See example build: https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792694386/job/21251164713

But if I add docker run --rm --privileged multiarch/qemu-user-static --reset -p yes just before the build step here, then my build succeeds. See example build: https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792717799/job/21251231671
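The check described above can be scripted. The sketch below maps the host's platform.machine() value to Lambda's architecture names and returns the multiarch/qemu-user-static registration command quoted above only when a Linux host is about to build for a different architecture. The helper names and the alias table are illustrative assumptions; machine strings vary by platform, so unknown values are passed through unchanged.

```python
import platform
from typing import List, Optional

# Common platform.machine() spellings mapped to Lambda architecture names (assumed, not exhaustive).
_ALIASES = {"x86_64": "x86_64", "amd64": "x86_64", "aarch64": "arm64", "arm64": "arm64"}

QEMU_SETUP = ["docker", "run", "--rm", "--privileged",
              "multiarch/qemu-user-static", "--reset", "-p", "yes"]


def lambda_arch(machine: Optional[str] = None) -> str:
    """Normalise a machine string (default: this host's) to 'x86_64' or 'arm64'."""
    m = (machine or platform.machine()).lower()
    return _ALIASES.get(m, m)


def qemu_setup_needed(target_arch: str, system: Optional[str] = None,
                      machine: Optional[str] = None) -> Optional[List[str]]:
    """Return the emulation setup command if a Linux host would build for another arch."""
    system = system or platform.system()
    if system == "Linux" and lambda_arch(machine) != target_arch:
        return list(QEMU_SETUP)
    # Same architecture, or macOS/Windows, where per the comment above the step does not apply.
    return None
```

A non-None result could be passed to subprocess.run(..., check=True) before sam build -u.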

mndeveci, Feb 06 '24 00:02

@mndeveci The build works fine with the qemu-user-static Docker image workaround, although I'm still experiencing the same issues as in https://github.com/aws/aws-sam-cli/issues/3512#issuecomment-1697529569 at runtime. So even though it solves the hang here, maybe it isn't an optimal, working solution for all cases?

michal-sa, Mar 19 '24 14:03

@michal-sa is there a simple reproducible example for us to investigate further?

mndeveci, Mar 19 '24 19:03

@mndeveci I started to create an example by forking your example repository, but it worked fine. Then I realized my previous tests were failing not because of the Lambda function itself but because I also had a dependency on a layer where I had forgotten to set the BuildArchitecture to arm64, as stated in: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/building-layers.html.
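A layer/function architecture mismatch like this can be caught before building. A sketch, assuming the template has already been loaded into a dict (e.g. from JSON, since CloudFormation short-form tags such as !Ref need a custom YAML loader) and that layers are referenced as {"Ref": ...}. It compares each function's Architectures with the Metadata.BuildArchitecture of the AWS::Serverless::LayerVersion resources it uses; treating a missing value as x86_64 is an assumption based on SAM's defaults, and Globals are not handled.

```python
from typing import Any, Dict, List

DEFAULT_ARCH = "x86_64"  # assumed default when Architectures/BuildArchitecture is omitted


def layer_arch_mismatches(template: Dict[str, Any]) -> List[str]:
    """List function/layer pairs whose build architectures disagree."""
    resources = template.get("Resources", {})
    problems = []
    for fn_name, fn in resources.items():
        if fn.get("Type") != "AWS::Serverless::Function":
            continue
        props = fn.get("Properties", {})
        fn_arch = (props.get("Architectures") or [DEFAULT_ARCH])[0]
        for layer in props.get("Layers", []):
            # Only local layers referenced as {"Ref": "LogicalId"} can be checked here.
            if not isinstance(layer, dict) or "Ref" not in layer:
                continue
            layer_res = resources.get(layer["Ref"], {})
            if layer_res.get("Type") != "AWS::Serverless::LayerVersion":
                continue
            layer_arch = layer_res.get("Metadata", {}).get("BuildArchitecture", DEFAULT_ARCH)
            if layer_arch != fn_arch:
                problems.append(f"{fn_name} ({fn_arch}) uses {layer['Ref']} built for {layer_arch}")
    return problems
```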

So sorry for this, my bad. Looks like the fix with the docker image works very well :clap:

michal-sa, Mar 28 '24 16:03

@michal-sa glad to hear it is working now 🥳

I am going to close this issue now; please create new issue(s) if you are experiencing problems in this area.

mndeveci, Mar 28 '24 17:03

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot], Mar 28 '24 17:03