aws-sam-cli
Running sam build from a container gets stuck
Description:
Hello. I'm trying to Dockerize a tool (essentially a wrapper script) that utilizes aws-sam-cli. I'm unable to successfully execute sam build from inside of my "wrapper" container as it gets stuck at this point:
Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container
From that point, the build doesn't move forward (I tried to wait for over an hour). I've also tried different versions of aws-sam-cli but with the same result. Running other containers inside of my container setup works as expected.
Steps to reproduce:
- Build a Docker image with aws-sam-cli==1.36.0 installed
- Run build inside of a container
docker run -it --rm -v /source/my-application:/my-application -v /var/run/docker.sock:/var/run/docker.sock wrapper build
wrapper build executes the following
sam build \
--region ${AWS_REGION_CODE} \
--profile ${AWS_PROFILE} \
--template "$function_src_dir"/template.yaml \
--build-dir "${AWS_SAM_BUILD_DIR}" \
--use-container --parallel --debug
The Docker image is based on ubuntu:20.04 with python3 installed.
Observed result:
2021-12-02 15:19:31,364 | Building codeuri: /source/my-application runtime: python3.9 metadata: {} architecture: x86_64 functions: ['MyFunction']
2021-12-02 15:19:31,365 | Building to following folder /root/aws-sam-generated-artifacts/123456789012/ireland/my-application/MyFunction
2021-12-02 15:19:31,365 | Waiting for async results
Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
2021-12-02 15:19:33,064 | Mounting /source/my-application as /tmp/samcli/source:ro,delegated inside runtime container
From this point on, the build gets stuck.
Expected result:
sam build should work when run from inside of a container
Additional environment details (Ex: Windows, Mac, Amazon Linux etc)
- OS: MacOS 12
- sam --version: 1.36.0
- AWS region: eu-west-1
So sorry you encountered an issue! Thanks for reporting it, we will be investigating it further.
Seeing similar issues when running a sam build inside a local codeBuild container via
codebuild_build.sh -c -b ./buildspec-sam.yml -i public.ecr.aws/codebuild/amazonlinux2-x86_64-standard:3.0 -a ../out/
The buildspec-sam.yml just calls -
build:
  commands:
    - echo Build started on $(date) in $(pwd)
    - cd sam
    - sam build
    - sam deploy --config-env $BUILD_ENV --no-confirm-changeset --no-fail-on-empty-changeset
This is on macOS. It works fine when run directly on the host OS, outside Docker.
Hello. Any updates on this? @ssenchenko
Hi there,
Any updates on this? It would be really good to have this as a capability for AWS SAM.
Thank you
Hey, any updates on this one?
I am using the generated .github/workflows/pipeline.yaml from sam pipeline init.
Thanks.
~~After upgrading to aws-sam-cli==1.51.0, everything works as expected~~
I'm re-opening the issue as the problem still persists. The reason why my previous build worked was that the CF stack didn't contain any Lambda function. After trying to run sam build for a stack which contains a Lambda function from a container, I experienced the same problem - the build gets stuck.
2022-06-13 12:46:48,791 | Async execution started
2022-06-13 12:46:48,791 | Invoking function functools.partial(<bound method DefaultBuildStrategy.build_single_function_definition of <samcli.lib.build.build_strategy.DefaultBuildStrategy object at 0x7fdf18178d00>>, <samcli.lib.build.build_graph.FunctionBuildDefinition object at 0x7fdf18178b20>)
2022-06-13 12:46:48,792 | Building codeuri: /src/my_application runtime: python3.7 metadata: {} architecture: x86_64 functions: ['ClzSecHubIncidentsCloseFunction']
2022-06-13 12:46:48,792 | Building to following folder /root/aws-sam-generated-artifacts/633965134003/ireland/my_application/MyApplicationFunction
2022-06-13 12:46:48,792 | Waiting for async results
Fetching public.ecr.aws/sam/build-python3.7:latest-x86_64 Docker container image..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
2022-06-13 12:47:40,626 | Mounting /src/my_application as /tmp/samcli/source:ro,delegated inside runtime container
From this point, the build doesn't progress. I also tried to upgrade the runtime to python3.9 but that didn't work either. My current sam version is aws-sam-cli==1.51.0.
Can we get any update on this, please?
I did some investigation and it looks like this gets hung when the docker engine is unable to create the container from the container environment for various reasons. It appears sam gets hung at this call: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/local/docker/container.py#L284
That call is wrapped in this try/finally statement, which has no exception handling: https://github.com/aws/aws-sam-cli/blob/89bf1099400f7c4b5b1e7cd642bc998ecca7f8a7/samcli/lib/build/app_builder.py#L844
I suspect that because docker couldn't create the container, it hangs when attempting to stop it. After some debugging in an environment that reliably had this error, I could see that during the create step the docker engine returns a status code 500 and then runs through a series of other APIs before it hangs completely. None of the calls after the 500 returned an error code. I was unable to track down where in sam or docker these additional APIs are being called after the create fails.
So the initial issue appears to be caused by a bug when running with the python-docker library in certain environments when run from a container and then there appears to be a bug in how sam handles this particular type of failure.
I wasn't able to confirm this, but most of the reports of this problem come from either Windows or a MacBook. I haven't seen a Linux docker-in-docker setup fail, so there could be something in the way the socket is shared from the host to the container. Part of the reason I suspect this issue is OS related is that the error didn't occur in a docker-in-docker environment built on Linux, but it did happen on my MacBook.
For anyone else that runs into this issue try setting up the docker in docker environment on a linux host instead of windows or macos.
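Since the hang described above can last indefinitely, a wrapper script can at least bound how long it waits for sam build. The following is only a minimal sketch of that idea, not a fix for the underlying problem and not how SAM CLI itself behaves; the helper name run_with_deadline is hypothetical and any timeout value is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_deadline(command, timeout_seconds):
    """Run command and return its exit code, or None if the deadline passed."""
    try:
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising TimeoutExpired.
        # A build container created through the Docker socket is not a child
        # process, so it may still need to be removed by hand.
        print(f"{command[0]} did not finish within {timeout_seconds}s", file=sys.stderr)
        return None
    return completed.returncode
```

A wrapper could call, for example, run_with_deadline(["sam", "build", "--use-container", "--debug"], 900) and treat a None result as a failed build instead of waiting for an hour.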
@alacy-alteryx thanks for the follow-up. I was able to reproduce the same error on Linux (Redhat and Ubuntu) VMs.
I am seeing the same thing with SAM CLI=1.76.0 in my WSL2 (linux ubuntu 22 on Windows) environment
Fetching public.ecr.aws/sam/build-python3.9:latest-x86_64 Docker container image......
Mounting /home/brad/dev/rds/CsRdsAutoStartStop/lambda/rds_start_week_end.py as /tmp/samcli/source:ro,delegated inside runtime container
[hangs at this step forever]
If I open Docker Desktop, I can see that the container is created, but not started. If I try to start it, I get "Failed to start 1 item" in Docker Desktop. The sam build command then immediately returns:
2023-03-16 11:33:25,082 | Builder crashed:
2023-03-16 11:33:25,793 | Exception raised during the execution
2023-03-16 11:33:25,797 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics
2023-03-16 11:33:25,804 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics
2023-03-16 11:33:25,804 | Sending Telemetry: {'metrics': [{'commandRun': {'requestId': '4027fdfe-934b-4fef-827f-71ef9578f13d', 'installationId': '13b475ca-9dcc-4c98-8c7c-eaeff20129c1', 'sessionId': 'acb13c8f-e787-4e2e-8026-f15633617b82', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.7.10', 'samcliVersion': '1.76.0', 'awsProfileProvided': False, 'debugFlagProvided': True, 'region': '', 'commandName': 'sam build', 'metricSpecificAttributes': {'projectType': 'CFN', 'gitOrigin': None, 'projectName': '0f0a1aefe0e208082d599fd79707ae5e807d6f98c9e8a53cd65e88338c1737b1', 'initialCommit': None}, 'duration': 311488, 'exitReason': 'JSONDecodeError', 'exitCode': 255}}]}
2023-03-16 11:33:25,804 | Unable to find Click Context for getting session_id.
2023-03-16 11:33:25,806 | Sending Telemetry: {'metrics': [{'events': {'requestId': '693aaad1-98f1-4148-9ac9-1391f6a4f140', 'installationId': '13b475ca-9dcc-4c98-8c7c-eaeff20129c1', 'sessionId': 'acb13c8f-e787-4e2e-8026-f15633617b82', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.7.10', 'samcliVersion': '1.76.0', 'metricSpecificAttributes': {'events': [{'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildFunctionRuntime', 'event_value': 'python3.9', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.528'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.543'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.545'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.575'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.579'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': '2023-03-16 15:28:14.584'}, {'event_name': 'BuildWorkflowUsed', 'event_value': 'python-pip', 'thread_id': 140397368438784, 'time_stamp': 
'2023-03-16 15:28:14.584'}]}}}]}
2023-03-16 11:33:26,134 | Telemetry response: 200
2023-03-16 11:33:26,145 | Telemetry response: 200
Error: Expecting value: line 1 column 1 (char 0)
Traceback:
File "click/core.py", line 1055, in main
File "click/core.py", line 1657, in invoke
File "click/core.py", line 1404, in invoke
File "click/core.py", line 760, in invoke
File "click/decorators.py", line 84, in new_func
File "click/core.py", line 760, in invoke
File "samcli/lib/telemetry/metric.py", line 183, in wrapped
File "samcli/lib/telemetry/metric.py", line 148, in wrapped
File "samcli/lib/utils/version_checker.py", line 42, in wrapped
File "samcli/cli/main.py", line 92, in wrapper
File "samcli/commands/build/command.py", line 207, in cli
File "samcli/commands/build/command.py", line 279, in do_cli
File "samcli/commands/build/build_context.py", line 263, in run
File "samcli/lib/build/app_builder.py", line 214, in build
File "samcli/lib/build/build_strategy.py", line 393, in build
File "samcli/lib/build/build_strategy.py", line 80, in build
File "samcli/lib/build/build_strategy.py", line 400, in _build_functions
File "samcli/lib/build/build_strategy.py", line 415, in _run_builds_async
File "samcli/lib/utils/async_utils.py", line 131, in run_async
File "samcli/lib/utils/async_utils.py", line 90, in run_given_tasks_async
File "asyncio/base_events.py", line 587, in run_until_complete
File "samcli/lib/utils/async_utils.py", line 58, in _run_given_tasks_async
File "concurrent/futures/thread.py", line 57, in run
File "samcli/lib/build/build_strategy.py", line 426, in build_single_function_definition
File "samcli/lib/build/build_strategy.py", line 572, in build_single_function_definition
File "samcli/lib/build/build_strategy.py", line 281, in build_single_function_definition
File "samcli/lib/build/build_strategy.py", line 174, in build_single_function_definition
File "samcli/lib/build/app_builder.py", line 695, in _build_function
File "samcli/lib/build/app_builder.py", line 945, in _build_function_on_container
File "samcli/lib/build/app_builder.py", line 963, in _parse_builder_response
File "json/__init__.py", line 348, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 355, in raw_decode
An unexpected error was encountered while executing "sam build".
Search for an existing issue:
https://github.com/aws/aws-sam-cli/issues?q=is%3Aissue+is%3Aopen+Bug%3A%20sam%20build%20-%20JSONDecodeError
Or create a bug report:
https://github.com/aws/aws-sam-cli/issues/new?template=Bug_report.md&title=Bug%3A%20sam%20build%20-%20JSONDecodeError
I am not sure if this is useful information or just noise.
As another angle: I have a github action that tries to do a build at GitHub using their ubuntu-latest image. It also hangs forever at sam build --use-container
sam build --use-container is working now for me. The root cause for me was the CodeUri property value in the function definition in template.yaml:
My folder structure is as follows
/template.yaml
/lambda/auto_start_rds_instance.py
In template.yaml, the "failing" definition was as follows:
Resources:
  AutoStartRDS:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./lambda/auto_start_rds_instance.py
      Handler: auto_start_rds_instance.lambda_handler
      Runtime: python3.9
      ...
Correcting the CodeUri so that it points at the folder instead of the file:
Resources:
  AutoStartRDS:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambda
      Handler: auto_start_rds_instance.lambda_handler
      Runtime: python3.9
      ...
The odd thing is that I could do a manual sam deploy to my AWS account with the "failing" definition and it worked just fine. I suspect sam build may use the CodeUri property in a different way.
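The CodeUri mistake above (a file path where a folder was needed) could be caught before running sam build. The sketch below is a hypothetical pre-flight check, not part of SAM CLI; it assumes CodeUri is a local path relative to the folder containing template.yaml and it skips s3:// values.

```python
from pathlib import Path


def check_code_uri(template_dir, code_uri):
    """Return a warning string if a local CodeUri is not a folder, else None."""
    if code_uri.startswith("s3://"):
        # Remote artifacts are out of scope for this local check.
        return None
    resolved = Path(template_dir) / code_uri
    if resolved.is_dir():
        return None
    if resolved.is_file():
        parent = Path(code_uri).parent.as_posix()
        return f"CodeUri '{code_uri}' is a file; try the folder '{parent}' instead"
    return f"CodeUri '{code_uri}' does not exist relative to {template_dir}"
```

With the folder structure from this comment, check_code_uri(".", "./lambda/auto_start_rds_instance.py") returns a warning that suggests 'lambda', while check_code_uri(".", "lambda") returns None.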
I've run into this too, on an ARM Linux machine where SAM tries to use an amd64 image for a sam build --use-container.
The problem is with docker-py's Container.attach, which was reported ~5 years ago, but without any solution so far. The reporter of the issue says using Container.logs works around the problem, however that API can only return stdout and stderr mashed together, which isn't a drop-in fix for the way SAM works.
Hey there! Any update on this issue? I'm also facing the same situation.
I experienced this when using arm64 Node.js runtimes. Switching back to x86_64 solved the issue.
If you are using CodeBuild, make sure that you are using a newer image, something like aws/codebuild/amazonlinux2-x86_64-standard:5.0. On this newer version, I had no problems building arm64.
Experiencing the same problems while trying to build python3.11 lambdas with arm64 architecture.
I got the build to work using the following docker image: https://github.com/aws/aws-sam-cli/issues/3331#issuecomment-932353379. That docker container is a little scary to use; it would be better if SAM could somehow be built cross-platform, e.g. on GitHub, without the need for that hack.
And even though the build seems to pass, I'm still getting runtime dependency issues afterwards, e.g. with the cryptography dependency:
Unable to import module 'app': /opt/python/cryptography/hazmat/bindings/_rust.abi3.so: cannot open shared object file: No such file or directory
Hi,
Is there any update on this? It would be great to have this capability for AWS SAM.
Is there any update on this? I am facing the same issue on Ubuntu 22.04
I am still getting the same issue with the latest version of SAM CLI. Is there any update on this?
Hi there,
There are a couple of different use cases here. Please provide your specific example below or create a new issue so that we can check them separately.
For the most common use case (or issue): in order to execute any cross-architecture Docker image on Linux, you have to run multiarch/qemu-user-static once for each run to enable emulation. See https://github.com/multiarch/qemu-user-static. This is due to a limitation of Docker on Linux and is not applicable to macOS or Windows instances.
I've created the following GitHub repository for this example: https://github.com/mndeveci/example-arm-lambda-build
If I just use sam build -u without the step above, the build process gets stuck since the host can't execute cross-architecture instructions. See example build: https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792694386/job/21251164713
But if I add docker run --rm --privileged multiarch/qemu-user-static --reset -p yes just before the build step here, then my build succeeds. See example build: https://github.com/mndeveci/example-arm-lambda-build/actions/runs/7792717799/job/21251231671
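Whether the emulation step is needed on a given Linux host can be estimated before starting the build. The sketch below is an assumption-laden illustration, not part of SAM CLI: the function names are hypothetical, and it assumes qemu-user-static registers its handlers under /proc/sys/fs/binfmt_misc with the default names qemu-aarch64 and qemu-x86_64.

```python
import platform
from pathlib import Path

# Map lowercased platform.machine() values to SAM architecture names.
_MACHINE_TO_SAM_ARCH = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "arm64",
    "arm64": "arm64",
}

# Assumed default binfmt_misc entry names created by qemu-user-static.
_QEMU_HANDLER = {"arm64": "qemu-aarch64", "x86_64": "qemu-x86_64"}


def needs_emulation(host_machine, target_arch):
    """Return True when the build image architecture differs from the host."""
    machine = host_machine.lower()
    host_arch = _MACHINE_TO_SAM_ARCH.get(machine, machine)
    return host_arch != target_arch


def emulation_registered(target_arch, binfmt_dir="/proc/sys/fs/binfmt_misc"):
    """Return True when a qemu handler for target_arch appears registered."""
    handler = _QEMU_HANDLER.get(target_arch)
    return handler is not None and (Path(binfmt_dir) / handler).exists()


def should_register_qemu(target_arch):
    """Combine both checks for the current host."""
    return needs_emulation(platform.machine(), target_arch) and not emulation_registered(target_arch)
```

If should_register_qemu("arm64") returns True on a Linux CI runner, that suggests running the multiarch/qemu-user-static command quoted above before sam build.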
@mndeveci The build works fine with the qemu-user-static docker image workaround, although I'm still experiencing the same issues as in https://github.com/aws/aws-sam-cli/issues/3512#issuecomment-1697529569 during runtime. So even though it solves the stuck issue here, it's maybe not an optimal and working solution for all cases?
@michal-sa is there a simple reproducible example for us to investigate further?
@mndeveci I started to create an example from a fork of your example repository, but it worked fine. Then I realized my previous tests were failing not due to the lambda itself but because I also had a dependency on a layer where I forgot to set the BuildArchitecture to arm64, as stated in: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/building-layers.html.
So sorry for this, my bad. Looks like the fix with the docker image works very well :clap:
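The layer mistake described above can be checked mechanically once the template is loaded into a dictionary. This is a minimal sketch with a hypothetical function name; it assumes the Resources section has already been parsed, and it assumes a layer without BuildArchitecture in its Metadata is built for x86_64.

```python
def layers_with_other_architecture(resources, expected_arch):
    """Return names of layer resources not built for expected_arch."""
    mismatched = []
    for name, resource in resources.items():
        if resource.get("Type") != "AWS::Serverless::LayerVersion":
            continue
        metadata = resource.get("Metadata") or {}
        # Assumption: an omitted BuildArchitecture means x86_64.
        if metadata.get("BuildArchitecture", "x86_64") != expected_arch:
            mismatched.append(name)
    return mismatched
```

For an arm64 function, any layer name returned by layers_with_other_architecture(resources, "arm64") is a candidate for the kind of runtime import error quoted earlier in this thread.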
@michal-sa glad to hear it is working now 🥳
I am going to close this issue now. Please create new issue(s) if you are experiencing problems in this area.