
LambdaNETCoreDeploy task fails when run simultaneously multiple times on the same host for different pipelines

Open davidsvit opened this issue 1 year ago • 5 comments

Background: Running the task from an Azure DevOps Pipeline on a Windows Agent Machine.

Inside the LambdaNETCoreDeploy@1 task, one of the things that gets run is this command:

dotnet tool update -g Amazon.Lambda.Tools

This command can corrupt the dotnet-lambda.exe install if multiple pipelines run it at the same time on the same build machine, which makes this a difficult bug to recreate. We had a series of scheduled jobs running at the same time that triggered the issue with some regularity. The only way we have found to resolve the issue is to rename the dotnet-lambda.exe executable. To reduce the incidence of the issue, we have staggered the times of the scheduled jobs that run the LambdaNETCoreDeploy@1 task. I believe the issue will still occur for us, though only rarely now, when manually submitted pipelines happen to run on the same build machine and reach the above command at the same time. It would be nice to have a resolution to this problem, as it is a definite "gotcha" that people have posted about in a number of places. The post that helped me most in understanding the issue, and that I have borrowed from to report this to the development team, is: https://issuehint.com/issue/aws/aws-toolkit-azure-devops/416.

Here is the typical error message that this issue produces:

Configuring credentials for task
...configuring AWS credentials from service endpoint 'cacbb57d-6030-40b2-85d4-d97cf0163111'
...endpoint defines role-based credentials for role ***.
Processing Lambda project at E:\ADOAgent2_work\75\s\src\bsd-ldex-transmission-response-api
Reading existing aws-lambda-tools-defaults.json
Configuring region for task
...configured to use region us-west-2, defined in task.
E:\ADOAgent2_work_tool\dotnet\dotnet.exe tool install -g Amazon.Lambda.Tools
Failed to create shell shim for tool 'amazon.lambda.tools ': Command 'dotnet-lambda' conflicts with an existing command from another tool.
Tool 'amazon.lambda.tools ' failed to install.
E:\ADOAgent2_work_tool\dotnet\dotnet.exe tool update -g Amazon.Lambda.Tools
Tool 'amazon.lambda.tools ' failed to update due to the following:
Failed to create shell shim for tool 'amazon.lambda.tools ': Command 'dotnet-lambda' conflicts with an existing command from another tool.
Tool 'amazon.lambda.tools ' failed to install.
##[error]Unable to install global Amazon.Lambda.Tools !
The old package based version of Amazon.Lambda.Tools is now deprecated. Newer .NET core versions will need to use a newer hosted agent and the global tools (which this task auto installs). Refer to Microsoft's guide for the correct hosted agent for which hosted agent you need to use newer .NET Core versions:https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted
E:\ADOAgent2_work_tool\dotnet\dotnet.exe restore
Determining projects to restore...
All projects are up-to-date for restore.
Beginning Serverless Deployment
Performing package-only build of serverless application, output template will be placed in E:\ADOAgent2_work\75\a\bsd-ldex-transmission-response-api-template.yaml
C:\Users\s.cld.adoagent.dotnet\tools\dotnet-lambda.exe package-ci -ot E:\ADOAgent2_work\75\a\bsd-ldex-transmission-response-api-template.yaml --region us-west-2 --s3-bucket bsd-ldex-transmission-response-deployment-bucket-dev-main --disable-interactive true
The application to execute does not exist: 'C:\Users\s.cld.adoagent.dotnet\tools.store\amazon.lambda.tools\5.4.4\amazon.lambda.tools\5.4.4\tools\netcoreapp3.1\any\dotnet-lambda.dll'.
##[error]Error: The process 'C:\Users\s.cld.adoagent.dotnet\tools\dotnet-lambda.exe' failed with exit code 2147516570
Function Name/ARN: arn:aws:lambda:us-west-2:624313876423:function:bsd-ldex-transmission-response-writer-dev-main
Mode of deployment: deployment package

davidsvit avatar Aug 01 '22 23:08 davidsvit

Hey @davidsvit

I've looked into this and I believe it's because dotnet tool update always re-installs the tool, leading to race conditions. I can think of a few solutions for the Toolkit to implement:

  1. Toolkit checks if the tool is already installed/updated prior to attempting to install/update
    • Not great because we have to parse dotnet tool output
  2. Add optional flag to skip automatic installs/updates as mentioned here: https://issuehint.com/issue/aws/aws-toolkit-azure-devops/416. Users will be responsible for installing/updating Amazon.Lambda.Tools.
  3. Install in a temp directory associated with each task instead of globally

I'm leaning toward option 2 simply because it involves fewer moving parts. Would something like this work for you?
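For illustration only, option 1 above could look roughly like the sketch below. It is written in Python rather than in the Toolkit's own code, and it assumes that `dotnet tool list -g` prints a header row, a dashed separator, and then one row per tool with the package id and version as the first two columns. The helper names `parse_tool_list` and `needs_install` are hypothetical. It also shows why this option is fragile: any change to the CLI's text layout would break the parsing.

```python
def parse_tool_list(output):
    """Map lower-cased package id -> version from `dotnet tool list -g` text.

    Assumes the layout: header row, dashed separator, then one row per tool.
    """
    tools = {}
    past_separator = False
    for line in output.splitlines():
        stripped = line.strip()
        if not past_separator:
            # Tool rows only start after the line made entirely of dashes.
            if stripped and set(stripped) == {"-"}:
                past_separator = True
            continue
        columns = stripped.split()
        if len(columns) >= 2:
            tools[columns[0].lower()] = columns[1]
    return tools


def needs_install(output, package="Amazon.Lambda.Tools"):
    """Return True when the package is absent from the parsed tool list."""
    return package.lower() not in parse_tool_list(output)


# Sample text in the assumed format (not captured from a real agent).
SAMPLE = """\
Package Id               Version      Commands
------------------------------------------------
amazon.lambda.tools      5.4.4        dotnet-lambda
"""

if __name__ == "__main__":
    print(parse_tool_list(SAMPLE))
    print(needs_install(SAMPLE))
```

Note that a check like this only narrows the window for the race: two pipelines could still both see the tool as missing and both try to install it.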

JadenSimon avatar Aug 02 '22 17:08 JadenSimon

Hi Jaden –

I don’t think option 2 actually helps that much, because it just moves the issue up the line. In other words, as users of the task we would then have to figure out in our pipelines whether the global tools are installed and up to date. If we don’t do that and just install/update the tool before we run the task, we risk creating the same race condition with multiple pipelines running on the same machine.

I would just like to be able to run the Task and have it work without any fuss or extra workarounds. Option 3 seems like the best way to truly fix the issue from my perspective, since it ensures that multiple pipeline runs happening at the same time on the same machine don’t interfere with each other. The toolkit can get what it needs for each run without depending on a global install or being affected by simultaneous updates to a global install.

Thanks!

-David.

From: JadenSimon
Sent: Tuesday, August 2, 2022 10:24 AM
To: aws/aws-toolkit-azure-devops
Cc: David Svitavsky; Mention
Subject: Re: [aws/aws-toolkit-azure-devops] LambdaNETCoreDeploy task fails when run simultaneously multiple time on the same host for different pipelines (Issue #481)

davidsvit avatar Aug 02 '22 18:08 davidsvit

I would suggest option 3 as well. That should be pretty easy using the --tool-path option on the dotnet tool install command.
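A minimal sketch of what that could look like, in Python for illustration rather than the Toolkit's own code. It only builds the command line and the expected executable path; nothing is executed. The `.lambda-tools` directory layout and the helper names are assumptions made for this example, and the executable name follows the `dotnet-lambda` command shown in the log above.

```python
import os
import sys


def tool_path_for_task(work_dir, task_id):
    """Hypothetical layout: one private tool directory per task run."""
    return os.path.join(work_dir, ".lambda-tools", str(task_id))


def install_command(tool_path, dotnet="dotnet"):
    """Build `dotnet tool install` for a local path instead of -g (global)."""
    return [dotnet, "tool", "install", "Amazon.Lambda.Tools",
            "--tool-path", tool_path]


def lambda_executable(tool_path, windows=None):
    """Path of the dotnet-lambda shim that the install places in tool_path."""
    if windows is None:
        windows = sys.platform.startswith("win")
    name = "dotnet-lambda.exe" if windows else "dotnet-lambda"
    return os.path.join(tool_path, name)


if __name__ == "__main__":
    # Example values only; a real task would use its own working directory.
    path = tool_path_for_task("agent_work", 75)
    print(" ".join(install_command(path)))
    print(lambda_executable(path))
```

Because each run installs into its own directory and invokes the tool by full path, concurrent runs never touch a shared shim, at the cost of a fresh install per run.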

normj avatar Aug 03 '22 07:08 normj

I would just like to be able to run the Task and have it work without any fuss or extra workarounds.

Understandable. I agree that users shouldn't have to worry about all this stuff, I just want to make sure that whatever the Toolkit implements truly addresses the problem and doesn't end up causing more frustration later by being too opaque.

I'll see about implementing option 3.

JadenSimon avatar Aug 03 '22 16:08 JadenSimon

Hi @JadenSimon, any updates on this issue? It has come up for us again. Basically, if we run a lot of pipeline jobs at the same time that do a build and/or package, a build machine's global Amazon.Lambda.Tools install gets messed up and subsequent runs on that build machine fail. It would be really helpful to get this issue resolved!

davidsvit avatar May 23 '23 18:05 davidsvit