Fix: aspire deploy exits with code 0 when deployment fails

Open Copilot opened this issue 2 months ago • 2 comments

Description

aspire deploy was exiting with code 0 even when deployments failed, causing CI/CD pipelines to incorrectly report success.

The root cause: a backchannel disconnect handler in DotNetCliRunner unconditionally called Environment.Exit(0), bypassing normal exit code logic in PipelineCommandBase.

Flow:

  1. AppHost reports deployment failure via backchannel
  2. CLI processes failure, determines correct exit code (6)
  3. CLI requests AppHost stop
  4. Backchannel disconnects → handler calls Environment.Exit(0)
  5. CLI exits with 0 instead of 6

Fix: Removed the disconnect handler. CLI now completes normally and returns the appropriate exit code.
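
The flow above can be mimicked with a minimal shell sketch. This is only an illustration, not the Aspire CLI's code: `buggy_cli`, `fixed_cli`, and `on_disconnect` are hypothetical names, and the exit code 6 is taken from step 2 of the flow.

```shell
#!/bin/sh
# Illustration only: hypothetical stand-ins for the CLI, not Aspire code.

# Before the fix: a disconnect handler ends the process with 0
# before the computed exit code is ever returned.
buggy_cli() (
  exit_code=6                  # step 2: failure processed, exit code chosen
  on_disconnect() { exit 0; }  # handler that unconditionally exits with 0
  on_disconnect                # step 4: backchannel disconnects
  exit "$exit_code"            # never reached
)

# After the fix: no handler, so the computed exit code is returned.
fixed_cli() (
  exit_code=6
  exit "$exit_code"
)

rc=0; buggy_cli || rc=$?
echo "buggy: $rc"
rc=0; fixed_cli || rc=$?
echo "fixed: $rc"
```

Each function body runs in a subshell, so the early `exit 0` ends only that simulated process, much as the handler ended the CLI process before its normal return path ran.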

Fixes #(issue)

Checklist

  • Is this feature complete?
    • [x] Yes. Ready to ship.
    • [ ] No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • [x] Yes
    • [ ] No
  • Did you add public API?
    • [ ] Yes
      • If yes, did you have an API Review for it?
        • [ ] Yes
        • [ ] No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • [ ] Yes
        • [ ] No
    • [x] No
  • Does the change make any security assumptions or guarantees?
    • [ ] Yes
      • If yes, have you done a threat model and had a security review?
        • [ ] Yes
        • [ ] No
    • [x] No
  • Does the change require an update in our Aspire docs?
Original prompt

This section details on the original issue you should resolve

<issue_title>Failed Step 3 of "aspire deploy" returns exit code 0 even if failed.</issue_title>

<issue_description>Is there an existing issue for this?

  • [x] I have searched the existing issues

Describe the bug

I am in the process of migrating from azd up to aspire deploy in a CI/CD pipeline that deploys to Azure.

For some earlier errors, aspire deploy failed, so my pipeline (GitLab) failed as well.

Now my pipeline succeeded but the app was not updated, so I looked at the logs. aspire deploy failed during step 3, but the exit code was 0.

Expected Behavior

I would expect a non-zero exit code so that my pipeline fails.

Steps To Reproduce

I don't know how to reproduce this because I don't know the root cause yet, but this is my (stripped-down) build step:

deploy:
  stage: staging
  image: mcr.microsoft.com/dotnet/sdk:9.0-noble
  before_script:
    # install az
    - curl -sL https://aka.ms/InstallAzureCLIDeb | bash
    - az version
    # install aspire-cli
    - curl -sSL https://aspire.dev/install.sh | bash
    - ~/.aspire/bin/aspire --version
  script:
    - az login --service-principal --username "$AZURE_CLIENT_ID" --password "$AZURE_CLIENT_SECRET" --tenant "$AZURE_TENANT_ID"
    # deploy
    - ~/.aspire/bin/aspire deploy
    - echo "aspire deploy exitcode:$?"

And this is the relevant output:

...
$ ~/.aspire/bin/aspire --version
9.5.2+2fc27528ec03a94f2d6c663c9fa2392a9568ee41
$ az login --service-principal --username "$AZURE_CLIENT_ID" --password "$AZURE_CLIENT_SECRET" --tenant "$AZURE_TENANT_ID"
...
$ ~/.aspire/bin/aspire deploy
🔬 Checking project type...: AppHost.csproj
🛠  Building apphost... AppHost.csproj
🛠 Generating artifacts...
Step 1: Analyzing model.
✅ COMPLETED: Analyzing model. completed successfully
════════════════════════════════════════════════════════════════════════════════
Step 2: Validating Azure CLI authentication
✅ COMPLETED: Azure CLI authentication validated successfully
════════════════════════════════════════════════════════════════════════════════
Step 3: Deploying Azure resources
Deploying cae: 0%
Deploying postgres: 0%
Deploying postgres-kv: 0%
Deploying messaging: 0%
Deploying storage: 0%
Deploying azureopenai: 0%
Deploying app-identity: 0%
Deploying app-roles-postgres-kv: 0%
Deploying app-roles-messaging: 0%
Deploying app-roles-storage: 0%
Deploying functions-identity: 0%
Deploying functions-roles-storage: 0%
Deploying functions-roles-messaging: 0%
Deploying functions-roles-azureopenai: 0%
Deploying functions-roles-postgres-kv: 0%
Deploying cae: 0%
✗ FAILED: Deploying postgres: 0%
✗ FAILED: Deploying postgres-kv: 0%
Deploying messaging: 0%
Deploying storage: 0%
Deploying azureopenai: 0%
Deploying app-identity: 0%
✗ FAILED: Deploying app-roles-postgres-kv: 0%
Deploying app-roles-messaging: 0%
✗ FAILED: Deploying app-roles-storage: 0%
✗ FAILED: Deploying functions-identity: 0%
Deploying functions-roles-storage: 0%
✗ FAILED: Deploying functions-roles-messaging: 0%
Deploying functions-roles-azureopenai: 0%
Deploying functions-roles-postgres-kv: 0%
❌ FAILED: Failed to deploy Azure resources
════════════════════════════════════════════════════════════════════════════════
✗ DEPLOYMENT FAILED: Deployment completed with errors
$ echo "aspire deploy exitcode:$?"
aspire deploy exitcode:0

As you can see, the deployment of some resources failed:

Step 3: Deploying Azure resources
...
✗ FAILED: Deploying postgres: 0%
...
❌ FAILED: Failed to deploy Azure resources

The process stops, but the exit code is still 0.
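
For pipelines pinned to a CLI version affected by this bug, one possible stopgap is to check the log output for the failure marker in addition to the exit code. The sketch below assumes the `DEPLOYMENT FAILED` text from the output above is stable across versions; `run_and_check` is a hypothetical helper name, and the command's output is buffered until it finishes rather than streamed.

```shell
#!/bin/sh
# run_and_check CMD [ARGS...]
# Runs the command, prints its output, and fails when either the exit code
# is non-zero or the output contains the failure marker.
# Hypothetical helper; the marker text is assumed from the log shown above.
run_and_check() (
  log=$(mktemp)
  rc=0
  "$@" >"$log" 2>&1 || rc=$?
  cat "$log"
  # Exit code 0 but a failure marker in the log: treat it as a failure.
  if [ "$rc" -eq 0 ] && grep -q "DEPLOYMENT FAILED" "$log"; then
    rc=1
  fi
  rm -f "$log"
  exit "$rc"
)
```

In the build step above, the deploy line would become `run_and_check ~/.aspire/bin/aspire deploy`. A non-zero exit code from the command itself is passed through unchanged.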

Exceptions (if any)

No response

.NET Version info

No response

Anything else?

No response</issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes dotnet/aspire#12854

Copilot avatar Nov 12 '25 07:11 Copilot

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 12923

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 12923"

github-actions[bot] avatar Nov 13 '25 05:11 github-actions[bot]

@captainsafia we should backport this one. Makes it unusable in CI pipelines ATM. We should see if we have any more issues like this.

davidfowl avatar Nov 13 '25 07:11 davidfowl

@copilot We should be using the apphost's exit code to determine the status code of the operation for any command that calls into the apphost, including publish, do, deploy, run. The backchannel crashing should not contribute to the exit code of the cli.

davidfowl avatar Nov 17 '25 18:11 davidfowl

@copilot We should be using the apphost's exit code to determine the status code of the operation for any command that calls into the apphost, including publish, do, deploy, run. The backchannel crash...

Updated in commit e6792a9. The CLI now uses the apphost's exit code directly when it's non-zero, and only uses the backchannel failure information when the apphost exits with 0 but reported failures. This ensures consistent behavior across run, publish, deploy, and do commands.
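
The rule described here can be sketched as a small shell function. This is a hedged illustration of the stated rule, not the code from the commit: `resolve_exit_code` and its arguments are hypothetical, and using 6 for the reported-failure case is an assumption based on the exit code named in the PR description.

```shell
#!/bin/sh
# resolve_exit_code APPHOST_EXIT REPORTED_FAILURE
# Prints the exit code the CLI would use under the rule described above.
# Hypothetical sketch; the value 6 is assumed from the PR description.
resolve_exit_code() {
  apphost_exit=$1
  reported_failure=$2   # "true" if the backchannel reported a failure

  if [ "$apphost_exit" -ne 0 ]; then
    # A non-zero apphost exit code is used directly.
    echo "$apphost_exit"
  elif [ "$reported_failure" = "true" ]; then
    # Apphost exited with 0 but a failure was reported over the backchannel.
    echo 6
  else
    echo 0
  fi
}

resolve_exit_code 222 true
resolve_exit_code 0 true
resolve_exit_code 0 false
```

Under this rule the three calls print 222, 6, and 0 in that order.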

Copilot avatar Nov 17 '25 18:11 Copilot

So to try and validate this I added the following pipeline step:

builder.Pipeline.AddStep("crashy", stepBuilder =>
{
    var pendingCrash = Task.Run(async () =>
    {
        // Just allowing some time to attach debuggers etc.
        await Task.Delay(20000);
        var rand = new Random();
        var x = rand.Next(10);
        // rand.Next(10) returns 0..9, so this condition is always true.
        if (x < 12)
        {
            Environment.Exit(222);
        }
    });
    return pendingCrash;
});

This crashes every time after the delay. The CLI returns exit code 6; however, it does not return exit code 222 when I use -d.

This PR is already better than the current behavior, but I thought we wanted to flow the error code through?

mitchdenny avatar Nov 18 '25 09:11 mitchdenny

/backport to release/13.0

davidfowl avatar Nov 18 '25 15:11 davidfowl

Started backporting to release/13.0: https://github.com/dotnet/aspire/actions/runs/19470766112

github-actions[bot] avatar Nov 18 '25 15:11 github-actions[bot]

@davidfowl backporting to "release/13.0" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Patch format detection failed.
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

github-actions[bot] avatar Nov 18 '25 15:11 github-actions[bot]