Transient `azd deploy` error caused failed deployment
Describe the bug
We run a nightly deployment with azd to validate our Bicep templates. As part of that validation we also deploy the code, and we plan to add some integration tests (not yet added).
The issue is that the azd deploy step failed with what appears to be a transient error from Azure.
Error: deploying service: deploying service api package: deploying service api: failed running az deployment source config-zip: exit code: 1, stdout: , stderr: WARNING: Getting scm site credentials for zip deployment
WARNING: Starting zip deployment. This operation can take a while to complete ...
WARNING: Deployment endpoint responded with status code 502
ERROR: An error occured during deployment. Status Code: 502, Details: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
...
To Reproduce
Not able to reproduce the issue.
Expected behavior
It's unclear how to handle this error in my pipeline. The error appears to come from a call that azd is making to Azure. It would be great if azd could detect this 502 status and perform up to 3 retry attempts before failing the step.
Environment
Information on your environment:
* Runs from GH action workflow
* Uses image: mcr.microsoft.com/azure-dev-cli-apps:latest
* Runs command azd deploy --no-prompt
Additional context
There may be other transient errors that would be appropriate to retry. While it is probably a larger scope of work it would also be nice if failed deployments could be retried as some deployments.
@karolz-ms this surfaced again and the response code was HTTP 202, so I looked at the message more closely. It seems the error log was captured in Kudu.
{
"Message": "An error has occurred.",
"ExceptionMessage": "No log found for 'latest'.",
"ExceptionType": "System.IO.FileNotFoundException",
"StackTrace": " at Kudu.Core.Deployment.DeploymentManager.GetLogEntries(String id) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Deployment\\DeploymentManager.cs:line 98\r\n at Kudu.Services.Deployment.DeploymentController.GetLogEntry(String id) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Services\\Deployment\\DeploymentController.cs:line 376"
}
@KSchlobohm 202 is a success code... are you saying azd treated it as an error?
Sorry for the confusion. When I found the HTTP 202 code, I thought it was another instance of the same error. The 202 status does not seem to be related to the Kudu error above.
This is an instance of a different error with the same azd deploy operation.
Hmm. I have searched Kudu issues but haven't found anything that might be relevant.
@suwatch any ideas why Ken could intermittently get a ZipDeploy failure with "no log found for 'latest'" error?
Issues tracked here:
- Bubbling up underlying deployment errors. Should have been fixed as part of #786
- Auto-retries in azd for transient errors before giving up.
@KSchlobohm's update offline for issue 1 above - From the logs, it looks like the azd deploy is failing because the Azure App Service we are deploying to is not ready or may not be healthy and is recovering. The recommended step is to re-run the azd deploy command.
Looking at the az code more, it seems the error indicates a problem with the POST request that starts the zipdeploy. I don't think az had any existing retries, and we should consider adding them in the new azd changes.
If you use Azure Core (azcore) from the Azure SDK for Go, you can take advantage of its pipelines, which have all kinds of patterns built in for retries, exponential backoff, etc.
Also, if we can get an openapi spec for the service endpoint we're calling, we could generate a library that has these retries built in.
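For illustration, the retry-with-exponential-backoff pattern discussed above could look roughly like the following. This is a minimal standard-library sketch, not azd's actual implementation and not the azcore pipeline: the names `retryTransient` and `isTransient` are hypothetical, and treating 502, 503 and 504 as the transient statuses is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// isTransient reports whether an HTTP status looks worth retrying.
// Assumption for this sketch: 502, 503 and 504 are treated as transient.
func isTransient(status int) bool {
	return status == 502 || status == 503 || status == 504
}

// retryTransient (hypothetical helper) calls op up to maxAttempts times.
// op returns the HTTP status of the request it made, or an error.
// The wait doubles after every transient status (exponential backoff).
// Errors returned by op and non-transient statuses are passed straight
// back to the caller without a retry.
func retryTransient(maxAttempts int, baseDelay time.Duration, op func() (int, error)) (int, error) {
	delay := baseDelay
	status := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		var err error
		status, err = op()
		if err != nil || !isTransient(status) {
			return status, err
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return status, fmt.Errorf("giving up after %d attempts, last status %d", maxAttempts, status)
}
```

A caller would wrap the zipdeploy POST in `op` and invoke something like `retryTransient(3, 2*time.Second, op)`, which matches the "up to 3 retry attempts" requested in the original report. Whether request errors (as opposed to 5xx statuses) should also be retried is a design choice this sketch leaves out.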
Since this issue was filed, azd has switched over to not depend on az, and in #1051 we increased retries for zipdeploy submission, which should recover from temporary hiccups in App Service. Closing this.