azure-functions-host
"Azure functions runtime is unreachable" error exactly one year after app deployment.
The sequence of events
- Exactly a year ago (on 2/23/2022), I'd deployed my APIs to my Azure function app.
- Yesterday (i.e. a year later, on 2/23/2023), I noticed that the APIs started returning 503 / unavailable errors. Upon logging into the Azure portal, I noticed the "Azure functions runtime is unreachable" error. Restarting the app didn't help.
- The "Diagnose and solve problems" tab on the Azure portal led to the following discovery: 11 instances of `Microsoft.AspNetCore.Connections.ConnectionAbortedException`. I'm not entirely sure how much this is related to the app's downtime.
- I opened a support ticket (2302230030002367) for this issue. While the root-cause investigation was inconclusive, I was able to resolve the issue by simply redeploying the app.
Probable root cause
- I later stumbled upon the probable root cause (thanks to Erik_ERBBQ). The value of the `WEBSITE_RUN_FROM_PACKAGE` app setting is a SAS URL whose token expires exactly one year after deployment! (See the sketch after this list for how to check the expiry.)
- The SAS token is generated by the Azure Pipelines deployment task `AzureFunctionApp@1`, which I use to deploy my app.
- And here is the line of code to blame (LINK). So I guess this bug really belongs in that GitHub repo(?)
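For reference, here is a minimal sketch of how to check the expiry yourself. It assumes the setting value is a blob SAS URL, which carries its expiry in the standard `se` (signed expiry) query parameter; run it inside the app (app settings are surfaced as environment variables) or paste your own setting value in place of the environment lookup.

```csharp
// Minimal sketch: read the SAS expiry from WEBSITE_RUN_FROM_PACKAGE.
using System;
using System.Globalization;
using System.Web; // System.Web.HttpUtility is available on .NET 6

var value = Environment.GetEnvironmentVariable("WEBSITE_RUN_FROM_PACKAGE");

if (Uri.TryCreate(value, UriKind.Absolute, out var uri))
{
    // "se" = signed expiry, e.g. 2023-02-23T11:17:29Z
    var expiryText = HttpUtility.ParseQueryString(uri.Query)["se"];

    if (DateTimeOffset.TryParse(expiryText, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal, out var expiresAt))
    {
        var remaining = expiresAt - DateTimeOffset.UtcNow;
        Console.WriteLine($"Package SAS expires {expiresAt:u} ({remaining.TotalDays:F0} days left).");
    }
}
else
{
    // "1" (or empty) means the package isn't fetched via a SAS URL at all.
    Console.WriteLine("WEBSITE_RUN_FROM_PACKAGE is not a URL; nothing to expire.");
}
```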
Investigative information
- Subscription ID: 014a6441-97ae-45e2-8b37-ff577abb1086
- Function App version: 4.x
- Function App name: cloudskewfunctionsprod
- Function name(s) (as appropriate): multiple
- Invocation ID: N/A
- Region: West Europe
- Timestamp: Based on the metric chart below, the outage started at 11:17 AM UTC on 2/23/2023. This is corroborated by the data from the "Diagnose and solve problems" portal tab. The outage finally ended when I redeployed the app around 6:15 PM.
Related information
- Programming language used: C# with .NET 6
- Links to source: Unfortunately this is a private GitHub repo.
- Bindings used: The APIs are all HTTP-triggered. No input or output bindings are used (I mostly use the SDK for all IO). A timer-triggered function also exists in the same app.
- App settings used: See screenshot below.
Other investigation notes
- I did look through the MSDN documentation for the "Azure functions runtime is unreachable" error (LINK), but nothing conclusive stood out.
- I also rotated the storage account key/connection string used in the `AzureWebJobsStorage` app setting. It didn't help.
- The following GitHub issues might be related (but I'm not 100% sure):
  - https://github.com/Azure/azure-functions-host/issues?q=is%3Aissue+unreachable
  - https://github.com/Azure/azure-functions/issues?q=is%3Aissue+unreachable
A few quick thoughts/notes/observations: I think an entire class of errors could be pre-empted if the "Diagnose and solve problems" wizard and the Configuration blade started flagging invalid settings (including expired SAS URLs), similar to how invalid Key Vault references are flagged.
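In the meantime, something like the following self-check could catch this before the outage: a timer-triggered function in the same app that warns when the SAS in `WEBSITE_RUN_FROM_PACKAGE` is close to expiring. This is just a rough sketch, assuming the in-process .NET 6 model; the schedule and the 30-day threshold are arbitrary choices.

```csharp
using System;
using System.Web;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class PackageSasExpiryCheck
{
    // Runs daily at 08:00 UTC and logs a warning when the package SAS is about to expire.
    [FunctionName("PackageSasExpiryCheck")]
    public static void Run([TimerTrigger("0 0 8 * * *")] TimerInfo timer, ILogger log)
    {
        // App settings are exposed to the function as environment variables.
        var value = Environment.GetEnvironmentVariable("WEBSITE_RUN_FROM_PACKAGE");

        if (!Uri.TryCreate(value, UriKind.Absolute, out var uri))
        {
            return; // "1" or unset: the package isn't fetched via a SAS URL.
        }

        var expiryText = HttpUtility.ParseQueryString(uri.Query)["se"];
        if (!DateTimeOffset.TryParse(expiryText, out var expiresAt))
        {
            log.LogWarning("Could not parse a SAS expiry from WEBSITE_RUN_FROM_PACKAGE.");
            return;
        }

        var remaining = expiresAt - DateTimeOffset.UtcNow;
        if (remaining < TimeSpan.FromDays(30))
        {
            // Wire this up to whatever alerting you already have (App Insights alerts, email, ...).
            log.LogWarning("WEBSITE_RUN_FROM_PACKAGE SAS expires in {Days:F0} days ({ExpiresAt}).",
                remaining.TotalDays, expiresAt);
        }
    }
}
```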
Also, I found this documentation page very useful. But, IIRC, I got to that page by doing a Google/Bing search for the "Azure functions runtime is unreachable" error. I wish the portal could somehow have linked me to that page.
We have also experienced this same issue.
Same here. I also experienced this in the past and opened a ticket too. Relevant prior issue: https://github.com/microsoft/azure-pipelines-tasks/issues/14837
We run a fleet of Azure Functions, and this is known as the dreaded "birthday bug". It likes to pop up on holiday weekends. 😉
It's surprising this isn't addressed anywhere inside the Azure tooling.
We've resigned ourselves to creating a function app that scans our Azure instance for expiring SAS tokens.
Thank you @mithunshanbhag for the super detailed issue!
I've created #9358 to track an improvement to ensure a warning is emitted.
For your scenario, I'd also recommend configuring the app to use a managed identity, as described here, as that would avoid the issue with the expiration (@TroyWitthoeft, this should help address the scanning need you've mentioned above). A rough sketch of that route follows below.
Also following up with tooling and deployment teams to identify additional enhancements we can make here.
Thanks!
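For anyone following along, here is a rough sketch of the managed-identity route as I read the docs (not an official snippet): point `WEBSITE_RUN_FROM_PACKAGE` at the plain blob URL without a SAS, grant the app's identity Storage Blob Data Reader on the container, and then verify from inside the app that the identity can actually read the package blob. The URL below is a placeholder.

```csharp
// Rough sketch (placeholder URL): verify the app's managed identity can read the
// deployment package blob once WEBSITE_RUN_FROM_PACKAGE points at a plain blob URL.
// Requires the Azure.Identity and Azure.Storage.Blobs NuGet packages.
using System;
using Azure.Identity;
using Azure.Storage.Blobs;

var packageUrl = new Uri(
    "https://<storage-account>.blob.core.windows.net/<container>/package.zip");

// DefaultAzureCredential resolves to the app's managed identity when running in Azure.
var blob = new BlobClient(packageUrl, new DefaultAzureCredential());

var properties = blob.GetProperties().Value;
Console.WriteLine($"Package blob is readable; {properties.ContentLength} bytes, " +
                  $"last modified {properties.LastModified:u}.");
```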
Thanks man, this saved my day.
Works like a charm!
Finally found this thread; it solved my issues, thanks @mithunshanbhag!
Awesome! It's the little things... I know a warning will help reduce the troubleshooting time. Thank you.
@fabiocav - With regard to your suggestion to use a managed identity, the link you posted as an example comes right back to this issue. Mislink? I get the impression that using the function app's MSI instead of a SAS token is a possible mitigation?
Hey! I just got a warning on one of our function apps! 🎉 Nice!
It's a bit early, but the functionality is there!
Just stumbled upon this issue today with no errors whatsoever. It was happening on an Azure Function app that celebrated its first anniversary since the last deployment on the 25th of January 2023 (no one noticed it until now because it's only used at the end of the month). Symptoms: both QA and PROD environments were returning a 503 on all HTTP endpoints.
No "Azure Functions runtime unreachable" message. No SAS-token-expired message, nor a set-to-expire warning like @TroyWitthoeft got (even after redeployment). The Diagnostics tab was unhelpful too; it just told me "Hey, we noticed your function was down for the last 15 minutes", yeah, no shit.
Resolution: Just redeploy the function app and it will start working again.
Any idea why `WEBSITE_RUN_FROM_PACKAGE` is a SAS token URL for Azure Functions, but for App Services it's just "1"? There HAS to be a better way of handling this. Who would expect their app to stop working after a year with no warning?