azure-functions-host

[Feature] Improved Status Endpoint

Open ColbyTresness opened this issue 6 years ago • 10 comments

What problem would the feature you're requesting solve? Please describe.

A single place to go to understand the health of my function app (and all functions within it). Potential scenarios include: alerting customers when something about their system has become unhealthy, building automation to roll back a deployment if an app becomes unhealthy, and displaying the overall status of a function app in the Azure portal.

Describe the solution you'd like

Probably an ARM API? But that's up to dev :)

Describe alternatives you've considered

The host status endpoint isn't sufficient here - it gives basic information about the current status of the host, but we need more. Also, if the host is unavailable, sometimes this API is too, which gives us very little information.

Additional context

The portal will benefit from this massively, but there are other reasons to do it as well!

ColbyTresness avatar Sep 16 '19 22:09 ColbyTresness

@kulkarnisonia16 as an FYI, since I know this endpoint was previously a big source of supportability issues.

@ColbyTresness it would be good to understand whether there is anything specific you think should be surfaced here that isn't today. You say it "needs more", but it would be good to know what additional information we have, or could potentially have, that we could surface.

jeffhollan avatar Sep 16 '19 23:09 jeffhollan

@btardif has some opinions on this one. One example is when a user brings a custom container without the functions host in it - the only thing we surface is "host not found".

I'd say the biggest class of issues I want us to get better at is where the host itself can't be reached.

ColbyTresness avatar Sep 17 '19 01:09 ColbyTresness

Most of the capabilities you're asking for are already exposed with the different status APIs (app level and function level). Let's discuss this to better understand what (if any) work needs to be done.

fabiocav avatar Oct 02 '19 20:10 fabiocav

I think the main piece I want is better knowledge of situations where the host can't be reached. I'm imagining displaying some sort of "last known good" state. I don't believe any existing APIs handle this. Also, having function information in the function app API would be helpful - which functions are disabled, in error, or healthy, based on their latest invocation. Also, aggregation across instances - if one instance of the host is healthy but others aren't, it would be great to be able to display that information to users. Those are the main things I'm looking to light up in the portal, I guess.
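To make the ask above concrete, here is a minimal sketch of the aggregation and "last known good" logic. Everything in it is an assumption for illustration: `InstanceReport`, `AppHealth`, `summarize`, and the state strings are hypothetical names, not an existing host or ARM API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InstanceReport:
    """Hypothetical per-instance health report (not a real host API shape)."""
    instance_id: str
    reachable: bool
    host_state: Optional[str] = None  # e.g. "Running"; None if unreachable


@dataclass
class AppHealth:
    overall: str                      # "healthy", "degraded", or "unreachable"
    by_instance: Dict[str, str]       # instance id -> state shown to the user
    last_known_good: Optional[Dict[str, str]]


def summarize(reports: List[InstanceReport],
              previous_good: Optional[Dict[str, str]] = None) -> AppHealth:
    """Roll per-instance reports up into one app-level status."""
    by_instance = {
        r.instance_id: (r.host_state or "Unknown") if r.reachable else "Unreachable"
        for r in reports
    }
    healthy = [i for i, state in by_instance.items() if state == "Running"]
    if reports and len(healthy) == len(reports):
        overall = "healthy"
    elif any(r.reachable for r in reports):
        overall = "degraded"          # some instances respond, others do not
    else:
        overall = "unreachable"       # no instance could be contacted
    # Only a fully healthy snapshot replaces the stored "last known good" one.
    last_good = dict(by_instance) if overall == "healthy" else previous_good
    return AppHealth(overall, by_instance, last_good)
```

The caller would need to persist `last_known_good` somewhere outside the host, so that it is still available when no instance can be reached.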

ColbyTresness avatar Oct 02 '19 21:10 ColbyTresness

If/when we do this, it would be great to include data on the number of successful and failed function executions.

ColbyTresness avatar Dec 12 '19 23:12 ColbyTresness

Just got into a state where I was using a Service Bus connection string that wasn't fully valid. No functions would trigger, so I didn't see any logs. At first I didn't see any logs anywhere in the portal showing me this error. After I restarted the app I saw some "runtime unable to start" messages (which I believe were related), but the error wasn't very useful. I wonder if any of the stuff @fabiocav mentioned would enable a scenario where the UX could have pointed more precisely to "Unable to connect to Service Bus 'foo'. Connection string is invalid" or something similar.
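As a rough illustration of the kind of check that could produce such a message, here is a minimal sketch that only inspects the shape of a connection string. The function names are hypothetical, this is not how the host validates settings, and it does not test whether the credentials actually work.

```python
from typing import Dict, List


def parse_connection_string(value: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    parts: Dict[str, str] = {}
    for segment in value.split(";"):
        if not segment.strip():
            continue
        # Split on the first '=' only, since key values may end with '='.
        key, sep, val = segment.partition("=")
        if sep:
            parts[key.strip().lower()] = val.strip()
    return parts


def diagnose_service_bus(setting_name: str, value: str) -> List[str]:
    """Return user-facing problems; an empty list means the shape looks fine."""
    parts = parse_connection_string(value)
    problems = []
    if not parts.get("endpoint", "").startswith("sb://"):
        problems.append("missing or malformed Endpoint (expected sb://...)")
    has_key = bool(parts.get("sharedaccesskeyname")) and bool(parts.get("sharedaccesskey"))
    has_sas = bool(parts.get("sharedaccesssignature"))
    if not (has_key or has_sas):
        problems.append("no SharedAccessKeyName/SharedAccessKey pair or SharedAccessSignature")
    return [f"Unable to connect to Service Bus '{setting_name}': {p}" for p in problems]
```

For example, `diagnose_service_bus("foo", "SharedAccessKeyName=RootManageSharedAccessKey")` would report both a missing endpoint and missing credentials.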

jeffhollan avatar Jan 28 '20 23:01 jeffhollan

Related to #4705

jeffhollan avatar Feb 05 '20 21:02 jeffhollan

Is this related to #6255? @apawast?

btardif avatar Aug 06 '20 19:08 btardif

Appears that way

jeffhollan avatar Aug 06 '20 20:08 jeffhollan

Improved observability for functions will be covered by our OTel work (#9273). We will emit FaaS-compliant telemetry, and you will be able to choose your own dashboards (with whatever telemetry sink you decide to use) rather than being locked into just the one we provide.

jviau avatar Mar 21 '24 16:03 jviau