
Ensure extension RPC endpoints ready before processing gRPC messages

Open jviau opened this issue 7 months ago • 2 comments

Issue describing the changes in this PR

resolves #10251

Pull request checklist

IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.

  • [ ] Backporting to the in-proc branch is not required
    • Otherwise: Link to backporting PR -- TODO
  • [x] My changes do not require documentation changes
    • [ ] Otherwise: Documentation issue linked to PR
  • [ ] My changes should not be added to the release notes for the next release
    • [x] Otherwise: I've added my notes to release_notes.md
  • [x] My changes do not need to be backported to a previous version
    • [ ] Otherwise: Backport tracked by issue/PR #issue_or_pr
  • [x] My changes do not require diagnostic events changes
    • Otherwise: I have added/updated all related diagnostic events and their documentation (Documentation issue linked to PR)
  • [x] I have added all required tests (Unit tests, E2E tests)

Additional information

We believe there is a race condition on startup between the first gRPC message arriving and the extension RPC endpoints being registered. EndpointDataSource has a change token system, which led us to believe endpoints can be updated after startup, but one of two things appears to be happening:

  1. Only a subset of messages fail due to a time gap between host/worker startup and the extension RPC endpoints being registered.
  2. Or AspNetCore locks in the available endpoints when the UseRouting middleware is first encountered, in which case this host instance would not have extension RPC endpoints for its lifetime.

This fix should address both scenarios, because the routing middleware is only initialized on its first call. By ensuring we collect extension endpoints before the first routing occurs, we avoid the race condition.
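The ordering this fix relies on can be modeled with a small gate. This is only a conceptual sketch in Python, not the host's C# code: `EndpointGate`, `register`, `handle`, and the `/ext/rpc` path are hypothetical names made up for illustration. It shows a first message that arrives early being held until the endpoints have been collected.

```python
import asyncio


class EndpointGate:
    """Hypothetical gate: message handling waits until endpoints are registered."""

    def __init__(self):
        self._ready = asyncio.Event()
        self.endpoints = []

    def register(self, endpoints):
        # Collect the extension endpoints, then open the gate.
        self.endpoints.extend(endpoints)
        self._ready.set()

    async def handle(self, message):
        # Stand-in for "first routing": it cannot run before registration.
        await self._ready.wait()
        return f"{message} routed with {len(self.endpoints)} endpoint(s)"


async def demo():
    gate = EndpointGate()
    # The first message arrives before any endpoint is registered.
    first = asyncio.create_task(gate.handle("first"))
    await asyncio.sleep(0)  # let the handler start; it blocks on the gate
    blocked = not first.done()
    gate.register(["/ext/rpc"])  # hypothetical endpoint path
    result = await first
    return blocked, result
```

Running `asyncio.run(demo())` should show the early message blocked until `register` is called, after which it is routed with the full endpoint set.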

RISK: there is a small risk this could introduce a deadlock. If some dependency exists, or is later introduced, that requires RPC communication between host and worker before all extensions are loaded host-side, this could cause a circular dependency and deadlock. I do not believe this is the case today, as testing has shown endpoints to be available immediately on startup, before the first worker RPC message comes in.
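The circular dependency described above can be sketched the same way. Again, this is a conceptual Python model with hypothetical names (`load_extensions`, `handle_worker_message`), not code from the host. It assumes an extension load that needs a worker RPC reply while worker messages are gated on that load finishing, and uses a timeout only so the demonstration terminates.

```python
import asyncio


async def demo_deadlock(timeout=0.05):
    endpoints_ready = asyncio.Event()

    async def handle_worker_message():
        # Worker messages are gated until all extensions are loaded.
        await endpoints_ready.wait()
        return "reply"

    async def load_extensions():
        # Hypothetical dependency: loading needs a worker RPC reply first.
        await handle_worker_message()
        endpoints_ready.set()  # never reached, so the gate never opens

    try:
        await asyncio.wait_for(load_extensions(), timeout)
        return "loaded"
    except asyncio.TimeoutError:
        return "deadlock"
```

Without the timeout, `load_extensions` would wait forever, which is the failure mode to watch for if such a dependency is ever introduced.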

jviau, Jun 27 '24 18:06