serving icon indicating copy to clipboard operation
serving copied to clipboard

Ensure websocket conections persist until done on queue-proxy drain

Open elijah-rou opened this issue 10 months ago • 10 comments

Fixes: Websockets closing abruptly when queue-proxy undergoes drain.

Due to hijacked connections in net/http not being respected when server.Shutdown is called, any active websocket connections currently end as soon as the queue-proxy calls .Shutdown. See https://github.com/gorilla/websocket/issues/448 and https://github.com/golang/go/issues/17721 for details. This patch fixes this issue by introducing an atomic counter of active requests, which increments as a request comes in and decrements as a request handler terminates. During drain, this counter must reach zero or adhere to the revision timeout, in order to call .Shutdown.

Release Note

Introduce a pending requests atom which the queue-proxy can use to gracefully terminate all connections (including highjacked connections)

elijah-rou avatar Feb 07 '25 20:02 elijah-rou

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: elijah-rou Once this PR has been reviewed and has the lgtm label, please assign dsimansk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

knative-prow[bot] avatar Feb 07 '25 20:02 knative-prow[bot]

Hi @elijah-rou. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

knative-prow[bot] avatar Feb 07 '25 20:02 knative-prow[bot]

/ok-to-test

dprotaso avatar Feb 09 '25 22:02 dprotaso

Codecov Report

:x: Patch coverage is 0% with 23 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 74.95%. Comparing base (6265a8e) to head (2b581d6). :warning: Report is 225 commits behind head on main.

Files with missing lines Patch % Lines
pkg/queue/sharedmain/main.go 0.00% 15 Missing :warning:
pkg/queue/sharedmain/handlers.go 0.00% 8 Missing :warning:

:x: Your project status has failed because the head coverage (74.95%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15759      +/-   ##
==========================================
- Coverage   80.84%   74.95%   -5.90%     
==========================================
  Files         222      222              
  Lines       18070    18095      +25     
==========================================
- Hits        14609    13563    -1046     
- Misses       3089     4181    +1092     
+ Partials      372      351      -21     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Feb 09 '25 22:02 codecov[bot]

@dprotaso is the only thing missing from this PR an E2E test?

elijah-rou avatar Feb 19 '25 19:02 elijah-rou

/retest

dprotaso avatar Mar 29 '25 22:03 dprotaso

/retest

dprotaso avatar Apr 14 '25 01:04 dprotaso

closing, re-opening to pick up latest actions

dprotaso avatar Apr 14 '25 01:04 dprotaso

ah there's a legit compile error in the e2e test

dprotaso avatar Apr 14 '25 01:04 dprotaso

@elijah-rou: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
istio-latest-no-mesh_serving_main 2b581d6ad4ca175fa346dc66ccc64278f6a408ea link true /test istio-latest-no-mesh

Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

knative-prow[bot] avatar Apr 14 '25 02:04 knative-prow[bot]

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

knative-prow-robot avatar Jul 13 '25 15:07 knative-prow-robot

I've finally been afforded some time so I will pull the changes out of our fork and try complete this soon.

elijah-rou avatar Aug 28 '25 19:08 elijah-rou

@dprotaso I am closing this in lieu of https://github.com/knative/serving/pull/16080, which is based on the new Cerebrium fork

elijah-rou avatar Sep 12 '25 15:09 elijah-rou