Use polling during startup to wait for informers to sync

Open tsaarni opened this issue 1 year ago • 5 comments

This change polls HasSynced() to align with how client-go's WaitForCacheSync() operates. Previously, HasSynced() was called only when we received objects from the informer. Polling prevents a race condition in which Contour fails to recognize that the final object in the initial list has been received, which would prevent the xDS server from ever starting.
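As a rough illustration of the polling approach described above, here is a minimal, self-contained sketch. It is not the code from this PR: `waitForSync` and `demoPolling` are hypothetical names, the intervals are arbitrary, and the atomic flag merely stands in for an informer's HasSynced().

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// waitForSync (hypothetical helper) calls hasSynced at a fixed interval
// until it returns true or ctx is done. It returns whether sync was observed.
func waitForSync(ctx context.Context, interval time.Duration, hasSynced func() bool) bool {
	if hasSynced() {
		return true
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return false
		case <-ticker.C:
			if hasSynced() {
				return true
			}
		}
	}
}

// demoPolling simulates a source that becomes synced shortly after startup
// without delivering any further events, and reports whether polling saw it.
func demoPolling() bool {
	var synced atomic.Bool // stand-in for an informer's HasSynced() state

	go func() {
		time.Sleep(50 * time.Millisecond)
		synced.Store(true)
	}()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitForSync(ctx, 10*time.Millisecond, synced.Load)
}

func main() {
	fmt.Println("synced:", demoPolling())
}
```

Because the check runs on a timer rather than on object delivery, the transition to "synced" is noticed even when no more objects arrive after the initial list.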

Fixes #6613

Details

The startup of a follower instance depends on informers signaling that they have finished synchronizing resources with the API server. We use the SingleFileTracker from client-go to track the processing of initial objects. The tracker's counter is incremented when processing of a resource starts and decremented when it finishes.

Ideally, the HasSynced() method should report true right after Finished() is called for the last resource in the initial list (that is, when the tracker reaches zero). However, in some cases it does not report true immediately, causing the boolean trigger that starts the xDS server to never be set.
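The sketch below shows one way such a missed trigger could arise. It is an assumption made for illustration, not a confirmed description of the bug: `toyTracker` is a simplified, hypothetical stand-in for the real tracker, and the ordering in `simulate` (the upstream source reporting synced only after the last Finished() call) is assumed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyTracker is a simplified, hypothetical stand-in for a sync tracker.
// It is NOT the client-go implementation.
type toyTracker struct {
	pending        atomic.Int64 // objects whose processing has started but not finished
	upstreamSynced atomic.Bool  // whether the upstream source reports the initial list as delivered
}

func (t *toyTracker) Start()    { t.pending.Add(1) }
func (t *toyTracker) Finished() { t.pending.Add(-1) }

// HasSynced reports true only when upstream has synced and nothing is pending.
func (t *toyTracker) HasSynced() bool {
	return t.upstreamSynced.Load() && t.pending.Load() <= 0
}

// simulate processes three initial objects. The event-driven check runs only
// right after each Finished() call; the polled check runs once later.
func simulate() (eventDriven, polled bool) {
	t := &toyTracker{}
	for i := 0; i < 3; i++ {
		t.Start()
	}
	for i := 0; i < 3; i++ {
		t.Finished()
		if t.HasSynced() {
			eventDriven = true // would set the trigger that starts the xDS server
		}
	}

	// Assumed ordering: upstream reports synced only after the last object
	// has already been handled, so no further event re-runs the check.
	t.upstreamSynced.Store(true)

	// A later poll still observes the synced state.
	polled = t.HasSynced()
	return eventDriven, polled
}

func main() {
	e, p := simulate()
	fmt.Printf("event-driven check saw sync: %v, later poll saw sync: %v\n", e, p)
}
```

Under the assumed ordering, the event-driven check never sees HasSynced() return true, while a later poll does.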

For a leader instance, an extra update triggered by the leader election itself has so far ensured that the xDS server starts.

tsaarni avatar Aug 16 '24 16:08 tsaarni

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.04%. Comparing base (95a8ab2) to head (63fcd50). Report is 2 commits behind head on main.

@@            Coverage Diff             @@
##             main    #6614      +/-   ##
==========================================
+ Coverage   81.01%   81.04%   +0.02%     
==========================================
  Files         133      133              
  Lines       19997    20001       +4     
==========================================
+ Hits        16201    16210       +9     
+ Misses       3503     3498       -5     
  Partials      293      293              

| Files with missing lines | Coverage Δ |
|---|---|
| internal/contour/handler.go | 83.43% <100.00%> (+3.43%) :arrow_up: |

codecov[bot] avatar Aug 19 '24 12:08 codecov[bot]

Fix after test case failure. The coverage target is a bit difficult to reach for this one.

tsaarni avatar Aug 19 '24 13:08 tsaarni

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Ensure your PR is passing all CI checks. PRs that are fully green are more likely to be reviewed. If you are having trouble with CI checks, reach out to the #contour channel in the Kubernetes Slack workspace.
  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions[bot] avatar Sep 04 '24 00:09 github-actions[bot]

github-actions[bot] avatar Sep 30 '24 00:09 github-actions[bot]

github-actions[bot] avatar Oct 15 '24 00:10 github-actions[bot]