baremetal-runtimecfg

OCPBUGS-42805: Add node caching with Kubernetes watch API to reduce API load

Open: mkowalski opened this pull request 1 month ago • 6 comments

Replace frequent node LIST calls in monitor loops with a watch-based cache. Implements NodeWatcher following the existing loggingconfig watcher pattern.
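The watch-based cache can be sketched in Go. This is a minimal, self-contained illustration, not the code from pkg/nodeconfig: `Node`, `Event`, `NodeCache`, `Apply`, `Run` and `List` are hypothetical names, and a buffered channel stands in for a Kubernetes watch stream so the example runs without a cluster. The idea it shows is that monitor loops read nodes from a local map kept current by watch events, instead of issuing a LIST on every iteration.

```go
package main

import (
	"sort"
	"sync"
)

// Node is a hypothetical, trimmed-down stand-in for a Kubernetes Node object.
type Node struct {
	Name      string
	Addresses []string
}

// EventType mirrors the event kinds a Kubernetes watch stream delivers.
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Event is a hypothetical watch event carrying a single node.
type Event struct {
	Type EventType
	Node Node
}

// NodeCache holds the latest known state of every node, keyed by name.
type NodeCache struct {
	mu    sync.RWMutex
	nodes map[string]Node
}

// NewNodeCache returns an empty cache.
func NewNodeCache() *NodeCache {
	return &NodeCache{nodes: make(map[string]Node)}
}

// Apply updates the cache from one watch event.
func (c *NodeCache) Apply(ev Event) {
	c.mu.Lock()
	defer c.mu.Unlock()
	switch ev.Type {
	case Added, Modified:
		c.nodes[ev.Node.Name] = ev.Node
	case Deleted:
		delete(c.nodes, ev.Node.Name)
	}
}

// Run consumes events until the channel is closed. A real watcher would
// read from the watch connection here and re-list/re-watch on errors.
func (c *NodeCache) Run(events <-chan Event) {
	for ev := range events {
		c.Apply(ev)
	}
}

// List returns a name-sorted snapshot of the cached nodes with no API call.
func (c *NodeCache) List() []Node {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]Node, 0, len(c.nodes))
	for _, n := range c.nodes {
		out = append(out, n)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}
```

A production watcher also has to seed the cache with an initial LIST, start the watch from the returned resourceVersion, and re-list when the watch is closed or the resourceVersion expires. In client-go that list-then-watch loop is what informers and listers provide; the PR description does not say whether NodeWatcher builds on them or on a raw watch, so treat that as an open detail.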

  • Add pkg/nodeconfig with NodeWatcher and the NodeCacheGetter interface
  • Refactor node retrieval functions to use the cache when available
  • Update monitors (dynkeepalived, coredns) to use NodeWatcher
  • Reduce API calls from hundreds per minute to a single watch connection
  • Maintain backward compatibility: when the cache is nil, fall back to direct API calls
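A sketch of the nil-cache fallback, assuming a `NodeCacheGetter` with a single `GetNodes` method. The method name, the `getNodes` helper, the `listFunc` type and `staticCache` are all hypothetical; the PR text names the interface but not its methods. Callers that have a watcher pass it in; callers that pass nil keep the old behavior of calling the API directly.

```go
package main

// Node is a hypothetical, trimmed-down stand-in for a Kubernetes Node.
type Node struct {
	Name string
}

// NodeCacheGetter is an assumed shape for the interface named in the PR:
// anything that can return nodes from a local cache.
type NodeCacheGetter interface {
	GetNodes() ([]Node, error)
}

// listFunc stands in for a direct LIST call against the API server.
type listFunc func() ([]Node, error)

// getNodes prefers the cache and falls back to a direct LIST when no cache
// is wired in, so existing callers keep working unchanged.
func getNodes(cache NodeCacheGetter, list listFunc) ([]Node, error) {
	if cache != nil {
		return cache.GetNodes()
	}
	return list()
}

// staticCache is a hypothetical in-memory NodeCacheGetter for illustration.
type staticCache struct{ nodes []Node }

// GetNodes returns the fixed node list without touching the API.
func (s staticCache) GetNodes() ([]Node, error) { return s.nodes, nil }
```

One Go detail matters for the `cache != nil` check: converting a nil concrete pointer (for example a nil `*NodeWatcher`, if the watcher is a pointer type) to the interface produces a non-nil interface value, so the fallback would not trigger. Passing an untyped nil, or checking the concrete pointer before converting, avoids that.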

Fixes: OCPBUGS-42805

mkowalski commented Nov 25 '25 12:11

@mkowalski: This pull request references Jira Issue OCPBUGS-42805, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot commented Nov 25 '25 12:11

/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv4

mkowalski commented Nov 25 '25 17:11

/hold

I don't want this in 4.21; it will stay on hold until the branch cut and should land only in 4.22.

mkowalski commented Nov 25 '25 17:11

/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv4

Conformance failures

{  fail [github.com/openshift/origin/test/extended/apiserver/tls.go:151]: Expected success true, got false with TLS version VersionTLS12 dialing master}

mkowalski commented Nov 26 '25 10:11

@mkowalski: all tests passed!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] commented Nov 26 '25 13:11

/lgtm

Looks like a good way to address a longstanding issue!

cybertron commented Nov 26 '25 22:11

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, mkowalski

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [cybertron,mkowalski]

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] commented Nov 26 '25 22:11