ovn-kubernetes icon indicating copy to clipboard operation
ovn-kubernetes copied to clipboard

Add IPsec tunnel state metric for IKE Child SA establishment

Open pperiyasamy opened this issue 7 months ago • 1 comments
trafficstars

This PR registers metric ovnkube_controller_ipsec_tunnel_state containing local_ip and remote_ip as the labels for reporting ipsec status for every Geneve tunnel. When IKE Child SA is established for the tunnel, then metric for the tunnel is set with 1, otherwise it will be set with 0. The IPsec tunnel status is being checked periodically for every 5s and metric will be updated accordingly.

pperiyasamy avatar Apr 24 '25 12:04 pperiyasamy

Pull Request Test Coverage Report for Build 17601206192

Details

  • 153 of 226 (67.7%) changed or added relevant lines in 5 files are covered.
  • 45 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.005%) to 54.887%

Changes Missing Coverage Covered Lines Changed/Added Lines %
go-controller/pkg/metrics/node.go 0 5 0.0%
go-controller/pkg/util/ovs.go 2 8 25.0%
go-controller/pkg/metrics/ipsec.go 18 25 72.0%
go-controller/pkg/metrics/ovs.go 41 54 75.93%
go-controller/pkg/metrics/ovnkube_controller.go 92 134 68.66%
<!-- Total: 153 226
Files with Coverage Reduction New Missed Lines %
go-controller/pkg/ovn/namespace.go 2 56.31%
go-controller/pkg/ovn/egressip.go 4 67.84%
go-controller/pkg/ovn/controller/network_qos/network_qos.go 15 63.56%
go-controller/pkg/ovn/controller/network_qos/network_qos_namespace.go 24 49.62%
<!-- Total: 45
Totals Coverage Status
Change from base Build 17599322795: -0.005%
Covered Lines: 39026
Relevant Lines: 71103

💛 - Coveralls

coveralls avatar May 06 '25 10:05 coveralls

Walkthrough

Adds IPsec support and monitoring: new ipsec runner, Geneve tunnel discovery, netlink XFRM subscription, IPsec tunnel-state metric and registration, node integration and config flag, unit and e2e tests, metrics mocks concurrency improvements, Helm and CI adjustments, and dependency updates.

Changes

Cohort / File(s) Summary of change
IPsec parsing & monitor
go-controller/pkg/metrics/ipsec.go, go-controller/pkg/metrics/ovnkube_controller.go
New ipsec client type and parser for established IPsec Child SAs; MonitorIPsecTunnelsState added to subscribe to XFRM events and update ipsec_tunnel_state Prometheus metric.
OVS Geneve discovery
go-controller/pkg/metrics/ovs.go
Added TunnelInfo type and JSON-based listGeneveTunnels to enumerate Geneve interfaces (name, admin/oper state, options).
Node metrics integration
go-controller/pkg/metrics/node.go, go-controller/cmd/ovnkube/ovnkube.go
RegisterNodeMetrics signature changed to accept a WaitGroup and return error; wires MonitorIPsecTunnelsState when enable-ipsec is configured and propagates errors.
Metrics mocks & GaugeVec
go-controller/pkg/metrics/mocks/gauge.go
Made GaugeMock concurrency-safe (mutex, pointer receivers); added GaugeVecMock with per-label gauges and interface-stubbed methods.
IPsec & OVS tests
go-controller/pkg/metrics/ipsec_test.go, go-controller/pkg/metrics/ovs_test.go
Added thread-safe fake IPsec and OVS clients, Ginkgo/Gomega tests exercising ipsec_tunnel_state behavior and fake clients with synchronized output mutation.
Exec helpers (ipsec runner)
go-controller/pkg/util/ovs.go, go-controller/pkg/util/ovs_unit_test.go
Added ipsec binary discovery field and RunIPsec wrapper; tests updated to include LookPath calls and ipsec presence/absence scenarios.
Config & constants
go-controller/pkg/config/config.go, go-controller/pkg/types/const.go
Added EnableIPsec feature flag and CLI flag; added metric name constants MetricIPsecEnabled and MetricIPsecTunnelIKEChildSAState.
Helm changes (ipsec & env)
helm/ovn-kubernetes/charts/ovn-ipsec/templates/daemonset.yaml, helm/ovn-kubernetes/charts/ovnkube-db-raft/templates/statefulset.yaml, helm/ovn-kubernetes/charts/ovnkube-db/templates/deployment.yaml
Removed v3_req extension in CSR generation; added stale pluto.pid cleanup; renamed ENABLE_IPSEC → OVN_ENABLE_IPSEC in db components.
CI, scripts & kind
.github/workflows/test.yml, go-controller/hack/test-go.sh, contrib/kind.sh
Added IPsec matrix entry and ENABLE_IPSEC env, prom2json install steps, included metrics package in root_pkgs tests, switched CSR listing to kubectl.
E2E tests
test/e2e/ipsec.go, test/e2e/util.go
New e2e IPsec test validating ipsec-enabled and ipsec_tunnel_ike_child_sa_state metrics and helper to read ENABLE_IPSEC env.
Dependency update
go-controller/go.mod
Promoted github.com/vishvananda/netns to direct dependency.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Node as ovnkube node
  participant Register as RegisterNodeMetrics
  participant OVS as ovs-vsctl runner
  participant IPSec as ipsec runner
  participant Netlink as XFRM multicast socket
  participant Prom as Prometheus Gauge

  Node->>Register: RegisterNodeMetrics(stop, wg)
  alt enable-ipsec == true
    Register->>Prom: register ipsec_tunnel_state gauge
    Register->>Netlink: subscribe to XFRM groups
    par initial evaluation
      Register->>OVS: listGeneveTunnels()
      Register->>IPSec: ipsec showstates
      Register->>Prom: set gauge (0/1)
    end
    loop on XFRM events
      Netlink-->>Register: state/policy event
      Register->>OVS: listGeneveTunnels()
      Register->>IPSec: ipsec showstates
      Register->>Prom: update gauge
    end
    Node-->>Register: stop
    Register->>Netlink: close socket
  else
    Register-->>Node: return (no IPsec monitor)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Suggested labels

component/node, component/ovnkube-controller

Suggested reviewers

  • tssurya
  • martinkennelly
  • trozet

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 34.48% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly and accurately summarizes the primary change introduced by this pull request—namely, adding a metric to track IPsec tunnel state for IKE Child SA establishment—and directly aligns with the PR’s main objective without extraneous detail.
Description Check ✅ Passed The description clearly explains that the PR registers the ovnkube_controller_ipsec_tunnel_ike_child_sa_state metric, describes its behavior when IKE Child SAs are established or not, and notes the implementation through xfrm netlink change events, which accurately reflects the changeset.

Poem

A rabbit taps a metric drum—thump thump!
Geneve burrows lit, IPsec tunnels jump.
Pluto naps then wakes, pid files cleared away,
Gauges count the handshakes, zeroes flip to play.
Hop, hop—metrics bloom, a carrot-shaped hooray! 🥕

[!TIP]

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • [ ] 📝 Generate Docstrings
🧪 Generate unit tests
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Jul 01 '25 18:07 coderabbitai[bot]

/retest-failed

pperiyasamy avatar Jul 04 '25 07:07 pperiyasamy

The following workflows runs were succesfully triggered: https://github.com/ovn-kubernetes/ovn-kubernetes/actions/runs/16023780242

github-actions[bot] avatar Jul 04 '25 07:07 github-actions[bot]

The xfrm event subscription is failing in non-ic deployments due to permission issue and causing kind cluster bring up to fail. investigating...

pperiyasamy avatar Sep 04 '25 14:09 pperiyasamy

The xfrm event subscription is failing in non-ic deployments due to permission issue and causing kind cluster bring up to fail. investigating...

There are permission issues in subscribing xfrm events in ovnkube-node pod in non-ic and multizone IC deployments. it's also needed to collect this metric as a node level metric, fixed that as well.

pperiyasamy avatar Sep 08 '25 12:09 pperiyasamy

I dont have time to re-review this this week peri - under pressure.. btw some feedback - i subscribed to this because i thought it was simple metrics PR but it includes other things. Maybe split it into other PRs in the future. I am guilty of doing this too sometimes but its hard on the reviewer when so many things included and title misleading.

martinkennelly avatar Sep 09 '25 11:09 martinkennelly

Needs simplifaction peri - its too complex as is imo and its hard to read the code imo.

martinkennelly avatar Sep 15 '25 09:09 martinkennelly

I am removing myself from this Peri - i dont have time anymore :(

martinkennelly avatar Sep 15 '25 09:09 martinkennelly