calico icon indicating copy to clipboard operation
calico copied to clipboard

Adding null check for VTEPInfo and IPaddress

Open ajgupta42 opened this issue 3 months ago • 8 comments

Description

Missing both IPv4 and IPv6 VTEP information panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation] github.com/projectcalico/calico/felix/calc.l3rrNodeInfo.AddressAsCIDR

/go/src/github.com/projectcalico/calico/felix/calc/l3_route_resolver.go:166

Todos

  • [ ] Tests
  • [ ] Documentation
  • [ ] Release note

Release Note

Adding nil checks

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

ajgupta42 avatar Sep 17 '25 10:09 ajgupta42

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 17 '25 10:09 CLAassistant

@ajgupta42 Please could you add more information:

  • the stack trace of the panic that you saw
  • any ideas about why this is happening in your cluster - bearing in mind that we've missed this for some reason in our existing testing.

nelljerram avatar Sep 17 '25 10:09 nelljerram

@nelljerram It was failing on AsCIDR, so added some nil checks to prevent seg fault https://github.com/projectcalico/calico/pull/11028/files#diff-dfd7a82ed33eb6795753c3c8c92bddf7470585efe073b284384c4a11a557e8f8R171

IP address was blank for a node which caused this issue.

ajgupta42 avatar Sep 17 '25 10:09 ajgupta42

Thanks @ajgupta42 but I don't fully understand yet.

  • Can you provide the full stack trace? That will help me understand the scenario overall.
  • What exactly do you mean by "IP address was blank for a node"?

nelljerram avatar Sep 18 '25 08:09 nelljerram

Thanks @nelljerram for the quick response, highly appreciated. Please find the full stack trace

goroutine 582 [runningl:
github.com/projectcalico/calico/felix/calc.13rrNodelnfo.AddressesAsCIDRs(f{0xa, Oxa, Ox20, 0x18}, {(0xa, Oxa, 0x20, 0x0}, Ox1a}, (ОxО,...])...)
/go/src/github.com/projectcalico/calico/felix/calc/13_route_resolver.go:166 +0x309
github.com/projectcalico/calico/felix/calc.(*L3RouteResolver).onNodeUpdate(0xc00103e990, {0xc000bd352d, Oxf), Oxc001319978)
/go/src/github.com/projectcalico/calico/felix/calc/13_route_resolver.go:578 +0xe25
github.com/projectcalico/calico/felix/calc.(*L3RouteResolver).OnResourceUpdate(0xc00103e990, ii(0x509faa0, 0xc001410630}, (0x488d780, 0xc000a036c0}, {0xc001164900,
0x9}, Ox0, OxO}, Ox1})
/go/src/github.com/projectcalico/calico/felix/calc/13_route_resolver.go:442 +0xae5
github.com/projectcalico/calico/felix/dispatcher.updateHandlers.DispatchToAlI(..)
/go/src/github.com/projectcalico/calico/felix/dispatcher/dispatcher.go:45
github.com/projectcalico/calico/felix/dispatcher.(*Dispatcher).OnUpdate(0xc000072f20, (l{0x509faaO, 0xc001410630}, {0x488d780,
Oxc000a036c0}, (0xc001164900, 0x9}, 0x0,
OxO}, 0x1))
/go/src/github.com/projectcalico/calico/felix/dispatcher/dispatcher.go:105 +0x395
github.com/projectcalico/calico/felix/dispatcher.(*Dispatcher).OnUpdates(...)
/go/src/github.com/projectcalico/calico/felix/dispatcher/dispatcher.go:80
github.com/projectcalico/calico/felix/calc. (*CalcGraph).OnUpdates(...)
/go/src/github.com/projectcalico/calico/felix/calc/calc_graph.go:137
github.com/projectcalico/calico/felix/calc.(*AsyncCalcGraph).loop(0xc0006a5500)
/go/src/github.com/projectcalico/calico/felix/calc/async_calc_graph.go:151 +0x75c
created by github.com/projectcalico/calico/felix/calc.(*AsyncCalcGraph).Start in goroutine 1 /go/src/github.com/projectcalico/calico/felix/calc/async_calc_graph.go:259 +0xf8

What exactly do you mean by "IP address was blank for a node"? > This was an edge-case scenario. Due to a bug, the node’s ExternalIP became blank (it wasn’t set to <none>). We had to manually clean it up & set to <none>. I believe we can handle this with a nil check. Please let me know if I’m missing anything or if there are other edge cases we should cover.

ajgupta42 avatar Sep 18 '25 12:09 ajgupta42

@ajgupta42 Thank you for your last response. I'm afraid I'm still a little unclear about the scenario, though, and I also realized that we should have a test for it - which would be an excellent way of clarifying the scenario. It sounds like an FV test would be appropriate for this situation, so could you look at adding an FV test? Please let me know if you need more info about how our FV tests work.

nelljerram avatar Sep 22 '25 11:09 nelljerram

@ajgupta42 You've added UTs - thank you - but unfortunately they don't help to explain the overall scenario, and that's why I suggested an FV test.

nelljerram avatar Sep 26 '25 16:09 nelljerram

This PR is stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Nov 25 '25 18:11 github-actions[bot]