retina icon indicating copy to clipboard operation
retina copied to clipboard

Retina agent pods crash looping after node upgrade (AKS)

Open matthew-duval opened this issue 9 months ago • 4 comments

Describe the bug After a node upgrade to AKSUbuntu-2004gen2fipscontainerd-202503.02.0 my retina agents are crash looping.

To Reproduce Steps to reproduce the behavior:

  1. Create a private AKS cluster (v1.30.4) with Azure monitor enabled
  2. Ensure your system node pool uses the AKSUbuntu-2004gen2fipscontainerd-202503.02.0 node image version
  3. Observe the retina agent pods crash looping.

Expected behavior The retina service to run as expected.

Screenshots

Image

Platform (please complete the following information):

  • OS: AKSUbuntu-2004gen2fipscontainerd-202503.02.0
  • Kubernetes Version: [e.g. 1.30.4]
  • Host: AKS
  • Retina Version: v0.0.25

Additional context Add any other context about the problem here.

matthew-duval avatar Mar 17 '25 12:03 matthew-duval

this might be a kernel issue, the kprobe is trying attach to a non-existence kernel func i think?

nddq avatar Mar 20 '25 21:03 nddq

May I know when will fix this issue, we met the same issue for AKSUbuntu-2004gen2fipscontainerd-202503.13.0"

ZhuowenNie avatar Apr 03 '25 19:04 ZhuowenNie

@matthew-duval Is this arm64 or amd64? Can you provide the kernel version too?

rectified95 avatar Apr 29 '25 20:04 rectified95

Amd64. Here is screenshot of the node system info:

Image

matthew-duval avatar Apr 30 '25 13:04 matthew-duval