Calico potentially losing track of state intermittently?
Expected Behavior
We have Argo CD running in numerous Kubernetes clusters. This includes:
- argocd-redis-ha-server StatefulSet pod with a redis container listening on port 6379
- argocd-redis-ha-server StatefulSet pod with a sentinel container listening on port 26379
- argocd-redis-ha-haproxy ReplicaSet pods with a redis container listening on ports 6379 and 9101, fronted by a Kubernetes service
We have Calico NetworkPolicies in place to allow ingress to these ports, for example:
```yaml
ingress:
  - action: Allow
    destination:
      ports:
        - 26379
        - 6379
    protocol: TCP
    source:
      namespaceSelector: name == 'argocd'
      selector: >-
        app.kubernetes.io/name in {'argocd-redis-ha',
        'argocd-redis-ha-haproxy', 'argocd-server', 'argocd-repo-server',
        'argocd-application-controller'}
order: 150
selector: app.kubernetes.io/name in {'argocd-redis-ha', 'argocd-redis-ha-haproxy'}
types:
  - Ingress
```
So we expect Argo CD to work with nothing being denied. (We also have a log-and-deny-all rule at the end.)
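As an illustration, here is a minimal Python sketch of how the rule above would judge a single packet in isolation, in each direction between an haproxy pod and a sentinel. It assumes the redis-ha-server pods carry the label app.kubernetes.io/name=argocd-redis-ha and that the argocd namespace has the label name=argocd; the names Endpoint and ingress_verdict are hypothetical, and the model covers only this one rule plus the trailing deny.

```python
from dataclasses import dataclass

# Simplified, hypothetical model of the ingress rule quoted above. Each packet
# is judged on its own, with no connection tracking.

# Pods the policy applies to (its top-level selector).
POLICY_SELECTED_APPS = {"argocd-redis-ha", "argocd-redis-ha-haproxy"}
# Source pods allowed by the rule's source selector.
ALLOWED_SOURCE_APPS = {
    "argocd-redis-ha",
    "argocd-redis-ha-haproxy",
    "argocd-server",
    "argocd-repo-server",
    "argocd-application-controller",
}
# Destination ports allowed by the rule.
ALLOWED_PORTS = {26379, 6379}


@dataclass(frozen=True)
class Endpoint:
    namespace: str  # value of the namespace's "name" label
    app: str        # value of the pod's app.kubernetes.io/name label (assumed)
    port: int


def ingress_verdict(src: Endpoint, dst: Endpoint, protocol: str = "TCP") -> str:
    """Return 'Allow', 'Deny', or 'NotSelected' for one packet arriving at dst."""
    if dst.app not in POLICY_SELECTED_APPS:
        return "NotSelected"  # this policy does not apply to dst
    if (
        protocol == "TCP"
        and dst.port in ALLOWED_PORTS
        and src.namespace == "argocd"
        and src.app in ALLOWED_SOURCE_APPS
    ):
        return "Allow"
    # No match: the packet falls through to the trailing log-and-deny-all rule.
    return "Deny"


haproxy = Endpoint("argocd", "argocd-redis-ha-haproxy", 40962)
sentinel = Endpoint("argocd", "argocd-redis-ha", 26379)
print(ingress_verdict(haproxy, sentinel))  # Allow: haproxy -> sentinel:26379
print(ingress_verdict(sentinel, haproxy))  # Deny: sentinel -> haproxy:40962
```

Judged on its own, a packet from the server back to an haproxy ephemeral port can never match this rule, because its destination port is not 6379 or 26379; it is only allowed if something else, such as connection tracking, recognises it as a reply.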
Current Behavior
Occasionally (roughly once a month per cluster), at random times that don't coincide with new calico-node or Argo CD pods, we see a burst of three blocked Argo CD flows spaced roughly 100 seconds apart, e.g. one at 4:57:39 pm, one at 4:59:19 pm, and one at 5:01:00 pm.
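The spacing between the three example timestamps can be checked with a few lines of Python; only times of day are given, so they are parsed onto the same arbitrary date.

```python
from datetime import datetime

# Times of the three blocked flows in one burst (no date given, so strptime
# places them all on its default date).
times = ["4:57:39 PM", "4:59:19 PM", "5:01:00 PM"]
parsed = [datetime.strptime(t, "%I:%M:%S %p") for t in times]

# Seconds between consecutive blocked flows.
gaps = [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]
print(gaps)  # [100.0, 101.0]
```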
These blocked flows are the reverse of the flows we'd normally expect. For example:
Blocked: argocd-redis-ha-server:26379 --> argocd-redis-ha-haproxy:40962
Expected: argocd-redis-ha-haproxy:40962 --> argocd-redis-ha-server:26379
Blocked: argocd-redis-ha-server:6379 --> argocd-redis-ha-haproxy:51418
Expected: argocd-redis-ha-haproxy:51418 --> argocd-redis-ha-server:6379
I don't see anything out of the ordinary in the Calico pod logs. My understanding of networking is limited, but it looks as though Calico, which should be stateful, may be losing track of the state of these connections, so that reply packets are evaluated as new flows and denied. Is that possible? Or are there other theories?
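The stateful behaviour in question can be illustrated with a toy model. This is a hypothetical Python sketch, not Calico's implementation (Calico's iptables dataplane relies on the Linux kernel's conntrack); ToyConntrack, the IP addresses, and the 60-second idle timeout are made up for illustration. Packets belonging to a tracked connection are allowed without consulting policy, but once the entry is gone, a reply packet is evaluated as a brand-new flow and hits the deny rule, which would look like the reversed flows shown above.

```python
from dataclasses import dataclass, field

# Toy stateful firewall: a hypothetical sketch, NOT Calico's implementation.
ALLOWED_DST_PORTS = {6379, 26379}  # ports from the quoted ingress rule


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class ToyConntrack:
    timeout_s: float  # hypothetical idle timeout
    entries: dict = field(default_factory=dict)  # (src, sport, dst, dport) -> last seen

    def handle(self, p: Packet, now: float) -> str:
        forward = (p.src_ip, p.src_port, p.dst_ip, p.dst_port)
        reverse = (p.dst_ip, p.dst_port, p.src_ip, p.src_port)
        # A packet in either direction of a fresh entry is "established".
        for key in (forward, reverse):
            last_seen = self.entries.get(key)
            if last_seen is not None and now - last_seen <= self.timeout_s:
                self.entries[key] = now
                return "Allow (established)"
        # Unknown or stale connection: evaluate the packet as a new flow.
        if p.dst_port in ALLOWED_DST_PORTS:
            self.entries[forward] = now
            return "Allow (new, matched policy)"
        return "Deny (new, no policy match)"


ct = ToyConntrack(timeout_s=60)
request = Packet("10.0.0.1", 40962, "10.0.0.2", 26379)  # haproxy -> sentinel
reply = Packet("10.0.0.2", 26379, "10.0.0.1", 40962)    # sentinel -> haproxy

print(ct.handle(request, now=0))   # Allow (new, matched policy)
print(ct.handle(reply, now=1))     # Allow (established)
print(ct.handle(reply, now=1000))  # Deny (new, no policy match): entry went stale
```

The last call is the interesting case: the same reply that was allowed a moment earlier is denied once the tracking entry is no longer valid, because from the policy's point of view it is a new connection to port 40962.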
Possible Solution
Steps to Reproduce (for bugs)
Context
Your Environment
- Calico version v3.27.0
- Orchestrator version (e.g. kubernetes, mesos, rkt): EKS with Kubelet v1.28.8-eks-ae9a62a
- Operating System and version: Amazon Linux 2, 5.10.217-205.860.amzn2.x86_64
- Link to your project (optional):