opentelemetry-operator
[target-allocator] targets assigned to old pod after HPA scaled down
Observed an issue while load testing with the HPA created from the collector CRD.
Context:
In my collector spec:
spec:
  mode: {{ .Values.collector.mode }}
  image: {{ .Values.collector.image }}
  minReplicas: 1
  maxReplicas: 20
  targetAllocator:
    enabled: true
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:latest
    serviceAccount: {{ .Release.Name }}-collector-targetallocator
    prometheusCR:
      enabled: false
While load testing, the HPA scaled the collector StatefulSet up to 12 pods. After I lowered the metric workload, the HPA scaled the StatefulSet back down to a single collector pod.
What I expected:
I expected the target allocator to assign targets only to the remaining pod after the scale-down.
What actually happened:
collector-0 has no targets assigned in the target allocator (TA), while collector-11 holds the target I expected collector-0 to have. collector-11 was terminated and therefore should not have any targets.
$ kubectl get po -n opentelemetry
NAME                                                         READY   STATUS    RESTARTS   AGE
curl-moh                                                     1/1     Running   0          115m
lightstep-collector-collector-0                              1/1     Running   0          57m
lightstep-collector-targetallocator-b6865b5bb-dc4w5          1/1     Running   0          113m
opentelemetry-operator-controller-manager-575cdcbc57-4d24t   2/2     Running   0          11h
[root@curl-moh:/]$ curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor%2Favalanche%2Favalanche%2F0/targets?collector_id=lightstep-collector-collector-0
[]
[root@curl-moh:/]$ curl http://lightstep-collector-targetallocator:80/jobs/serviceMonitor%2Favalanche%2Favalanche%2F0/targets?collector_id=lightstep-collector-collector-11
[
  {
    "targets": [
      "10.0.7.184:9001"
    ],
    "labels": {...}
  }
]
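For anyone trying to reproduce this, here is a minimal sketch of how the query URLs above are built, so the same check can be repeated for each collector ID. The helper name `targets_url` is my own and not part of the target allocator; the only assumption is that the job name is percent-encoded as a single path segment, as in the curl commands above.

```python
from urllib.parse import quote, urlencode


def targets_url(base: str, job: str, collector_id: str) -> str:
    """Build the per-collector targets URL used in the curl commands above."""
    # The job name contains slashes, so encode it as one path segment
    # ("/" becomes "%2F") instead of letting it split the path.
    job_segment = quote(job, safe="")
    query = urlencode({"collector_id": collector_id})
    return f"{base}/jobs/{job_segment}/targets?{query}"


if __name__ == "__main__":
    base = "http://lightstep-collector-targetallocator:80"
    job = "serviceMonitor/avalanche/avalanche/0"
    for ordinal in (0, 11):
        print(targets_url(base, job, f"lightstep-collector-collector-{ordinal}"))
```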
I wonder if this has to do with the HPA stabilization window. It seems that the target allocator reallocates targets when the targets change, but not when the set of collectors changes. Because of the stabilization window, the unneeded pods are not terminated until several minutes after the load drops. The allocator therefore sees the reduction in targets first and reassigns the remaining targets across all collectors that still exist, including those the HPA is about to remove. If the targets do not change again after the collector scale-down completes, a terminated collector pod is left with targets assigned to it.
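To make the suspected sequence concrete, here is a toy model of it. This is not the operator's code: `ToyAllocator`, `collector_for`, and the modulo placement rule are all made up for illustration, so the stale collector in this sketch is not necessarily the same ordinal as in my cluster. The only point is the ordering of events: targets shrink while 12 collectors still exist, then the collectors shrink with no further target change.

```python
def collector_for(target: str, collectors: list[str]) -> str:
    # Hypothetical placement rule: pick a collector from the target's port.
    port = int(target.rsplit(":", 1)[1])
    return collectors[port % len(collectors)]


class ToyAllocator:
    """Toy stand-in for the target allocator (assumed behavior, not real code)."""

    def __init__(self, reallocate_on_collector_change: bool):
        self.reallocate_on_collector_change = reallocate_on_collector_change
        self.collectors: list[str] = []
        self.targets: list[str] = []
        self.assignment: dict[str, str] = {}

    def _reallocate(self) -> None:
        if not self.collectors:
            self.assignment = {}
            return
        self.assignment = {
            t: collector_for(t, self.collectors) for t in self.targets
        }

    def set_targets(self, targets: list[str]) -> None:
        # Reallocation is triggered by a change in targets.
        if targets != self.targets:
            self.targets = list(targets)
            self._reallocate()

    def set_collectors(self, collectors: list[str]) -> None:
        self.collectors = list(collectors)
        # The suspected gap: nothing happens here unless this flag is set.
        if self.reallocate_on_collector_change:
            self._reallocate()

    def targets_for(self, collector_id: str) -> list[str]:
        return [t for t, c in self.assignment.items() if c == collector_id]


def run(reallocate_on_collector_change: bool) -> dict[str, list[str]]:
    ta = ToyAllocator(reallocate_on_collector_change)
    ta.set_collectors([f"collector-{i}" for i in range(12)])      # HPA scaled up
    ta.set_targets([f"10.0.7.{i}:{9001 + i}" for i in range(24)])  # load test
    ta.set_targets(["10.0.7.184:9001"])   # load drops, 12 pods still running
    ta.set_collectors(["collector-0"])    # stabilization window ends
    return {c: ta.targets_for(c) for c in ("collector-0", "collector-1")}


if __name__ == "__main__":
    print("targets-only trigger:", run(False))
    print("also on collector change:", run(True))
```

With the targets-only trigger, the remaining target stays on a collector that no longer exists and collector-0 gets nothing, which matches what I observed. Reallocating whenever the collector set changes moves it to collector-0 in this model.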