Bug: Target Allocator: Allocates targets to an otel-collector pod in "Pending" state
How to reproduce:
- Set the OpenTelemetry Collector replicas to 2 (or more)
- Use PodAntiAffinity, or request resources (CPU/memory) that are unavailable, so that not all pods can be scheduled
- Although some pods are still unschedulable, the target allocator assigns endpoints to them
Shouldn't we wait for the pod to be scheduled before we allocate target endpoints to it?
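One way to express "wait until the pod is scheduled" is to filter the collector pods before handing them to the allocation strategy. The sketch below is illustrative only and is not the operator's code: `Pod`, its `Scheduled` field (standing in for the Kubernetes `PodScheduled` condition being True), and `schedulablePods` are hypothetical simplifications of the real `corev1.Pod` type.

```go
package main

import "fmt"

// Pod is a hypothetical, heavily simplified stand-in for a Kubernetes Pod.
// Real code would inspect corev1.Pod conditions from k8s.io/api instead.
type Pod struct {
	Name      string
	Phase     string // informational only, e.g. "Pending" or "Running"
	Scheduled bool   // hypothetical: true when the PodScheduled condition is True
}

// schedulablePods keeps only pods that have been placed on a node,
// so unschedulable replicas never receive targets.
func schedulablePods(pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.Scheduled {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "collector-0", Phase: "Running", Scheduled: true},
		{Name: "collector-1", Phase: "Pending", Scheduled: false}, // blocked by anti-affinity
	}
	for _, p := range schedulablePods(pods) {
		fmt.Println(p.Name) // prints "collector-0"
	}
}
```

Filtering on the scheduling condition rather than on `Phase == "Pending"` is deliberate: a Pod also stays in the Pending phase while its container images are being pulled, even after it has been scheduled.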
This is clear if the Pod in question is new, but much less clear if it's an existing Pod being rescheduled. Reassigning targets is a fairly expensive operation for the collectors themselves, as it flushes scrape caches, so we should avoid doing it carelessly. Maybe we should have a configurable timeout for existing Pods, so it's possible to control how long the allocator waits for a Pod before reassigning its targets?
#2528 didn't fix this; it just made the fix easier to implement. The main problem is that it isn't clear to me what the behaviour should be. I think assigning targets to the following Pods works:
- Pods which are Ready
- Pods which were ready less than X seconds ago, and are now not ready, but also not Terminating
But I haven't completely thought this through. If anyone can think of any nasty edge cases for this problem, please speak up and let me know.
Sorry, that was the auto-closer 😓
This is now fixed with https://github.com/open-telemetry/opentelemetry-operator/issues/3781 and https://github.com/open-telemetry/opentelemetry-operator/issues/3989.