Controller sometimes gets stuck in an error loop
Current and expected behavior
Sometimes the Controller gets stuck in an error loop with the following WARN messages. It does not recover by itself; only a restart helps.
Nov 17 17:46:28.846 WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
Nov 17 17:46:29.051 WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
Nov 17 17:46:29.260 WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
Nov 17 17:46:29.480 WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
This is how I use it:
let client = Client::try_default().await?;
let api = Api::<CRD>::all(client.clone());
Controller::new(api, ListParams::default())
    .shutdown_on_signal()
    .run(
        reconcile,
        error_policy,
        KubeContext::new(Shared { client }),
    )
    .for_each(|res| async move {
        match res {
            Ok((crd, _)) => info!("reconciled"),
            Err(e) => warn!("reconcile failed: {:?}", e), // TODO: show all inner errors (see the helper sketch below)
        }
    })
    .await;
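For the "show all inner errors" TODO above, one option (not from the original report; a sketch assuming the controller error type implements std::error::Error, which kube-runtime's errors do) is to walk the source() chain so the underlying hyper error becomes visible:

```rust
// Hypothetical helper (not part of the original snippet): flatten an error and
// all of its source() causes into a single string.
use std::error::Error as StdError;

fn error_chain(err: &(dyn StdError + 'static)) -> String {
    let mut out = err.to_string();
    let mut source = err.source();
    while let Some(cause) = source {
        out.push_str(&format!(": {cause}"));
        source = cause.source();
    }
    out
}
```

This would turn the match arm into `Err(e) => warn!("reconcile failed: {}", error_chain(&e))`.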
Possible solution
No response
Additional context
No response
Environment
Client Version: v1.22.2
Server Version: v1.20.9
Configuration and features
k8s-openapi = { version = "0.12.0", features = ["v1_20"], default-features = false }
kube = { version = "0.58.1", features = ["derive"] }
kube-derive = "0.61.0"
kube-runtime = "0.58.1"
Affected crates
No response
Would you like to work on fixing this bug?
No response
This looks like an underlying HTTP client issue, but it makes sense that watcher would be more likely to hit the issue. Intuitively I suspect that the issue is that a broken connection gets stuck in Hyper's connection pool.
Are you using HTTPS when connecting to the cluster or plain HTTP?
I don't think we'll get much further without trace logs from Hyper. If you're using tracing-subscriber's EnvFilter (with its default configuration) then you'll be able to get them by running the controller again with the environment variable RUST_LOG=info,hyper=trace.
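For reference, a minimal sketch of that logging setup, assuming tracing-subscriber is used with its env-filter feature enabled:

```rust
// Minimal sketch (assumes tracing-subscriber with the `env-filter` feature):
// honour RUST_LOG, so RUST_LOG=info,hyper=trace enables hyper's trace logs.
use tracing_subscriber::EnvFilter;

fn init_tracing() {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();
}
```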
It's also worth keeping in mind that Kube-rs 0.58.1 is pretty outdated. Can you confirm that this still happens on Kube-rs 0.64.0?
Another question here: how large is the set of CRDs that you are listing?
Side-note: we are working on adding backoff to the internal mechanisms here, because spamming the same error and instantly retrying is definitely not helpful (and might be a problem if there is rate-limiting logic in Kubernetes at work here).
Ok, I'll update my setup and try to repro the issue with trace logs.
No, the CRD is pretty small: the spec has 8 required fields and another 8 optional ones.
I faced the same problem with Vector today. Vector logs:
2023-03-29T17:28:32.024522Z WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
2023-03-29T17:28:32.024537Z WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
2023-03-29T17:28:32.024551Z WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
2023-03-29T17:28:32.024569Z WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
These messages continue until CPU and memory leak and, finally, cause an outage.
Version: 0.75.0 https://github.com/vectordotdev/vector/blob/11fb1aaf33125338a680596d74691e6ef71e3017/Cargo.toml#L274
Watcher configuration is straightforward:
let pods = Api::<Pod>::all(client.clone());
let pod_watcher = watcher(
    pods,
    ListParams {
        field_selector: Some(field_selector),
        label_selector: Some(label_selector),
        ..Default::default()
    },
);
And instead of events, it constantly yields the error: https://github.com/vectordotdev/vector/blob/11fb1aaf33125338a680596d74691e6ef71e3017/src/kubernetes/reflector.rs#L57-L59
I don't think I can solve the problem on the vector side effectively. @clux are you ok with me creating a PR regarding this issue?
P.S. Service(Closed) What does this mean btw?
> I don't think I can solve the problem on the vector side effectively. @clux are you ok with me creating a PR regarding this issue?
Sure, but what are you proposing to do?
> P.S. Service(Closed) What does this mean btw?
Uhm, not sure in this context. Maybe the apiserver hangs up on us?
So normally the list errors I've encountered are because of permission issues, but this doesn't seem to be that. Is the cluster very large or something? Maybe the apiserver does not like us doing list calls without pagination?
Vector tries to invoke three watchers: pods, namespaces, and nodes. The problem is that, in my case, it tries to do it sixty times (because of the architecture).
The cluster is small: 217 pods, 8 nodes, 33 namespaces.
I don't think there is a problem with the size. I figured out that this is a rate-limit error applied to all service accounts by default in Kubernetes.
According to the issue, I think I can try switching the watchers in Vector to StreamBackoff; hopefully that will reduce the load. For kube-rs, as was proposed, I'd like to try to add pagination for the initial list request (to reduce the load for clusters with 10k pods or more).
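For reference, a hedged sketch of what wrapping the watcher above in StreamBackoff could look like against the kube 0.75-era API (watcher taking ListParams); the selector variables are placeholders carried over from the earlier snippet, and module paths may differ between versions:

```rust
// Sketch only: wrap the raw watcher stream in StreamBackoff so a failed initial
// list is retried with exponential backoff instead of in a tight loop.
use backoff::ExponentialBackoff;
use futures::Stream;
use k8s_openapi::api::core::v1::Pod;
use kube::{
    api::ListParams,
    runtime::{utils::StreamBackoff, watcher},
    Api, Client,
};

fn backed_off_pod_watcher(
    client: Client,
    field_selector: String,
    label_selector: String,
) -> impl Stream<Item = Result<watcher::Event<Pod>, watcher::Error>> {
    let pods = Api::<Pod>::all(client);
    let stream = watcher(
        pods,
        ListParams {
            field_selector: Some(field_selector),
            label_selector: Some(label_selector),
            ..Default::default()
        },
    );
    StreamBackoff::new(stream, ExponentialBackoff::default())
}
```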
A pagination setup within the watcher would be cool to see! Note that it's one of those things that can be a little hairy because we present a Restarted event to the outside world as a "world update" event, so there are actually two potential approaches here:
- buffer internally until pagination is complete
- present events as they come from watcher
The second will be "faster" from a user POV, but it will come with the downside that stores that build upon them will be incomplete while relists happen, which is undesirable (and not what we have advertised). The first will also mean more memory usage while the buffering happens (as you'll have up to twice the reflector cache in memory right before swapping), but that's nothing new, and I feel not having complete data is the worse problem.
It is technically possible to present Page(Vec<X>) events that could be handled at the Store level by doing some fancy parallel storage until the pagination is complete for a fancy "best of both worlds" type approach, but that sounds super hairy and breaky. Internal buffering in the watcher is probably a good starting point. If we want to optimise it later then that can probably be done as a separate thing - but it's probably not super important that the users gets the events the same millisecond.
Anyway, experimentation with this would be welcome. Internal buffering might be doable without having to battle with the entire world, so would advise going down that path. Happy to review!
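For illustration, a rough sketch of the "buffer internally until pagination is complete" idea, expressed with the public list API rather than the watcher internals; it assumes ListParams exposes limit and continue_token for server-side paging:

```rust
// Sketch only: page through a large list and buffer everything before treating
// it as one complete "world update", mirroring the internal-buffering approach.
use k8s_openapi::api::core::v1::Pod;
use kube::{api::ListParams, Api, Client};

async fn list_all_buffered(client: Client) -> Result<Vec<Pod>, kube::Error> {
    let api = Api::<Pod>::all(client);
    let mut buffered = Vec::new();
    let mut continue_token: Option<String> = None;
    loop {
        let mut lp = ListParams::default().limit(500); // smaller pages ease apiserver load
        if let Some(token) = &continue_token {
            lp = lp.continue_token(token);
        }
        let page = api.list(&lp).await?;
        continue_token = page.metadata.continue_.clone().filter(|t| !t.is_empty());
        buffered.extend(page.items);
        if continue_token.is_none() {
            // only at this point would a watcher emit the buffered state as a
            // single Restarted ("world update") event
            return Ok(buffered);
        }
    }
}
```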
This issue has grown to cover many things, so I will try to address the current solutions that now exist and close the issue.
- Pagination (and streaming lists for newer clusters) can both be experimented with in later versions of kube (and a default page size is also set now in-line with client-go), so hopefully a repeat-fail scenario from initial lists that are too big should be a thing of the past.
- Backoff for watchers is configurable via `WatchStreamExt`, and we have `default_backoff()` in controllers (but it can also be configured using the `unstable-runtime` stream input parameter if you are constructing watch streams manually and want a custom `impl Backoff`); see the sketch after this list.
- Controller concurrency limits (via the controller's `Config`) or rate limiting via custom clients (see examples) can also be tried via tower layers (there are some searchable issues on this).
- (Late spot, but the original issue also reports different versions of the various kube crates, which is not supported.)
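As a hedged illustration of the watcher-side knobs listed above on a recent kube (assuming watcher::Config with page_size and WatchStreamExt::default_backoff are available; method names may vary between releases):

```rust
// Sketch only: a watch stream with a bounded page size and default exponential
// backoff, so failed initial lists no longer retry in a tight loop.
use futures::TryStreamExt;
use k8s_openapi::api::core::v1::Pod;
use kube::{
    runtime::{watcher, WatchStreamExt},
    Api, Client,
};

async fn watch_pods(client: Client) -> Result<(), watcher::Error> {
    let pods = Api::<Pod>::all(client);
    watcher(pods, watcher::Config::default().page_size(500))
        .default_backoff()  // exponential backoff between failed list/watch attempts
        .applied_objects()  // flatten watch events into the objects they touch
        .try_for_each(|pod| async move {
            println!("saw {:?}", pod.metadata.name);
            Ok(())
        })
        .await
}
```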
Hopefully this helps someone doing archeology. If problems persist for people, please open more specific issues about this.