
Controller sometimes gets stuck in an error loop

Open dmolokanov opened this issue 3 years ago • 4 comments

Current and expected behavior

Sometimes the Controller gets stuck in an error loop with the following WARN messages. It does not recover by itself; only a restart helps.

Nov 17 17:46:28.846  WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
Nov 17 17:46:29.051  WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
Nov 17 17:46:29.260  WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }
Nov 17 17:46:29.480  WARN operator::controller: reconcile failed: QueueError { source: InitialListFailed { source: HyperError(hyper::Error(Connect, Error { code: -9810, message: "Internal error" })), backtrace: Backtrace(()) }, backtrace: Backtrace(()) }

This is how I use it:

        let client = Client::try_default().await?;
        let api = Api::<CRD>::all(client.clone());

        Controller::new(api, ListParams::default())
            .shutdown_on_signal()
            .run(
                reconcile,
                error_policy,
                KubeContext::new(Shared { client }),
            )
            .for_each(|res| async move {
                match res {
                    Ok((crd, _)) => info!("reconciled"),
                    Err(e) => warn!("reconcile failed: {:?}", e), // todo show all inner errors
                }
            })
            .await;

Possible solution

No response

Additional context

No response

Environment

Client Version: v1.22.2
Server Version: v1.20.9

Configuration and features

k8s-openapi = { version = "0.12.0", features = ["v1_20"], default-features = false }
kube = { version = "0.58.1", features = ["derive"] }
kube-derive = "0.61.0"
kube-runtime = "0.58.1"

Affected crates

No response

Would you like to work on fixing this bug?

No response

dmolokanov avatar Nov 18 '21 01:11 dmolokanov

This looks like an underlying HTTP client issue, but it makes sense that the watcher would be more likely to hit it. Intuitively, I suspect that a broken connection gets stuck in Hyper's connection pool.

Are you using HTTPS when connecting to the cluster or plain HTTP?

I don't think we'll get much further without trace logs from Hyper. If you're using tracing-subscriber's EnvFilter (with its default configuration), you'll be able to get them by running the controller again with the environment variable RUST_LOG=info,hyper=trace.
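
For reference, a minimal subscriber setup that honours RUST_LOG might look like the sketch below (assuming tracing-subscriber is built with its env-filter feature; init_tracing is just an illustrative name):

    use tracing_subscriber::EnvFilter;

    // Sketch: route RUST_LOG through tracing-subscriber so hyper's trace-level
    // events show up alongside the operator's own logs.
    fn init_tracing() {
        tracing_subscriber::fmt()
            .with_env_filter(EnvFilter::from_default_env())
            .init();
    }

Starting the controller with RUST_LOG=info,hyper=trace should then surface hyper's connection-level traces.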

nightkr avatar Nov 18 '21 14:11 nightkr

It's also worth keeping in mind that Kube-rs 0.58.1 is pretty outdated. Can you confirm that this still happens on Kube-rs 0.64.0?

nightkr avatar Nov 18 '21 14:11 nightkr

Another question here; how large is the set of CRDs that you are listing?

Side-note: we are working on adding backoff to the internal mechanisms here, because spamming the same error and instantly retrying is definitely not helpful (and might be a problem if there is rate-limiting logic in Kubernetes at work here).

clux avatar Nov 18 '21 16:11 clux

Ok, I'll update my setup and try to repro the issue with trace logs.

No, the CRD is pretty small: the spec has 8 required fields and another 8 optional ones.

dmolokanov avatar Nov 19 '21 20:11 dmolokanov

Faced the same problem with Vector today. Vector logs:

2023-03-29T17:28:32.024522Z  WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
2023-03-29T17:28:32.024537Z  WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
2023-03-29T17:28:32.024551Z  WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))
2023-03-29T17:28:32.024569Z  WARN vector::kubernetes::reflector: Watcher Stream received an error. Retrying. error=InitialListFailed(Service(Closed))

These messages keep coming while CPU and memory usage grow until, finally, there is an outage.

Version: 0.75.0 https://github.com/vectordotdev/vector/blob/11fb1aaf33125338a680596d74691e6ef71e3017/Cargo.toml#L274

Watcher configuration is straightforward:

        let pods = Api::<Pod>::all(client.clone());

        let pod_watcher = watcher(
            pods,
            ListParams {
                field_selector: Some(field_selector),
                label_selector: Some(label_selector),
                ..Default::default()
            },
        );

And instead of events, it constantly yields the error: https://github.com/vectordotdev/vector/blob/11fb1aaf33125338a680596d74691e6ef71e3017/src/kubernetes/reflector.rs#L57-L59

nabokihms avatar Mar 29 '23 17:03 nabokihms

I don't think I can solve the problem on the vector side effectively. @clux are you ok with me creating a PR regarding this issue?

P.S. What does Service(Closed) mean, btw?

nabokihms avatar Mar 29 '23 17:03 nabokihms

I don't think I can solve the problem on the vector side effectively. @clux are you ok with me creating a PR regarding this issue?

Sure, but what are you proposing to do?

P.S. What does Service(Closed) mean, btw?

uhm, not sure in this context. maybe the apiserver hangs up on us?

So normally the list errors I've encountered are because of permission issues, but this doesn't seem to be that. Is the cluster very large or something? Maybe the apiserver does not like us doing list calls without pagination?

clux avatar Mar 29 '23 21:03 clux

Vector runs three watchers: pods, namespaces, and nodes. The problem is that, in my case, it does this sixty times (because of the architecture).

The cluster is small: 217 pods, 8 nodes, 33 namespaces.

I don't think the size is the problem. I figured out that this is the rate limit that Kubernetes applies to all service accounts by default.

Based on this issue, I think I can try switching to StreamBackoff for the watchers in Vector; hopefully it will reduce the load. For kube-rs, as proposed, I'd like to try adding pagination to the initial list request (to reduce the load for clusters with 10k pods or more).
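
For illustration, wrapping a watcher in backoff could look roughly like the sketch below. It assumes kube with its runtime feature (or kube-runtime directly) in a version that exposes WatchStreamExt::backoff, plus the backoff crate's ExponentialBackoff; watch_pods is just an illustrative name, not Vector's actual code:

    use futures::{pin_mut, TryStreamExt};
    use k8s_openapi::api::core::v1::Pod;
    use kube::{
        api::{Api, ListParams},
        runtime::{watcher, WatchStreamExt},
        Client,
    };

    // Sketch: retry a failing initial list with increasing delays instead of
    // spinning in a hot loop of InitialListFailed errors.
    async fn watch_pods(client: Client) -> Result<(), watcher::Error> {
        let pods = Api::<Pod>::all(client);
        let stream = watcher(pods, ListParams::default())
            .backoff(backoff::ExponentialBackoff::default());
        pin_mut!(stream);
        while let Some(event) = stream.try_next().await? {
            // handle watcher::Event::{Applied, Deleted, Restarted} here
            println!("{:?}", event);
        }
        Ok(())
    }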

nabokihms avatar Mar 30 '23 00:03 nabokihms

A pagination setup within the watcher would be cool to see! Note that it's one of those things that can be a little hairy because we present a Restarted event to the outside world as a "world update" event, so there are actually two potential approaches here:

  1. buffer internally until pagination is complete
  2. present events as they come from watcher

The second will be "faster" from a user POV, but it will come with the downside that stores that build upon them will be incomplete while relists happen, which is undesirable (and not what we have advertised). The first will also mean more memory usage while the buffering happens (as you'll have up to twice the reflector cache in memory right before swapping), but that's nothing new, and I feel not having complete data is the worse problem.

It is technically possible to present Page(Vec<X>) events that could be handled at the Store level by doing some fancy parallel storage until the pagination is complete, for a "best of both worlds" type approach, but that sounds super hairy and breaky. Internal buffering in the watcher is probably a good starting point. If we want to optimise it later, that can probably be done as a separate thing - but it's probably not super important that the user gets the events in the same millisecond.

Anyway, experimentation with this would be welcome. Internal buffering might be doable without having to battle with the entire world, so I would advise going down that path. Happy to review!
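
As a rough illustration of option 1 (buffering internally until pagination completes), the initial list could be paged with limit/continue_token and only handed over once the final page arrives. This is only a sketch against the ListParams fields kube exposes for list calls, not a definitive design:

    use k8s_openapi::api::core::v1::Pod;
    use kube::api::{Api, ListParams};

    // Sketch: page through the initial list and buffer everything, so a single
    // Restarted event with the complete set could be emitted afterwards and
    // reflector stores never observe a half-populated relist.
    async fn paginated_initial_list(api: &Api<Pod>, page_size: u32) -> kube::Result<Vec<Pod>> {
        let mut buffer = Vec::new();
        let mut continue_token: Option<String> = None;
        loop {
            let lp = ListParams {
                limit: Some(page_size),
                continue_token: continue_token.take(),
                ..ListParams::default()
            };
            let page = api.list(&lp).await?;
            buffer.extend(page.items);
            continue_token = page.metadata.continue_.filter(|t| !t.is_empty());
            if continue_token.is_none() {
                return Ok(buffer);
            }
        }
    }

The watcher's relist step would then emit one Restarted event with the complete buffer, matching what reflectors already expect.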

clux avatar Mar 30 '23 21:03 clux

This issue has grown to cover many things, so I will try to address the current solutions that now exist and then close it out.

  • Pagination (and streaming lists for newer clusters) can both be experimented with in later versions of kube (and a default page size is now set, in line with client-go), so hopefully the repeated-failure scenario from initial lists that are too big should be a thing of the past.
  • Backoff for watchers is configurable via WatchStreamExt, and we have default_backoff() in controllers (but it can also be configured using the unstable-runtime's stream input parameter if you are constructing watch streams manually and want a custom impl Backoff).
  • Controller concurrency limits (via the controller's Config) are available, and rate limiting can be tried via custom clients built with tower layers (see the examples; there are some searchable issues on this).
  • (Late spot, but the original issue also reports different versions of the various kube crates, which is not supported.)

Hopefully this helps someone doing archeology. If problems persist for people, please open more specific issues about this.

clux avatar Sep 08 '23 16:09 clux