
daemon: Make cilium status independent from k8s status

tkna opened this pull request 1 year ago • 17 comments

This PR adds an includeK8sCheck flag (default: false) to exclude the k8s probe from the result of the health probe. The flag can be set via a newly added HTTP header, include-k8s-check, on the /healthz cilium API and the pod's /healthz endpoint.

Fixes: https://github.com/cilium/cilium/issues/31702

tkna avatar May 27 '24 07:05 tkna
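For illustration, a minimal sketch of how a client could exercise the header described in the PR description. The address http://localhost:9879/healthz is a hypothetical example (the real endpoint depends on how the agent health API is exposed in your deployment), and the header name was later renamed during review, so check the merged code for the final name.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical address; the real endpoint depends on the deployment.
	req, err := http.NewRequest(http.MethodGet, "http://localhost:9879/healthz", nil)
	if err != nil {
		panic(err)
	}
	// Header name taken from the PR description; it was renamed later in review.
	req.Header.Set("include-k8s-check", "true")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, string(body))
}
```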

@tkna I'm going to mark this for more discussion. While I see your issue, I'm not sure whether disabling liveness probes is ideal from Cilium's point of view.

I'm going to pull in some folks from your linked CFP for further discussion. Let's go over this in that issue.

ldelossa avatar May 29 '24 14:05 ldelossa

I think this solution makes sense. We discussed this issue in the APAC community meeting and concluded that killing the whole Cilium agent on k8s disconnection is too aggressive. We might want to stop a certain subsystem, but maybe not the entire agent.

For example, the real impact of a k8s disconnection on the BGP Control Plane is that the user's configuration changes may not be propagated in a timely manner; BGP otherwise works without any problem. I'd assume that most of the time users would rather accept a configuration delay than a connectivity loss.

We might have a subsystem for which this behavior is desirable, so opt-in might be the safer option. However, in my personal opinion, we could completely remove k8s connectivity from the health criteria and fix any such subsystem to stop individually.

Let me CC @joestringer as a person who was involved in the original discussion and also as a core maintainer.

YutaroHayakawa avatar May 29 '24 15:05 YutaroHayakawa

I've been thinking about this PR too :sweat_smile:

AFAIK the controller that updates connectivity to the apiserver is still running here: https://github.com/cilium/cilium/blob/0001f4b3fcf950ba6a3bb6fb0cba14931b340ce3/daemon/cmd/status.go#L804

So I would at least expect cilium-dbg status to show whether connectivity to k8s works, but a metric for the general status of Cilium's connectivity to the control plane (k8s control plane / kvstore) would be nice too.

FYI, there is also a controller that is supposed to close all connections if the health check of the k8s apiserver fails: https://github.com/cilium/cilium/blob/c32980607c16cf6daddea300f8318e217e31bc34/pkg/k8s/client/cell.go#L290 So as long as that part works, there is also that fail-safe mechanism.

I'm not sure if there is a specific argument to keep kube-apiserver connectivity as part of the base health check.

There was a bug in the past with connectivity to the kube-apiserver that was only resolved by restarting the agent: https://github.com/cilium/cilium/issues/20915 So while I also lean towards enabling it by default, we still definitely need an option to retain the previous behavior, just in case, for a transition period.

CC @hemanthmalla, we talked about this a little bit before. I'm guessing you might be interested in this too and probably have some thoughts as well. You might want something similar for the kvstore, so we might want to make this a bit more generic.

marseel avatar Jun 05 '24 09:06 marseel
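The fail-safe mechanism @marseel mentions follows a common heartbeat pattern: probe the kube-apiserver periodically and drop idle connections on failure so that subsequent requests re-dial. Below is a rough, hypothetical sketch of that pattern, not the actual code in pkg/k8s/client/cell.go; the heartbeat URL, interval, and timeout are illustrative assumptions.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

// heartbeat periodically probes url and, on failure, closes idle connections
// on the transport as a fail-safe so later requests establish fresh ones.
func heartbeat(ctx context.Context, client *http.Client, transport *http.Transport, url string) {
	ticker := time.NewTicker(30 * time.Second) // hypothetical interval
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			reqCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
			req, _ := http.NewRequestWithContext(reqCtx, http.MethodGet, url, nil)
			resp, err := client.Do(req)
			cancel()
			if err != nil {
				log.Printf("apiserver heartbeat failed: %v", err)
				transport.CloseIdleConnections()
				continue
			}
			resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				log.Printf("apiserver heartbeat returned %d", resp.StatusCode)
				transport.CloseIdleConnections()
			}
		}
	}
}

func main() {
	transport := &http.Transport{}
	client := &http.Client{Transport: transport}
	// Hypothetical health URL; in a cluster this would use the configured
	// kube-apiserver endpoint and credentials.
	heartbeat(context.Background(), client, transport, "https://kube-apiserver:6443/healthz")
}
```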

Isn't one of the assumptions users can make about the restart behavior of the cilium-agent that connectivity will continue to work even as the cilium-agent is killed or restarted? I'm confused by the comments here, because they assume there is a coupling between cilium-agent availability and Pod network connectivity. Are we referring to "connectivity loss" as connectivity that is out of sync with user-configured state in the k8s apiserver?

learnitall avatar Jun 14 '24 17:06 learnitall

Isn't one of the assumptions users can make about the restart behavior of the cilium-agent that connectivity will continue to work even as the cilium-agent is killed or restarted? I'm confused by the comments here, because they assume there is a coupling between cilium-agent availability and Pod network connectivity. Are we referring to "connectivity loss" as connectivity that is out of sync with user-configured state in the k8s apiserver?

The issue is BGP advertisement. If the cilium-agent goes offline when the kube-apiserver is unavailable, Cilium also can't advertise routes to its BGP peers, so kube-apiserver availability effectively impacts load-balancer availability, for which there is no good reason.

rouke-broersma avatar Jun 14 '24 17:06 rouke-broersma

Isn't one of the assumptions users can make about the restart behavior of the cilium-agent that connectivity will continue to work even as the cilium-agent is killed or restarted?

The other aspect I'll point out is that this PR suggests that if the kube-apiserver goes down, every cilium-agent instance in the cluster will also be forced to go down. Similar to how you're asserting that the dataplane should continue without the local control plane, I think it makes sense for the local control plane to continue without the central control plane.

joestringer avatar Jun 19 '24 01:06 joestringer

What's the next step on this PR? I see a few comments suggesting changes. @tkna are you still working on this? Do you have any questions for reviewers?

joestringer avatar Jul 18 '24 02:07 joestringer

@joestringer @marseel @YutaroHayakawa @ysksuzuki @ldelossa @learnitall

I made some minor corrections. As far as I know, the remaining discussion points are as follows:

  1. Whether --require-k8s-connectivity should be implemented as an API parameter or a daemon config. I implemented it as an API parameter for now. I think an API parameter is more flexible and easier to apply/roll back, since it is configured via an HTTP header and can be changed per request.

  2. Whether --require-k8s-connectivity should be false or true by default. I implemented it as default: false for now, but the discussion is still open.

So while I also lean towards enabling it by default, we still definitely need an option to retain the previous behavior, just in case, for a transition period.

In my understanding, @marseel's opinion is that --require-k8s-connectivity should be true by default at this point.

  3. Whether we should generalize this mechanism (to exclude the kvstore probe, etc.). I don't think the generalization is necessary unless there is a concrete requirement.

  4. Whether we should introduce a new metric. As @marseel mentioned, connectivity to the kube-apiserver can be checked via cilium-dbg status and the logs from the "k8s-heartbeat" mechanism, so I don't think a new metric is necessary, at least for this PR (it can be another PR).

tkna avatar Jul 18 '24 05:07 tkna

Thanks @tkna for the detailed summary.

Whether --require-k8s-connectivity should be false or true by default. I implemented it as default: false for now, but the discussion is still open.

I think long-term it should be true by default, but it would require significantly more testing. I'm fine with having it as false by default for now and then following up with making it true.

marseel avatar Jul 23 '24 11:07 marseel

@marseel Just for clarification: --require-k8s-connectivity: true retains the existing behavior, and false makes cilium-dbg status (and the livenessProbe) independent from the k8s probe.

Do you mean long-term: default: false (make cilium-dbg status independent from the k8s probe), and for now: default: true (retaining the existing behavior)?

tkna avatar Jul 24 '24 05:07 tkna

Yeah, sorry, I wasn't clear. I meant that long-term the k8s health check would be disabled by default, but for now let's keep the current behaviour.

marseel avatar Jul 24 '24 08:07 marseel
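To make the agreed semantics concrete, here is a hypothetical Go sketch of how a health check could honor such a flag. The names statusProbe and healthy are illustrative only and do not come from the Cilium codebase; the real logic lives in daemon/cmd/status.go and differs in detail.

```go
package main

import "fmt"

// statusProbe is a hypothetical summary of the agent's health inputs.
type statusProbe struct {
	k8sHealthy   bool // result of the kube-apiserver probe
	otherHealthy bool // everything else the agent reports on
}

// healthy reports overall agent health. When requireK8sConnectivity is
// false, a failing k8s probe no longer makes the agent report unhealthy.
func healthy(p statusProbe, requireK8sConnectivity bool) bool {
	if requireK8sConnectivity && !p.k8sHealthy {
		return false
	}
	return p.otherHealthy
}

func main() {
	p := statusProbe{k8sHealthy: false, otherHealthy: true}
	fmt.Println(healthy(p, true))  // false: existing behavior, a k8s outage fails the probe
	fmt.Println(healthy(p, false)) // true: agent stays healthy despite the k8s outage
}
```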

@marseel Fixed to keep the existing behavior by default. With a boolean option for cilium-dbg status in mind, I changed the option's name from require-k8s-connectivity to without-k8s-connectivity so that it defaults to false.

If this PR looks OK, could you trigger the E2E tests?

tkna avatar Jul 26 '24 05:07 tkna

/test

ovidiutirla avatar Jul 26 '24 09:07 ovidiutirla

@ovidiutirla Some E2E tests seemed to be failing for reasons unrelated to this PR. I just rebased it, so could you run the E2E tests again?

tkna avatar Jul 31 '24 02:07 tkna

/test

YutaroHayakawa avatar Jul 31 '24 03:07 YutaroHayakawa

Looks like @tkna has made all the requested changes as agreed here. Is there anything else we need to do to keep this moving? @joestringer, @YutaroHayakawa, @marseel?

youngnick avatar Aug 12 '24 07:08 youngnick

/test

ovidiutirla avatar Aug 22 '24 07:08 ovidiutirla

@ldelossa Is there anything else I should do about this PR?

tkna avatar Sep 03 '24 10:09 tkna

/test

YutaroHayakawa avatar Sep 25 '24 10:09 YutaroHayakawa

/ci-ginkgo

joestringer avatar Sep 25 '24 17:09 joestringer

/ci-runtime

joestringer avatar Sep 25 '24 17:09 joestringer

/ci-integration

joestringer avatar Sep 25 '24 17:09 joestringer

Thanks for the contribution @tkna !

joestringer avatar Sep 26 '24 22:09 joestringer

@joestringer Do you think this comment should be modified since it is out of date? https://github.com/cilium/cilium/pull/32724#issue-2318459880

(It still refers to withoutK8sConnectivity.)

tkna avatar Sep 27 '24 01:09 tkna

@tkna yes, that would be helpful. When this PR becomes part of a release, the release notes will point back to this PR. If a user is interested in the feature, they may open this PR and read the description.

joestringer avatar Sep 27 '24 15:09 joestringer

@joestringer https://github.com/cilium/cilium/pull/32724#issue-2318459880 has been modified to use requireK8sConnectivity.

tkna avatar Sep 30 '24 01:09 tkna