kubeadm upgrade apply phases
FEATURE REQUEST
kubeadm init recently included a "--skip-phases" option for skipping a particular phase. However, kubeadm upgrade does not appear to have this option. This specifically causes issues when doing a kubeadm upgrade on a cluster that uses kube-router to replace the functionality of kube-proxy. Since upgrade will always redeploy kube-proxy, it will immediately begin altering iptables and/or ipvs underneath kube-router, and this can cause outages if the daemonset isn't deleted quickly enough.
Additionally (please ignore my ignorance if this is the case, as this is not the main issue), I should mention the documentation is not clear on whether it's even possible to skip a particular addon. It appears --skip-phases addons will skip coredns as well, which would be undesirable.
Versions
1.13.0
(EDIT by neolit123:)
SUMMARY:
- we want to break the command "kubeadm upgrade apply" into sub-commands (also called "phases")
- kubeadm supports a phase runner that is already used in commands like "kubeadm init"
- this requires breaking the command process into a data structure and a number of phases
- the phases are defined under cmd/phases
- "kubeadm upgrade apply" needs to be adapted to the same approach as "kubeadm init".
- currently the upgrade code requires cleanup before such a change can be applied to it.
- there are some closed attempts of cleaning up the upgrade code: https://github.com/kubernetes/kubernetes/pull/87715 https://github.com/kubernetes/kubernetes/pull/87636 https://github.com/kubernetes/kubernetes/pull/87638
- once the cleanup is done creating the phases should be relatively simple.
proposal: https://github.com/kubernetes/enhancements/pull/1298 https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2501-kubeadm-phases-to-beta
"apply" phase naming and other ideas: https://docs.google.com/document/d/1Nicy27rt9jm9tOzZ_MEcQsnmyOFl2ZB6k110aRblv2M/edit?usp=sharing
TODO:
- [x] remove the change added in this PR, once phases are supported: https://github.com/kubernetes/kubernetes/pull/89593
- [ ] there are a number of different tasks in the control plane phase PerformStaticPodUpgrade function; we should consider exposing subphases for `kubeadm upgrade node/apply phase control-plane`
- [x] add `addon` and `post-upgrade` phases for the `kubeadm upgrade node` command https://github.com/kubernetes/kubernetes/pull/127242
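To make the intent of the summary above concrete, here is a hedged sketch of what the user-facing result could look like once `kubeadm upgrade apply` gains a phase runner; the target version and the `addon/kube-proxy` phase name are assumptions that mirror the `kubeadm init` naming, not confirmed spellings for any release:

```bash
# Hypothetical usage once upgrade phases exist: skip re-deploying the
# kube-proxy addon during the upgrade. The phase name "addon/kube-proxy"
# follows the kubeadm init convention and is an assumption here.
kubeadm upgrade apply v1.19.0 --skip-phases=addon/kube-proxy
```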
kubeadm init recently included a "--skip-phases" option for skipping a particular phase. However, kubeadm upgrade does not appear to have this option. This specifically causes issues when doing a kubeadm upgrade on a cluster that uses kube-router to replace the functionality of kube-proxy. Since upgrade will always redeploy kube-proxy, it will immediately begin altering iptables and/or ipvs underneath kube-router, and this can cause outages if the daemonset isn't deleted quickly enough.
yes, there is currently no way to skip phases during upgrade. kube-proxy is considered an "essential addon" and kubeadm assumes that all clusters use it. also the daemonset is created in memory and passed to the client directly; it will not be written to disk.
this feature needs discussion.
/assign @timothysc @fabriziopandini
Additionally (please ignore my ignorance if this is the case, as this is not the main issue) I should mention the documentation is not clear whether it's even possible to skip a particular addon. It appears --skip-phases addons will skip coredns as well, which would be undesireable.
you can skip both and then apply the dns addon manually.
Thanks for taking a look. Also, I think if I had a choice, phase control during an upgrade would be handled through the config file/kubeadm-config configmap so as to be something inherent to the existing cluster, rather than a flag that has to be remembered at upgrade time.
@MarkDeckert you can skip only the proxy addon by using --skip-phases addons/proxy
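For reference, a minimal init-time sketch of the two options mentioned above; the phase names use the `addon/kube-proxy` and `addon/coredns` spelling of recent `kubeadm init` phases (the comment above writes `addons/proxy`), so verify the exact names with `kubeadm init --help` for your version:

```bash
# Skip only the kube-proxy addon at init time; CoreDNS is still deployed.
kubeadm init --skip-phases=addon/kube-proxy

# Or skip all addons and apply the DNS addon manually afterwards
# (pass --config or other flags as needed for your setup).
kubeadm init --skip-phases=addon
kubeadm init phase addon coredns
```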
First considerations on this topic, that deserve discussion at sig-level and commitment for execution:
- "Officially supported variations" of Kuberenets cluster should be supported via options in the config file, like e.g. to the choice between kube-dns and coredns
- If the user chooses to go with phases and builds a "custom variation", it should get the tools/phases for making also join/upgrade workflows working, but combining/skipping those tools should be under his responsibility, because IMO it is practically impossible to have a reliable CI signal for the almost infinite number of combinations of phases.
Having the ability to skip re-installing kube-proxy during an upgrade is required. I am using kube-router to manage my services and network policies. When running a kubeadm upgrade, once kube-proxy is reinstalled the cluster is unusable until kube-proxy --cleanup is run. This leads to downtime of services running on the cluster.
Let's discuss during the next call.
I do think we should allow folks to skip, but the SLOs from the kubeadm side are gone then b/c of the permutations.
There is a hackish workaround for kube-router users:
Create a kube-proxy daemonset and use nodeAffinity to ensure it cannot be scheduled on any node:
```yaml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    k8s-app: kube-proxy
    purpose: upgrade-placeholder
spec:
  minReadySeconds: 10
  revisionHistoryLimit: 1
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      k8s-app: kube-proxy
      purpose: upgrade-placeholder
  template:
    metadata:
      labels:
        k8s-app: kube-proxy
        purpose: upgrade-placeholder
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kube-proxy
                operator: In
                values:
                - CantSchedule
      containers:
      - name: kube-proxy
        image: busybox:1.30
        command:
        - "/bin/sleep"
        - "300"
        resources:
          requests:
            cpu: 10m
            memory: 10Mi
```
During the upgrade of the first Kubernetes master node, it will fail on the final post-upgrade task in which the kube-proxy daemonset is updated with this error:
[upgrade/postupgrade] FATAL post-upgrade error: unable to update daemonset: DaemonSet.apps "kube-proxy" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"k8s-app":"kube-proxy"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
If you are using automation to perform the upgrade, it will need to cope with this error.
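A minimal automation sketch, assuming bash, an illustrative target version, and a temporary log path; only the known placeholder failure is tolerated and any other error still aborts:

```bash
#!/usr/bin/env bash
# Run the upgrade non-interactively and treat only the expected
# kube-proxy placeholder failure as non-fatal.
set -uo pipefail

if ! kubeadm upgrade apply --yes v1.14.1 2>&1 | tee /tmp/kubeadm-upgrade.log; then
  if grep -q 'DaemonSet.apps "kube-proxy" is invalid' /tmp/kubeadm-upgrade.log; then
    echo "ignoring expected post-upgrade kube-proxy error (placeholder DaemonSet in place)"
  else
    echo "kubeadm upgrade failed for another reason" >&2
    exit 1
  fi
fi
```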
Upgrading additional Kubernetes master nodes with `kubeadm upgrade node experimental-control-plane` requires that the kube-proxy ConfigMap exist to avoid errors:
```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    app: kube-proxy
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.0.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: "https://URL-of-KUBEMASTERS"
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
```
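A short usage sketch, assuming the two manifests above were saved locally under illustrative file names:

```bash
# Create the placeholder DaemonSet and the kube-proxy ConfigMap before
# starting the control-plane upgrades (file names are illustrative).
kubectl apply -f kube-proxy-placeholder-daemonset.yaml
kubectl apply -f kube-proxy-configmap.yaml

# The placeholder should report 0 desired/scheduled pods because of the
# unsatisfiable nodeAffinity.
kubectl -n kube-system get daemonset kube-proxy
```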
We should promote upgrade to phases to allow the --skip-phases flag.
it's a matter of bandwidth for the release cycle and who can work on it. currently we are fixing some aspects of upgrades related to certs renewal, so it might be a better idea to leave it for the next cycle so that we don't have conflicting PRs. cc @fabriziopandini for an opinion.
I'll work on reset phases this week; probably I'll be able to work on upgrade phases next week.
Current plan is to re-evaluate execution of this item in the 1.16 timeframe.
Maybe another use case for upgrade phases with immutable infrastructure to consider: https://github.com/kubernetes/kubeadm/issues/1511#issuecomment-486103732
Ref https://github.com/kubernetes/kubeadm/issues/1658: we should ensure that kubeadm upgrade phases allow an atomic upgrade of etcd.
given the kubeadm upgrade apply phases probably need a KEP and given the deadline for KEPs for 1.16 has passed, punting to 1.17.
We need to reevaluate for 1.17 and get things going here. The proper fix of #1756 depends on this. /assign
/assign @Klaven
@neolit123 @rosti here is the initial document i'm working on. will continue to work on it. I will at some point try to draw a dependency graph out for each phase.
https://docs.google.com/document/d/1Nicy27rt9jm9tOzZ_MEcQsnmyOFl2ZB6k110aRblv2M/edit?usp=sharing
please feel free to update the document as you need/want.
thanks @Klaven i will have a look. we should probably comment on the doc and get agreement / voting on the phase separation during a kubeadm office hours.
shifted the milestone to 1.18.
yes, sorry this did not make 1.17; there was not a lot of time. I have started work on it though and you should see PRs for it soon.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
How come a simple issue like this cannot be solved for more than like 2 years? The proposals in this doc make sense, no?
kube-proxy is an addon. If I don't want it, I should be able to specify that in a config file. Problem solved. Why is it so complicated to get this done?
it was scheduled for 1.18, but i assume @Klaven who signed up to help with this ran out of bandwidth.
the phase implementation itself is fairly straightforward; unfortunately the `upgrade apply` command has a lot of technical debt that had to be resolved first.
kube-proxy is an addon. If I don't want it, I should be able to specify that in a config file. Problem solved. Why is it so complicated to get this done?
the optional kube-proxy installation cannot be part of the kubeadm config, as long-term the kubeadm config should not know about the kube-proxy addon. this forced us to delegate the optional upgrade of kube-proxy to phases.
@asteven If you need a simple fix I put the following into my kubeadm config
```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clientConnection:
  kubeconfig: invalid-kubeconfig.conf
```
This makes kube-proxy crash and not actually mess with anything. So a simple delete of the daemonset is all that is needed after an upgrade.
It's not pretty but it's worked for me.
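For completeness, the post-upgrade cleanup this workaround relies on is a couple of kubectl calls; only do this if nothing else in the cluster consumes the kubeadm-managed kube-proxy objects:

```bash
# Remove the kube-proxy DaemonSet that the upgrade re-created,
# and optionally its ConfigMap if nothing else uses it.
kubectl -n kube-system delete daemonset kube-proxy
kubectl -n kube-system delete configmap kube-proxy
```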
it was scheduled for 1.18, but i assume @Klaven who signed up to help with this ran out of bandwidth.
the phase implementation itself is fairly straightforward; unfortunately the `upgrade apply` command has a lot of technical debt that had to be resolved first.
kube-proxy is an addon. If I don't want it, I should be able to specify that in a config file. Problem solved. Why is it so complicated to get this done?
the optional kube-proxy installation cannot be part of the kubeadm config, as long-term the kubeadm config should not know about the kube-proxy addon. this forced us to delegate the optional upgrade of kube-proxy to phases.
Not sure why kubeadm should not be allowed to know about kube-proxy. If it installed it during kubeadm init, then kubeadm upgrade should be able to take what init did and use that as the base to decide what 'upgrade' means. Not? Either this or kubeadm upgrade has to inspect what's there and only upgrade that. Anything else seems to be fundamentally broken.
Isn't this the same pattern that most things in kubernetes work with? Why not kubeadm?
@asteven If you need a simple fix I put the following into my kubeadm config:
```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clientConnection:
  kubeconfig: invalid-kubeconfig.conf
```
This makes kube-proxy crash and not actually mess with anything. So a simple delete of the daemonset is all that is needed after an upgrade.
It's not pretty but it's worked for me.
Thanks, will try this.
Not sure why kubeadm should not be allowed to know about kube-proxy. If it installed it during kubeadm init, then kubeadm upgrade should be able to take what init did and use that as the base to decide what 'upgrade' means. Not?
with some of the changes around CNIs implementing their own proxy, kube-proxy one day might not be deployed by default by kubeadm. it is not desired to add a related field to the API if the API has to change later.
for phases we have more flexibility.
Either this or kubeadm upgrade has to inspect what's there and only upgrade that. Anything else seems to be fundamentally broken.
this was proposed, but rejected as the upgrade apply phase solution would have explicitly allowed the user to skip the re-deployment of kube-proxy on upgrade.
@neolit123 Thanks for taking the time to answer. I think now I understand why it's taking so long ;-) I have my workaround, moving on with that.
this was proposed, but rejected as the upgrade apply phase solution would have explicitly allowed the user to skip the re-deployment of kube-proxy on upgrade.
PR for that: https://github.com/kubernetes/kubernetes/pull/89593
because we don't know when the phases will come around, this will be added in 1.19.0, but cannot be backported.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale