kubespray
Release Proposal v2.20
Deprecation / Removal
- Drop Ansible support for v2.9 and v2.10 (#8925, @oomichi)
- Drop support for Fedora 34 (#8967, @floryut)
Feature / Major changes
- Add Rocky Linux 8 support (#8905, @oomichi)
- Add Kylin Linux support. (#9078, @ErikJiang)
- Add Fedora 36 support (#8967, @floryut)
- Add 'flush ip6tables' task in reset role (#9168, @GreatLazyMan)
- Add `tar` in common required package (#9184, @yankay)
- Add support for NTP configuration. (#9027, @yankay)
- Increase ansible fact_caching_timeout (from 2 to 24 hours) (#9059, @rptaylor)
- Add kubelet systemd service hardening option `kubelet_systemd_hardening: [true|false]` (#9194, @alegrey91)
- Support timezone setting (#9263, @yankay)
- Update deprecated ansible include syntax (#9040, @boeto)
- Update etcd download url in offline.yml to use arch (#8943, @ErikJiang)
- Add Support for Rewrite Plugin to CoreDNS/NodelocalDNS (#9245, @eifelmicha)
- Add `SeccompDefault` admission plugin for kubelet (using new variable `kubelet_seccomp_default`) (#9074, @alegrey91)
- Add an optional extra_groups parameter for k8s_nodes (e.g. to configure calico route reflector nodes on Openstack using the calico_rr group) (#9211, @rptaylor)
- Add arm64 Flatcar OS's pypy bootstrapping support (#8959, @kerryeon) (see Notes 1)
- Add docker support for Kylin distributions (#9144, @ErikJiang)
- Add hashes for Kubernetes v1.24.3, v1.22.12, v1.23.9 (#9092, @marcofortina)
- Add ingress nginx webhook (#9033, @liupeng0518)
- Add manage-offline-files.sh to collect necessary files and provide an HTTP file download service for offline deployment. (#8956, @ErikJiang)
- Add missing configuration for extra tolerations (#8908, @smasset)
- Add support for node & pod pid limits (in kubelet-config file) (#9038, @h9-HSFRQDH)
- Add the option to enable default Pod Security Configuration (#9017, @Foxlik)
- Add unsafe_show_logs switch to show more log details (default to false, same as previous behavior) (#9164, @ErikJiang)
- Add variables (`delete_node_retries`, `delete_node_delay_seconds`) to tweak remove node process (#9096, @ydFu)
- Added 'avoid-buggy-ips' support of MetalLB (`metallb_avoid_buggy_ips` for default IP address pool and `avoid_buggy_ips` for additional IP address pools defined in `metallb_additional_address_pools`) (#9166, @kerryeon) (see Notes 2)
- Adjust the default value of calico blockSize ipv4 to 26, and ipv6 to 122. (#9055, @cyclinder)
- Make kubernetes owner parametrized (using `kube_owner`/`kube_cert_group`/`etcd_owner` variables) (#8952, @alegrey91)
- Move old etcd backup removal after etcd restart, to prevent removing backup if etcd fails (#9147, @emiran-orange)
- Support reserving ephemeral-storage (#8895, @Thearas)
- [dev/docs] add support for pre-commit hook (#9158, @cristicalin)
- [etcd] Etcd role won't run on all nodes every time. (#9173, @liupeng0518)
- [etcd] add 3.5.4 and drop 3.5.1 and 3.5.2 (#9021, @cristicalin)
- [infra] bump pause container to 3.6 (#9024, @cristicalin)
- Update Kubernetes dashboard to 2.6.0 (k8s 1.24 support) (#8906, @floryut)
- [kubernetes] make 1.24.x the new default (#8935, @cristicalin)
- [kubernetes] drop support for 1.21.x (#8935, @cristicalin)
- [kubernetes] drop support for deprecated dynamic_kubelet_configuration (#8935, @cristicalin)
- [offline] Archive offline-files and add env NO_HTTP_SERVER to skip running the Nginx container. (#9068, @yjqg6666)
- Adds support for multiple architectures to yq (#9288, @ErmalKristo)
- Add variable to tweak the vsphere-csi namespace (`vsphere_csi_namespace`) (#9278, @MahdiAbbasi95)
- Ensure ping package is installed on the system (#9284, @yankay)
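As a rough illustration, a few of the options listed above might be set in an inventory's group variables as follows. The file location is an assumption, and the values are illustrative rather than recommended; only the variable names come from the entries above.

```yaml
# Hypothetical location: inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Variable names are taken from the release notes; values are illustrative only.

# Enable the kubelet systemd service hardening option (#9194)
kubelet_systemd_hardening: true

# Enable the SeccompDefault admission plugin for kubelet (#9074)
kubelet_seccomp_default: true

# Show more log details; false matches the previous behavior (#9164)
unsafe_show_logs: false

# Tweak the remove-node process (#9096); the numbers are placeholders
delete_node_retries: 10
delete_node_delay_seconds: 3
```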
Network
- [Calico] calico rr now supports multiple groups (#9134, @liupeng0518)
- [Calico] drop support for 3.19.x and 3.20.x
- [Calico] Make Calico CNI log path configurable and allow disabling this log (#8921, @fungusakafungus)
- [Calico] The NAT (`nat_outgoing`) would not be disabled automatically when enabling `peer_with_router`. (#9255, @kerryeon)
- [Calico] The variable calcio_ipam_autoallocateblocks has been renamed to calico_ipam_autoallocateblocks (#9056, @liupeng0518)
- [Calico] calico-typha metrics port is now exposed when metrics are enabled (#8855, @vjacynycz)
- [Calico] Add Wireguard support for Rocky Linux 9 (#9287, @krystianmlynek)
- [Canal] update templates to work again with both etcd and k8s datastore (#9113, @floryut)
- [Cilium] Add list/watch nodes rules to cilium-operator clusterrole. (#9178, @Thearas)
- [Cilium] Add support for the updated (startup|liveness|readiness)Probe.Port numbers (#9031, @tomberget)
- [Cilium] Update cilium to v1.11.7 (#9119, @dkhachyan)
- [Cilium] Make rolling-restart readiness wait delay and count configurable via `cilium_rolling_restart_wait_retries_{count, delay_seconds}` (#9176, @Tristan971)
- [Cilium] Upgrades cilium to 1.11.6 and add some default variables. (#9065, @eminaktas) (See Notes 3)
- [Cilium] Update Cilium default to 1.12.x (#9225, @necatican) (See Notes 5)
- [Cilium] Dropped support for < v1.10.0 (#9225, @necatican)
- [Cilium] `cilium_ip_masq_agent_enable` variable no longer exists. Use `enable-ipv4-masquerade` and `enable-ipv6-masquerade` to enable masquerade. (#9225, @necatican)
- [flannel] update to v0.18.1 & make it default (#9104, @mzaian)
- [flannel] update to v0.19.2 & make it default (#9296, @mzaian)
- [Kube-vip] Fail if `kube_proxy_strict_arp` is set to `false` in arp mode (#9223, @yankay)
- [Multus] Support multi-architecture installation (#9012, @cyclinder)
Applications
- [Openstack] Add option to use default deny firewall policy and port allowlisting on UpCloud (#9058, @Ajarmar)
- [Openstack] Fix subnet order and number of master nodes (#9159, @robinelastisys)
- [Metallb] Renamed `matallb_auto_assign` variable to `metallb_auto_assign` (users disabling 'auto-assign' in metallb must update the variable name) (#8949, @orange-llajeanne)
- [vSphere-csi] Add nodeAffinity to daemonset using `vsphere_csi_node_affinity` variable (#9293, @dmitrytretyakov)
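For inventories carried over from an earlier release, the MetalLB rename above (and the Calico rename in the Network section) amounts to a key change in the group variables. A minimal sketch, assuming both variables had been set explicitly; the values shown are placeholders:

```yaml
# Before (old, misspelled variable names):
# matallb_auto_assign: false
# calcio_ipam_autoallocateblocks: true

# After (corrected names from #8949 and #9056; values are placeholders):
metallb_auto_assign: false
calico_ipam_autoallocateblocks: true
```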
Container-Managers
- [containerd] add hashes for 1.5.12, 1.5.13, 1.6.5 and 1.6.6, make 1.6.6 the new default (#8980, @cristicalin)
- [containerd] Add LimitMEMLOCK parameter configuration in containerd.service (using `containerd_limit_[proc_num/core/open_file_num/mem_lock]`) (#9269, @ErikJiang)
- [containerd] Remove duplication in containerd template (#9301, @fungusakafungus)
- [Docker] use cri-dockerd instead of dockershim by default
- [Docker] Enable cri-dockerd service to prevent issue with reboot (#9201, @mostafaghadimi)
- [cri-o] Add dpkg hold for apt installs (#9075, @SamuelBECK1)
- [cri-o] add support for 1.24.x required by kubernetes 1.24.x (#8935, @cristicalin)
- [runc] update versions for 1.1.x and drop 1.0.x (#9022, @cristicalin)
- [crun] add 1.4.5 and drop 1.2 and 1.3 (#9023, @cristicalin)
- [nerdctl] upgrade to 0.20.0 (#8980, @cristicalin) then 0.22.2 (#9180, @panpan0000)
Bug or Regression
- Fix failure to look up user etcd when adding a user (#9016, @yankay)
- Fix setting up kubespray on Azure with CSI drivers. (#9153, @wayfrro)
- Add `--supervisor-fss-namespace=kube-system` flag to vcloud-csi installation (#9066, @yasintahaerol)
- Add assertion for IPv4 check in verify settings (to allow IPv6 deployments) (#8946, @Citrullin)
- Add calico-kube-controllers missing verbs (#9032, @ghostloda)
- Allow "openSUSE Tumbleweed" to be run (again) (#9072, @oomichi)
- Apply calico bgp peer definition task to all nodes (#8974, @orange-llajeanne)
- Create snapshot namespace only when needed (#9014, @robinAwallace)
- Disable kubelet_authorization_mode_webhook by default (#9238, @cristicalin)
- Disabled DNSStubListener for Flatcar Linux (#9160, @kerryeon)
- Do not run etcd role in `scale.yml` playbook when etcd installed by kubeadm (#9210, @LuckySB)
- Fix Hetzner CCM cluster-cidr (wrongly set to a static value) (#9127, @ym)
- Fix calicoctl.sh path error when getting calico configuration (#9217, @tasekida)
- Fix failing tasks when calico_datastore is set to etcd (#9228, @chadswen)
- Fix missing quote in task "See if node is schedulable" (#9146, @emiran-orange)
- Fix nodes with numeric names failing to be added. (#9266, @cleverhu)
- Fix regex for replacing http_proxy host in RedHat Subscription Manager (#8957, @dicksontung)
- Fix some docker reset task (don't remove already uninstalled packages, ignore error on remove docker config files if already removed) (#8966, @orange-llajeanne)
- Fix the Centos/RHEL docker installation issue in ARM64 (#9047, @yankay)
- Fix the kube-vip missed SAN issue (#9099, @yankay)
- Fixed concatenate str & int in `auto_renew_certificates_systemd_calendar` (#8979, @floryut)
- Fixes the issue when it cannot correctly set the namespace for vsphere-csi-driver (#9046, @eminaktas)
- Fixes vSphere CSI for vSphere CSI >= 2.4.0 on vSphere 6.7U3 (#8944, @snowball77)
- No more errors are emitted when attempting to delete worker nodes that do not exist. (#9244, @kerryeon)
- Optimize the format of evictionHard in kubelet-config.yaml template (#9204, @shelmingsong)
- Remove kubeowner different than root condition for user creation (#9125, @alegrey91)
- Remove unneeded socat wrapper installation for Flatcar (#8970, @kerryeon) (See Notes 4)
- Set fallback value of kubelet ip6 (#8926, @kerryeon)
- Swap calico download url, as the old primary url was deprecated and artefact no longer published (#8920, @sathieu)
- Upgrade the nginx-proxy and haproxy image version, and use the alpine base image (#9100, @yankay)
- Variable `kube_pid_reserved` must be a string (#9124, @liupeng0518)
- [Docker] Add restart of docker.service during install (#9205, @krystianmlynek)
- [Kube-ovn] Value check for `HW_OFFLOAD` is now correctly handled (and will no longer always be false) (#9218, @floryut)
- [ingress-nginx] Fix ingress-nginx RBAC rules when deployed classless (#9156, @cristicalin)
- Remove the 'etcd-unsupported-arch' args to fix the etcd issue in arm64 (#9049, @yankay)
- Fix duplicate field in ingress-nginx template (#9285, @cloud-66)
- Fix CoreDNS memory leak issue by adding `max_concurrent=1000` in the CoreDNS config. (#9307, @yankay)
Other (Cleanup or Flake)
- [CI] upgrade vagrant image for opensuse leap to 15.4 (#9175, @cristicalin)
- [CI] test upgrade with defaults (containerd) instead of docker (#8980, @cristicalin)
- [CI] Fix cloud_init files for different distros (#9232, @floryut)
Component versions:
- Core
  - Kubernetes v1.24.6
  - Etcd v3.5.4
  - Docker v20.10
  - Containerd v1.6.8
  - CRI-O v1.24
- Network
  - CNI-plugins v1.1.1
  - Calico v3.23.3
  - Cilium v1.12.1
  - Flannel v0.19.2
  - Kube-ovn v1.9.7
  - Kube-Router v1.5.1
  - Multus v3.8
  - Weave v2.8.1
  - kube-vip v0.4.2
- App
  - Cert-manager v1.9.1
  - CoreDNS v1.8.6
  - Nginx-ingress v1.3.1
  - krew v0.4.3
  - argocd v2.4.12
  - helm v3.9.4
  - metallb v0.12.1
  - registry v2.8.1
Known issues
- Host network might break when an interface goes down (Cilium 1.12/Ubuntu 22.04); please read Note 5.
- If `bin_dir` value is changed to something other than `/usr/local/bin`, containerd configuration might need to be tweaked; please check #9243
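Based on the discussion in #9243, the containerd tweak for a non-default bin_dir might look like the following in group variables. This is only a sketch: `/opt/mypath` is a placeholder, and it assumes that `containerd_extra_args` takes a block of TOML that ends up in containerd's config.toml.

```yaml
# Sketch only: "/opt/mypath" is a placeholder path, not a recommended value.
# Assumes containerd_extra_args is passed through into config.toml as-is.
containerd_extra_args: |
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/mypath"
```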
Notes
1. Upgrading the bootstrap pypy may cause some unexpected behaviors for `Flatcar` use-cases
2. As the newly added feature uses the same default value as MetalLB, there is no side effect for users who do not change its value
3. This PR also implements cgroup auto-mount. By default, it is enabled. You can disable it by adding `cgroup_auto_mount: false`. Moreover, you can enable or disable BPF with these variables cilium_enable_bpf_masquerade and cilium_enable_host_legacy_routing
4. Some old (pre-2020) 'Flatcar Container Linux by Kinvolk' releases may not be supported.
5. With Cilium 1.12/Ubuntu 22.04, you might run into this issue; workarounds are available while the issue is resolved on the Cilium end.
Should systemd-networkd issue that was mentioned in #9225 be listed in known issues? What do you think?
Thank you, you're right, I'll draft something up
Should #9243 be listed in known issues as well? Users need to be warned that if bin_dir is different from /usr/local/bin, they need to add the snippet below to config.toml through containerd_extra_args:
[plugins."io.containerd.internal.v1.opt"]
path = "/opt/mypath"
@sohnaeo done, thank you 👍
Add variable to tweak the vsphere-csi namespace (vsphere_csi_namespace) (https://github.com/kubernetes-sigs/kubespray/pull/9278, @MahdiAbbasi95)
is more dangerous than it sounds, new var defaults to new value, vmware-system-csi, and there's no code to remove vsphere csi driver from kube-system namespace, so if someone used it before, they'll either need to remove csi driver from kube-system or set vsphere_csi_namespace to kube-system
Thank you so much for preparing a new release @floryut ! The above release-note seems good for me.
+1
Hi @floryut Would be nice If we can have this in the release https://github.com/kubernetes-sigs/kubespray/pull/9302
> Add variable to tweak the vsphere-csi namespace (vsphere_csi_namespace) (#9278, @MahdiAbbasi95)
> is more dangerous than it sounds, new var defaults to new value, `vmware-system-csi`, and there's no code to remove vsphere csi driver from kube-system namespace, so if someone used it before, they'll either need to remove csi driver from `kube-system` or set vsphere_csi_namespace to `kube-system`
Created https://github.com/kubernetes-sigs/kubespray/pull/9312 to address that
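For clusters that already run the vSphere CSI driver in kube-system, the second option described above could be sketched in group variables as follows. The variable name and both namespace values come from the comment; where the line goes in the inventory is left to the reader.

```yaml
# Keep the vSphere CSI driver in its previous namespace
# instead of the new default (vmware-system-csi).
vsphere_csi_namespace: kube-system
```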
> Hi @floryut Would be nice If we can have this in the release #9302
Approved, needs a reviewer to check this PR 👍
Hi @floryut, sorry, there is a typo:
Fix ETCD memory leak issue by adding max_concurrent=1000 in the CoreDNS config. (https://github.com/kubernetes-sigs/kubespray/pull/9307, @yankay) =>
Fix CoreDNS memory leak issue by adding max_concurrent=1000 in the CoreDNS config. (https://github.com/kubernetes-sigs/kubespray/pull/9307, @yankay)
Thanks.
🤦 thank you
I could reproduce https://github.com/kubernetes-sigs/kubespray/issues/9019 and am working on a fix. It's mind-boggling.
That looks serious, would be good to have a fix in 2.20 otherwise clusters could still have an old containerd version from 2.18.
@floryut what about including https://github.com/kubernetes-sigs/kubespray/pull/9026 , https://github.com/kubernetes-sigs/kubespray/pull/9109?
Don't forget to fill in the "Does this PR introduce a user-facing change?" section; this is what generates the release notes (I try to edit them, but sometimes I miss a few PRs).
Anyway, I've added the PR in the release note, thank you.
I hope this calico typo correction commit will also be included in the next release!
- #9327
That changes the parameter name and might be a user-facing change.
I wrote a release-note on the pull request.
And done 🥳
Looks like upgrading a cluster to 2.20 fails due to the calico block size change: https://github.com/kubernetes-sigs/kubespray/pull/9055#issuecomment-1876155710