kubespray
containerd is not upgraded because it is not restarted
Environment:

- Version of Ansible (ansible --version): ansible==4.10.0 ansible-core==2.11.12
- Version of Python (python --version): 2.7.5
- Kubespray version (commit) (git rev-parse --short HEAD): v2.19.0

Command used to invoke ansible: ansible-playbook upgrade-cluster.yml

Anything else do we need to know:
On a cluster deployed with kubespray-v2.18.0, after running ansible-playbook upgrade-cluster.yml from kubespray-v2.19.0, containerd is still running the old version, as shown by kubectl get nodes -o wide. After restarting the containerd service, the new version is used.
The problem is that the handler notify: restart containerd never runs, despite being notified multiple times in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml.
After some troubleshooting, the problem appears to be caused by the import_role in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90; after I changed it to include_role, the problem was gone.
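For illustration, a sketch of the kind of change described; the task name, tasks_from, and conditions below are placeholders, not the exact kubespray content:

```yaml
# Hypothetical sketch of roles/container-engine/validate-container-engine/tasks/main.yml#L90.

# Before: a static import; its `when` is inherited by the containerd role's handlers.
- name: Uninstall containerd
  import_role:
    name: container-engine/containerd
    tasks_from: reset
  when:
    - container_manager != "containerd"
    - containerd_installed.matched > 0

# After: a dynamic include; the `when` applies to the include itself only,
# so the role's handlers keep their own conditions.
- name: Uninstall containerd
  include_role:
    name: container-engine/containerd
    tasks_from: reset
  when:
    - container_manager != "containerd"
    - containerd_installed.matched > 0
```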
My GUESS is that the when conditions are added to the containerd handlers by the import_role in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90, and since my upgrade doesn't match those conditions (e.g. container_manager != "containerd"), the handler doesn't run... What confusing ansible behaviour...
Also, if you just run ansible-playbook cluster.yml on a fresh node, you will see the handler never runs either, but containerd will still be running because of https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml#L100. I tried this with ansible==5.7.1 ansible-core==2.12.6 and python 3.10, and the problem happens as well, so it doesn't look like an old-ansible behaviour.
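For context, the task referenced at that line is an ordinary task (not a handler), so a fresh install ends up with containerd running even though the handler is skipped. Roughly (module and option names approximate):

```yaml
# Approximate sketch of the kind of task at containerd/tasks/main.yml#L100:
# a regular task, not a handler, so it runs regardless of notifications.
- name: containerd | Ensure containerd is started and enabled
  ansible.builtin.systemd:
    name: containerd
    state: started
    enabled: true
```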
I also had this problem today. Even when I change containerd preferences and run cluster.yml --tags=containerd, the config changes on the nodes, but containerd isn't restarted. I have to restart it manually for the new options in the config to take effect.
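Until the handler issue is fixed, the manual restart can be rolled out with a tiny playbook like this (a sketch; the k8s_cluster group name and serial value are assumptions, adjust to your inventory):

```yaml
# restart-containerd.yml -- hypothetical workaround playbook, not part of kubespray.
- hosts: k8s_cluster      # assumption: use whatever inventory group holds your nodes
  become: true
  serial: 1               # one node at a time to limit workload disruption
  tasks:
    - name: Restart containerd so the new config/version takes effect
      ansible.builtin.systemd:
        name: containerd
        state: restarted
```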
Can confirm: with debugger: always on the handler, the handler is skipped because it was loaded earlier with a false condition:
RUNNING HANDLER [container-engine/containerd : restart containerd] *************************************************************************************************************************************************************************************************************
Friday 23 September 2022 16:42:37 +0200 (0:00:00.181) 0:01:18.666 ******
skipping: [node-2.company.com]
[node-2.company.com] HANDLER: container-engine/containerd : restart containerd (debug)> p task.when
['not (is_ostree or (ansible_distribution == "Flatcar Container Linux by '
'Kinvolk") or (ansible_distribution == "Flatcar"))',
'container_manager != "containerd"',
'docker_installed.matched == 0',
'containerd_installed.matched > 0',
"ansible_facts.services[service_name]['state'] == 'running'"]
[node-2.company.com] HANDLER: container-engine/containerd : restart containerd (debug)> User interrupted execution
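For anyone wanting to reproduce the inspection above, the debugger can be attached to the handler itself. A simplified sketch of the handler in roles/container-engine/containerd/handlers/main.yml with debugger: always added temporarily:

```yaml
# Simplified sketch: temporarily add `debugger: always` to the handler so the
# task debugger opens whenever it is reached, allowing `p task.when` as above.
- name: restart containerd
  ansible.builtin.command: /bin/true   # no-op; the real work is done by the notified handler
  debugger: always                     # temporary, for troubleshooting only
  notify:
    - Containerd | restart containerd
```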
My GUESS is that the when conditions are added to the containerd handlers when import_role
That is exactly how import_* works. (See also https://github.com/kubernetes-sigs/kubespray/issues/9279 , https://serverfault.com/questions/875247/whats-the-difference-between-include-tasks-and-import-tasks )
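A minimal, self-contained illustration of that behaviour (role and handler names are made up, not from kubespray):

```yaml
# playbook.yml -- hypothetical repro.
#
# roles/demo/handlers/main.yml:
#   - name: restart demo
#     ansible.builtin.debug:
#       msg: restarting
#
# roles/demo/tasks/main.yml:
#   - name: do something in the role
#     ansible.builtin.debug:
#       msg: hello from the role
#
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Static import; the `when` below is inherited by the role's handler too
      import_role:
        name: demo
      when: false                      # stand-in for container_manager != "containerd" etc.

    - name: Notify the handler from an unrelated task
      ansible.builtin.command: /bin/true
      changed_when: true
      notify: restart demo

# Result: "restart demo" is reported as skipped, because it silently inherited
# `when: false` from the import. With include_role the condition stays on the
# include itself (but the handler is then only registered if the include runs).
```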
since my upgrade doesn't match those conditions (e.g. container_manager != "containerd"
That part seems less clear: why is container_manager != "containerd"? Since https://github.com/kubernetes-sigs/kubespray/pull/8439/ was added, it looks like that would cause containerd to be uninstalled when container_manager != "containerd".
In any case, the last handler of a given name that is loaded should override earlier ones, so the handlers brought in normally via https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml should override the one from https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90,
unless there is an ansible bug...
It's also slightly odd that the restart containerd handler is a no-op that just notifies another handler:
https://github.com/kubernetes-sigs/kubespray/blob/master/roles/container-engine/containerd/handlers/main.yml
Probably the other handlers should use listen instead.
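A rough sketch of what that listen-based layout could look like (handler names and arguments approximate, not a tested patch):

```yaml
# roles/container-engine/containerd/handlers/main.yml -- rough listen-based sketch.
# Tasks keep notifying "restart containerd"; both handlers react to that topic
# directly, so the no-op trampoline handler is no longer needed.
- name: Containerd | restart containerd
  ansible.builtin.systemd:
    name: containerd
    state: restarted
    enabled: true
    daemon_reload: true
  listen: restart containerd

- name: Containerd | wait for containerd
  ansible.builtin.command: "{{ containerd_bin_dir | default('/usr/bin') }}/ctr images ls -q"
  register: containerd_ready
  until: containerd_ready.rc == 0
  retries: 8
  delay: 4
  changed_when: false
  listen: restart containerd
```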
It should still work, though, but it might be part of the corner case that explains why this is hitting some obscure bug (I looked through ansible issues but didn't find one that seemed pertinent to this).
"the location where it is inserted doesn't affect when the handlers are added" https://github.com/ansible/ansible/issues/78871#issuecomment-1258170633