
containerd is not upgraded because it is not restarted

Open snowmansora opened this issue 2 years ago

Environment:

  • Version of Ansible (ansible --version): ansible==4.10.0 ansible-core==2.11.12
  • Version of Python (python --version): 2.7.5

Kubespray version (commit) (git rev-parse --short HEAD): v2.19.0

Command used to invoke ansible: ansible-playbook upgrade-cluster.yml

Anything else do we need to know: On a cluster deployed using kubespray-v2.18.0, after running ansible-playbook upgrade-cluster.yml with kubespray-v2.19.0, containerd is still on the old version, as shown by kubectl get nodes -o wide. After restarting the containerd service, the new version is used.

The problem is that the handler notify: restart containerd is never run, despite being notified multiple times in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml.

After some troubleshooting, the problem appears to be caused by the import_role in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90: after I changed it to include_role, the problem was gone.

My GUESS is that the when conditions are added to the containerd handlers by the import_role in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90, and since my upgrade doesn't match those conditions (e.g. container_manager != "containerd"), the handler is not run... What a confusing ansible behaviour...

Also, if you just run ansible-playbook cluster.yml on a fresh node, you will see the handler is never run either, but containerd will still be running because of https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml#L100. I tried this with ansible==5.7.1, ansible-core==2.12.6, and Python 3.10, and the problem happens there as well, so it doesn't look like an old-ansible-only behaviour.
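To illustrate that guess with a minimal, self-contained sketch (hypothetical role, service, and file names, not the actual kubespray files): a when placed on an import_role is copied at parse time onto every task of the imported role, and onto its handlers too, so the handler itself becomes conditional.

# playbook.yml (hypothetical)
- hosts: all
  tasks:
    - name: Validate container engine
      ansible.builtin.import_role:
        name: myrole
      # Static import: this condition is attached to every task in myrole
      # AND to the handlers that myrole defines.
      when: container_manager != "containerd"

# roles/myrole/handlers/main.yml (hypothetical)
# This handler inherits the condition above, so a later
# notify: restart myservice from an unconditional task can end up skipped,
# which is the behaviour reported in this issue.
- name: restart myservice
  ansible.builtin.systemd:
    name: myservice
    state: restarted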

snowmansora avatar Jun 21 '22 18:06 snowmansora

I also had this problem today. Even when I change the containerd preferences and run cluster.yml --tags=containerd, the config changes on the nodes but containerd isn't restarted. I have to restart it manually for the new options in the config to take effect.

cloud-66 avatar Aug 01 '22 17:08 cloud-66

Can confirm by putting debugger: always on the handler: the handler is skipped because it was loaded earlier with a false condition:

RUNNING HANDLER [container-engine/containerd : restart containerd] *************************************************************************************************************************************************************************************************************
Friday 23 September 2022  16:42:37 +0200 (0:00:00.181)       0:01:18.666 ******
skipping: [node-2.company.com]
[node-2.company.com] HANDLER: container-engine/containerd : restart containerd (debug)> p task.when
['not (is_ostree or (ansible_distribution == "Flatcar Container Linux by '
 'Kinvolk") or (ansible_distribution == "Flatcar"))',
 'container_manager != "containerd"',
 'docker_installed.matched == 0',
 'containerd_installed.matched > 0',
 "ansible_facts.services[service_name]['state'] == 'running'"]
[node-2.company.com] HANDLER: container-engine/containerd : restart containerd (debug)> User interrupted execution

fungusakafungus avatar Sep 23 '22 15:09 fungusakafungus

My GUESS is that the when conditions are added to the containerd handlers when import_role

That is exactly how import_* works. (See also https://github.com/kubernetes-sigs/kubespray/issues/9279 , https://serverfault.com/questions/875247/whats-the-difference-between-include-tasks-and-import-tasks )
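A small side-by-side of that difference, using a generic task file name (other.yml) purely for illustration:

# Static import: parsed up front; the when is pushed down onto every
# imported task (and, for import_role, onto the role's handlers too).
- name: Statically import tasks
  ansible.builtin.import_tasks: other.yml
  when: container_manager == "containerd"

# Dynamic include: the when only decides whether other.yml gets included
# at run time; the tasks inside keep just their own conditions.
- name: Dynamically include tasks
  ansible.builtin.include_tasks: other.yml
  when: container_manager == "containerd"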

since my upgrade doesn't match those conditions (e.g. container_manager != "containerd"

That part seems less clear: why is container_manager != "containerd"? Since https://github.com/kubernetes-sigs/kubespray/pull/8439/ was added, it looks like that would cause containerd to be uninstalled if container_manager != "containerd".

In any case, the last handler of a given name that is invoked should override earlier ones, so the handlers notified normally from https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/containerd/tasks/main.yml should override the one imported in https://github.com/kubernetes-sigs/kubespray/blob/v2.19.0/roles/container-engine/validate-container-engine/tasks/main.yml#L90, unless there is an ansible bug...

rptaylor avatar Sep 23 '22 17:09 rptaylor

It's also slightly odd that the restart containerd handler is a no-op that just calls another handler: https://github.com/kubernetes-sigs/kubespray/blob/master/roles/container-engine/containerd/handlers/main.yml
Probably the other handlers should listen instead.
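A sketch of that listen-based alternative, with handler names and commands loosely modelled on the linked file (not a tested patch):

# roles/container-engine/containerd/handlers/main.yml (sketch only)
- name: Containerd | restart containerd
  ansible.builtin.systemd:
    name: containerd
    state: restarted
    daemon_reload: true
  listen: restart containerd

- name: Containerd | wait for containerd
  # Assumes ctr is on the PATH of the managed node; retried until
  # containerd answers again after the restart.
  ansible.builtin.command: ctr version
  register: containerd_ready
  retries: 8
  delay: 4
  until: containerd_ready.rc == 0
  changed_when: false
  listen: restart containerd

Tasks would keep using notify: restart containerd, and both handlers would fire on that notification without the intermediate no-op handler.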

It should still work, but it might be part of the corner case that explains why this is hitting some obscure bug (I looked through the ansible issues but didn't find one that seemed pertinent to this).

rptaylor avatar Sep 23 '22 18:09 rptaylor

"the location where it is inserted doesn't affect when the handlers are added" https://github.com/ansible/ansible/issues/78871#issuecomment-1258170633

rptaylor avatar Sep 26 '22 17:09 rptaylor