
cluster installed with Kubespray cannot be restarted

julienlau opened this issue 3 years ago • 3 comments

Environment:

  • Cloud provider or hardware configuration: Ubuntu20.04 VMs KVM hypervisor

Kubespray version (commit) (git rev-parse --short HEAD): commit c24a3a3b152d41f88bd48c9e6f24fd132fd4a78a (HEAD -> master, origin/master, origin/HEAD)

Network plugin used: calico (DNS: coredns)

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

all:
  vars:
    timeout: 20
    become: no
    #become_user: jlu
    #become_method: su
    #become_exe: sudo su -
  hosts:
    k8s-master-1:
      ansible_host: u20-1
      ip: 192.168.122.61
      access_ip: 192.168.122.61
    k8s-node-1:
      ansible_host: u20-2
      ip: 192.168.122.62
      access_ip: 192.168.122.62
    k8s-node-2:
      ansible_host: u20-3
      ip: 192.168.122.63
      access_ip: 192.168.122.63
    k8s-node-3:
      ansible_host: u20-4
      ip: 192.168.122.64
      access_ip: 192.168.122.64
    k8s-node-4:
      ansible_host: u20-5
      ip: 192.168.122.65
      access_ip: 192.168.122.65
  children:
    kube_control_plane:
      hosts:
        k8s-master-1:
    kube_node:
      hosts:
        k8s-node-1:
        k8s-node-2:
        # k8s-node-3:
        # k8s-node-4:
    etcd:
      hosts:
        k8s-master-1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}

k8s-cluster:
  vars:
    dns_min_replicas: 1
    docker_version: latest
    calico_version: "v3.16.4"

Command used to invoke ansible: ansible-playbook -T 20 -i inventory/local/hosts.yml cluster.yml -v -e container_manager=docker

Output of ansible run: installation is OK.

However, I cannot restart the cluster, even by doing a rolling restart:

Please see https://github.com/kubernetes-sigs/kubespray/issues/8850
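For reference, the rolling restart referred to above can be sketched per node as below (a hypothetical helper, not part of Kubespray; node names are taken from the inventory, and kubectl plus SSH access to the nodes is assumed). It is written in dry-run form: it only prints the commands unless `DO_IT=1` is set.

```shell
#!/bin/sh
# Dry-run sketch of a per-node rolling restart (hypothetical helper).
# Prints each command unless DO_IT=1 is set in the environment.
run() {
  if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

# Node names taken from the inventory above.
for node in k8s-node-1 k8s-node-2; do
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  run ssh "$node" sudo reboot
  # A real script would wait here until the node is Ready again.
  run kubectl uncordon "$node"
done
```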

  • the cri-dockerd.service unit is not enabled and should be
  • cgroup v2 is not enabled and should be
  • coredns does not restart due to an incompatibility with systemd-resolved -> systemd-resolved should be disabled
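The three conditions can be checked on a node with something like the sketch below (assumptions: a systemd-based host, and that the cgroup v2 check keys off the filesystem type mounted at /sys/fs/cgroup, which is `cgroup2fs` on a unified-hierarchy host).

```shell
#!/bin/sh
# Sketch: check the three conditions listed above on a node.
# cgroup_mode maps the filesystem type mounted at /sys/fs/cgroup to v1/v2;
# on a unified-hierarchy (cgroup v2) host that type is cgroup2fs.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo v2 ;;
    *)         echo v1 ;;
  esac
}

echo "cgroup mode: $(cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)")"

# The service checks need a running systemd, so guard them.
if command -v systemctl >/dev/null 2>&1; then
  systemctl is-enabled cri-dockerd.service 2>/dev/null || echo "cri-dockerd.service not enabled"
  systemctl is-active systemd-resolved.service 2>/dev/null || echo "systemd-resolved not active"
fi
```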

julienlau avatar Aug 01 '22 17:08 julienlau

Thanks for the reply, but I mentioned 3 elements preventing a successful cluster reboot, and the issue was marked as solved despite only one element being addressed.

What about the 2 others? coredns does not restart!
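On Ubuntu 20.04, the usual reason coredns fails after a reboot is its loop detection: /etc/resolv.conf points at systemd-resolved's stub listener (127.0.0.53), so coredns forwards queries back to itself and exits. One workaround, sketched below in dry-run form (`DO_IT=1` to apply), is to disable systemd-resolved and write a static /etc/resolv.conf; the upstream nameserver shown is an assumption for this KVM setup and must be adjusted. The alternative is to keep systemd-resolved and point kubelet's resolvConf at /run/systemd/resolve/resolv.conf instead.

```shell
#!/bin/sh
# Dry-run sketch of the systemd-resolved workaround for the coredns
# forwarding loop (hypothetical commands; set DO_IT=1 to actually apply).
run() {
  if [ "${DO_IT:-0}" = "1" ]; then "$@"; else echo "would run: $*"; fi
}

run systemctl disable --now systemd-resolved.service
run rm -f /etc/resolv.conf   # drop the symlink to the 127.0.0.53 stub
# Assumed upstream resolver (default libvirt gateway here); adjust for your network.
run sh -c 'echo "nameserver 192.168.122.1" > /etc/resolv.conf'
```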

julienlau avatar Aug 29 '22 12:08 julienlau

/reopen

julienlau avatar Aug 29 '22 12:08 julienlau

@julienlau: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Aug 29 '22 12:08 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 27 '22 12:11 k8s-triage-robot

/remove-lifecycle stale

olivierlemasle avatar Nov 27 '22 17:11 olivierlemasle

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 25 '23 18:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 27 '23 18:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Apr 26 '23 18:04 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 26 '23 18:04 k8s-ci-robot