WIP: Replace packet_debian9-macvlan with debian11

Open oomichi opened this issue 3 years ago • 13 comments

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

packet_debian9-macvlan is an unstable job. In addition, debian9 is going to reach EOL soon [1]. So this replaces packet_debian9-macvlan with debian11 so that we stop wasting time on the job.

Which issue(s) this PR fixes:

Fixes #8916

Does this PR introduce a user-facing change?:

Drop debian9 support from Kubespray because of the EOL

Jun 13 '22 21:06 oomichi

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oomichi

Jun 13 '22 21:06 k8s-ci-robot

https://github.com/kubernetes-sigs/kubespray/blob/889454f2bc35dbfc69da8824d2e8bc658ba08412/README.md?plain=1#L120

We should remove Jessie and Stretch from the README to clearly state that debian 9 (and 8 lol..) are not supported anymore.

Jun 14 '22 07:06 floryut

I think it's still worthwhile testing macvlan if we can get it to work, but on a more recent OS like debian 11. WDYT?

Jun 14 '22 09:06 cristicalin

Indeed, maybe move the job to debian11 to see what's happening (not hopeful but well 😆)

Jun 14 '22 09:06 floryut

From my investigation into the macvlan job, shared with @oomichi on Slack, just restarting containerd at the end of the run results in a working cluster. So the issue is a race condition between when containerd is set up (first) and when the macvlan CNI is configured (second). We would need a generic way of restarting the container manager when the macvlan CNI is configured (I suspect this issue affects docker and cri-o as well, hence "generic").

Jun 14 '22 12:06 cristicalin

# Pull in the existing restart handlers of every supported container manager.
handlers:
  - include: container-engine/cri-o/handlers/main.yml
  - include: container-engine/docker/handlers/main.yml
  - include: container-engine/containerd/handlers/main.yml

# Pass-through tasks: `command: /bin/true` always reports "changed", so each
# task fires the restart handler of the container manager that is in use.
- name: restart cri-o
  command: /bin/true
  notify: restart crio
  when: container_manager == 'crio'

- name: restart containerd
  command: /bin/true
  notify: restart containerd
  when: container_manager == 'containerd'

- name: restart docker
  command: /bin/true
  notify: restart docker
  when: container_manager == 'docker'

😆 not sure it's the most elegant way though

Jun 14 '22 12:06 floryut
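
One possible shape for the "generic" restart discussed above, as a minimal sketch rather than what the PR implemented: Ansible handlers can subscribe to a shared topic with `listen`, so the macvlan task notifies a single topic and a `when` condition on each handler selects the container manager in use. The file paths, the topic name `restart container manager`, the handler names and the template name are assumptions made for illustration; only the `container_manager` variable and its three values come from the thread.

```yaml
# roles/network_plugin/macvlan/handlers/main.yml  (hypothetical path)
# Every handler listens on the same topic; `when` limits execution to the
# container manager that is actually configured.
- name: Macvlan | restart containerd
  ansible.builtin.service:
    name: containerd
    state: restarted
  listen: restart container manager
  when: container_manager == 'containerd'

- name: Macvlan | restart cri-o
  ansible.builtin.service:
    name: crio
    state: restarted
  listen: restart container manager
  when: container_manager == 'crio'

- name: Macvlan | restart docker
  ansible.builtin.service:
    name: docker
    state: restarted
  listen: restart container manager
  when: container_manager == 'docker'

# roles/network_plugin/macvlan/tasks/main.yml  (hypothetical task)
# Writing the CNI config notifies the topic only when the file changes.
- name: Macvlan | install CNI configuration
  ansible.builtin.template:
    src: 10-macvlan.conf.j2              # hypothetical template name
    dest: /etc/cni/net.d/10-macvlan.conf # hypothetical destination
    mode: "0644"
  notify: restart container manager
```

Handlers run at the end of the play unless a `meta: flush_handlers` task forces them earlier, so the restart would happen after the CNI configuration is written, which is the ordering the race-condition analysis asks for.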

Oh, that is a nice point. OK, let me try to convert the job to debian11 with macvlan. Let's see the test result.

Jun 14 '22 20:06 oomichi

Let's wake the bot.

/ok-to-test

Jun 14 '22 22:06 cristicalin

@oomichi you will need to add the handlers suggested by @floryut for the job to work. Even with the rebase to debian 11, the race condition is the same.

Jun 15 '22 05:06 cristicalin

I missed that. OK, let me try it.

/hold

Jun 15 '22 15:06 oomichi

Warning  NetworkNotReady  38s (x151 over 5m38s)   kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

Sep 20 '22 00:09 oomichi

Trying again with a containerd restart added.

Sep 20 '22 00:09 oomichi

Even after restarting containerd, pod sandbox creation still fails:

"  Type     Reason                  Age                   From               Message", 
"  ----     ------                  ----                  ----               -------", 
"  Normal   Scheduled               6m22s                 default-scheduler  Successfully assigned default/netchecker-agent-dhdt7 to instance-2", 
"  Warning  FailedCreatePodSandBox  6m22s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"ba37310f2aa7f4311faf656bcce70f2851605cfd0481c927c98a3cd13478036a\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  6m9s                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"dbd73beb158c2f715508f4f9c3ad2b943ae0e002f984bf17e9717ce2a7979180\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  5m57s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"e1881466321e89e65f426cba67e59b47d8581b773813e57053fb722be3448f93\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  5m42s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"f8ef25003c34a6a90605cdb674be1039e7ffa70a6db9f6a342a4bc40dd3f2fd1\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedMount             5m37s                 kubelet            MountVolume.SetUp failed for volume \"kube-api-access-2fzz4\" : failed to fetch token: Post \"https://localhost:6443/api/v1/namespaces/default/serviceaccounts/netchecker-agent/token\": EOF", 
"  Warning  FailedMount             5m36s                 kubelet            MountVolume.SetUp failed for volume \"kube-api-access-2fzz4\" : failed to fetch token: Post \"https://localhost:6443/api/v1/namespaces/default/serviceaccounts/netchecker-agent/token\": read tcp 127.0.0.1:34834->127.0.0.1:6443: read: connection reset by peer", 
"  Warning  FailedMount             5m31s                 kubelet            MountVolume.SetUp failed for volume \"kube-api-access-2fzz4\" : failed to fetch token: serviceaccounts \"netchecker-agent\" is forbidden: User \"system:node:instance-2\" cannot create resource \"serviceaccounts/token\" in API group \"\" in the namespace \"default\": no relationship found between node 'instance-2' and this object", 
"  Warning  FailedCreatePodSandBox  5m29s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"8b551a4fc07662e607fa85e9c2be9fc20ae12b9ad11b9aeb3071847ffc70c5f8\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  5m17s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"5d958c5ba58f0ee5cf06d35de8496fb82897c9f2fa2882f01ab8488a8d052f2f\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  5m5s                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"0c54bcc5e1e7f2027bae719b6e8b246d7263ccf8cbfd6a14592205785309fb14\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  4m54s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"27fc30256a1fc42f196d7be289e13e8a4437ee00f053782f6a443175004f5f1f\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  4m41s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"085cf5b33c54e815db2a2b4bbe523ddf0d46155af3660e55a435d90a2fc13609\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  4m27s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"1744eb0b153322c836090fc6e88639507310da7e1a3b303852cd84e458e0c5d8\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  4m15s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"111ca9acfe017527ba7939ac8f9b11e521150986e384bc8d3207041d3cab9e3b\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  4m2s                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"33c8cae75ae5fbcf80469f5747e1dba3160d1433a0b94ec0119d797621b2c1b6\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  3m48s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"59a51a447c178183abdab96493dd56d30604ae8f567d904aab3560eec2ceb573\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found", 
"  Warning  FailedCreatePodSandBox  18s (x16 over 3m34s)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox \"f8e661abe843b6042285656bee66221da80841c603c6b6bbcd49a83943424bda\": plugin type=\"macvlan\" name=\"mynet\" failed (add): Link not found"]}

Sep 20 '22 01:09 oomichi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Dec 19 '22 01:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Jan 18 '23 02:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Feb 17 '23 02:02 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

Feb 17 '23 02:02 k8s-ci-robot