Containerd E2E test fails with failed to load cni during init

Open ipochi opened this issue 2 years ago • 13 comments

What happened:

Two unrelated Docker-based e2e tests were moved to containerd in separate PRs (https://github.com/kubernetes/test-infra/pull/25254, https://github.com/kubernetes/test-infra/pull/23243).

Job links:

https://k8s-testgrid.appspot.com/sig-node-kubelet#kubelet-gce-e2e-lock-contention
https://k8s-testgrid.appspot.com/sig-node-containerd#node-kubelet-containerd-performance-test

Both tests are continuously failing with the same error message:

log snippet from containerd

Feb 22 14:31:42 tmp-node-e2e-1026ef4c-cos-89-16108-604-11 containerd[339]: time="2022-02-22T14:31:42.088924479Z"
level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" 
error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
..
..
..
Feb 22 14:38:41 tmp-node-e2e-1026ef4c-cos-89-16108-604-11 containerd[339]: time="2022-02-22T14:38:41.086102911Z" 
level=info msg="No cni config template is specified, wait for other system components to drop the config."
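
The error above reports that no network config was found in /etc/cni/net.d. A minimal Python sketch for checking that directory on a node could look like the following. The helper name `find_cni_configs` is hypothetical, and the extension list (.conf, .conflist, .json) is an assumption about what CNI config loaders commonly accept, not something stated in this issue.

```python
import os

# Assumption: file extensions commonly accepted by CNI config loaders.
CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def find_cni_configs(conf_dir="/etc/cni/net.d"):
    """Return the sorted CNI config file names in conf_dir (empty if none)."""
    try:
        entries = os.listdir(conf_dir)
    except FileNotFoundError:
        # A missing directory is treated the same as an empty one.
        return []
    return sorted(
        name
        for name in entries
        if name.endswith(CNI_EXTENSIONS)
        and os.path.isfile(os.path.join(conf_dir, name))
    )


if __name__ == "__main__":
    configs = find_cni_configs()
    if configs:
        print("CNI configs found:", ", ".join(configs))
    else:
        print("no network config found in /etc/cni/net.d")
```

An empty result here would be consistent with the "no network config found" message, but it does not by itself explain why nothing dropped a config into the directory.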

kubelet.log shows the same condition, reporting that the CNI plugin is not ready:

I0222 14:38:54.019413    2109 kubelet.go:2323] "Container runtime status" status="Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
E0222 14:38:54.019441    2109 kubelet.go:2326] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
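
To scan a kubelet.log for this condition, the NetworkReady status can be pulled out of lines like the two above. This is a rough sketch that assumes the line format shown in this issue; the function name `parse_network_ready` is hypothetical, and other kubelet versions may format the message differently.

```python
import re

# Matches the runtime condition summary as it appears in the log lines above,
# e.g. NetworkReady=false reason:NetworkPluginNotReady message:...
NETWORK_READY_RE = re.compile(
    r'NetworkReady=(?P<ready>true|false) '
    r'reason:(?P<reason>\S*) '
    r'message:(?P<message>[^"]*)'
)


def parse_network_ready(line):
    """Return ready/reason/message from a kubelet log line, or None."""
    match = NETWORK_READY_RE.search(line)
    if match is None:
        return None
    return {
        "ready": match.group("ready") == "true",
        "reason": match.group("reason"),
        "message": match.group("message").strip(),
    }
```

Applied to either line above, this yields `ready` as False, the reason `NetworkPluginNotReady`, and the message text up to the closing quote.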

What you expected to happen:

The test to run successfully.

/cc: @Namanl2001 @adisky @SergeyKanzhelev

ipochi avatar Feb 22 '22 19:02 ipochi

/sig node
/triage accepted

SergeyKanzhelev avatar Feb 22 '22 19:02 SergeyKanzhelev

what's the diff in test definition with one of the working jobs?

SergeyKanzhelev avatar Feb 22 '22 19:02 SergeyKanzhelev

what's the diff in test definition with one of the working jobs?

What do you mean by test definition?

ipochi avatar Feb 22 '22 19:02 ipochi

One of the tests mentioned here doesn't have

--container-runtime-endpoint=unix:///run/containerd/containerd.sock
--container-runtime-process-name=/usr/bin/containerd
--container-runtime-pid-file=

The other one does have these flags and matches one of the successful tests, yet it still fails.

The same goes for the test names: one doesn't contain the prefix -containerd-, but the other does.
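
A comparison like the one described above can be sketched in Python by diffing the flag lists of two job definitions. The helper names `flag_map` and `diff_flags` and the idea of passing the flags as plain lists are assumptions for illustration; the real job definitions live in the test-infra configuration and are not reproduced here.

```python
def flag_map(args):
    """Map each --flag name to its value (empty string if it has none)."""
    flags = {}
    for arg in args:
        if arg.startswith("--"):
            name, _, value = arg.partition("=")
            flags[name] = value
    return flags


def diff_flags(working, failing):
    """Summarize how the failing job's flags differ from the working job's."""
    w, f = flag_map(working), flag_map(failing)
    return {
        "missing_in_failing": sorted(set(w) - set(f)),
        "extra_in_failing": sorted(set(f) - set(w)),
        "changed": sorted(k for k in set(w) & set(f) if w[k] != f[k]),
    }


# Illustrative input only: the three flags quoted above, with a hypothetical
# failing job that sets none of them.
working_args = [
    "--container-runtime-endpoint=unix:///run/containerd/containerd.sock",
    "--container-runtime-process-name=/usr/bin/containerd",
    "--container-runtime-pid-file=",
]
failing_args = []
print(diff_flags(working_args, failing_args))
```

With this input, all three flag names are reported under `missing_in_failing`.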

ipochi avatar Feb 22 '22 20:02 ipochi

@ipochi the CNI init error is resolved now, but the tests are still failing, and the two tests now fail with different errors. Tracking the node performance test here: https://github.com/kubernetes/test-infra/issues/25430. Tracking the lock contention tests here: https://github.com/kubernetes/kubernetes/issues/108348

adisky avatar Feb 25 '22 07:02 adisky

@ipochi the CNI init error is resolved now, but the tests are still failing, and the two tests now fail with different errors. Tracking the node performance test here: https://github.com/kubernetes/test-infra/issues/25430. Tracking the lock contention tests here: https://github.com/kubernetes/kubernetes/issues/108348

Hi @adisky

Looking into it.

ipochi avatar Feb 25 '22 07:02 ipochi

https://github.com/kubernetes/kubernetes/issues/108348#issuecomment-1058933790

ipochi avatar Mar 04 '22 09:03 ipochi

https://github.com/kubernetes/test-infra/pull/25509

ipochi avatar Mar 04 '22 10:03 ipochi

/triage accepted
/priority important-soon

SergeyKanzhelev avatar Mar 16 '22 17:03 SergeyKanzhelev

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 14 '22 19:06 k8s-triage-robot

kubelet-gce-e2e-lock-contention is green now. node-kubelet-containerd-performance-test still fails, but not because of the "cni plugin not initialized" error.

Not sure if it is related to https://github.com/kubernetes/test-infra/issues/25430

pacoxu avatar Jun 24 '22 10:06 pacoxu

/remove-lifecycle stale

pacoxu avatar Jun 24 '22 10:06 pacoxu

/lifecycle stale

k8s-triage-robot avatar Sep 22 '22 10:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Oct 22 '22 11:10 k8s-triage-robot

The issue has been marked as an important bug and triaged. Such issues are automatically marked as frozen when hitting the rotten state to avoid missing important bugs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle frozen

k8s-triage-robot avatar Oct 22 '22 14:10 k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged. Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Feb 08 '23 02:02 k8s-triage-robot

/close

SergeyKanzhelev avatar Mar 01 '23 18:03 SergeyKanzhelev

@SergeyKanzhelev: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 01 '23 18:03 k8s-ci-robot