apiserver-network-proxy

proxy-server container gets OOM killed repeatedly preventing exec/log commands from working

Open • PratikDeoghare opened this issue 2 years ago • 2 comments

  1. The proxy-server container gets OOM-killed repeatedly, which prevents exec/log commands from working.

$ k logs konnectivity-agent-8d795bf89-2dvgs -f
Error from server: Get "https://10.205.194.144:10250/containerLogs/kube-system/konnectivity-agent-8d795bf89-2dvgs/konnectivity-agent?follow=true": dial timeout, backstop

$ k exec -it cilium-l2cqb -- sh
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)
Error from server: error dialing backend: dial timeout, backstop
 ⎈ 8fvc5rxs4d/nmd-argos-test01  kube-system  ~ 
$ k exec -it cilium-l2cqb -- sh
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)
Error from server: error dialing backend: dial timeout, backstop
 ⎈ 8fvc5rxs4d/nmd-argos-test01  kube-system  ~ 
$  k exec -it cilium-l2cqb -- sh
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), clean-cilium-state (init)
#

  2. This "impossible" message also appears many times in the logs:
3393:I0705 09:38:01.319923       1 backend_manager.go:234] "This should not happen. Removed connection that is not the first connection" connection=&{ServerStream:0xc000d02000} remainingConnections=[0xc005baa738]
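The log line suggests that the server keeps several connections per agent ID and expects them to be removed in order. The Go sketch below models that bookkeeping under this assumption, using a per-agent slice of connections; `backendStore`, `conn` and the agent ID are invented for illustration and this is not the project's backend_manager.go. It shows how the warning can fire once an agent has more than one live stream and the newer one goes away first.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// conn is a hypothetical stand-in for one agent's gRPC stream.
type conn struct{ name string }

// backendStore keeps, per agent ID, the connections in arrival order.
// The first entry is assumed to be the one used for new dials.
type backendStore struct {
	mu       sync.Mutex
	backends map[string][]*conn
}

func newBackendStore() *backendStore {
	return &backendStore{backends: make(map[string][]*conn)}
}

func (s *backendStore) add(agentID string, c *conn) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.backends[agentID] = append(s.backends[agentID], c)
}

// remove deletes c from agentID's list and reports whether c was found at a
// position other than 0, the case the real log line calls "should not happen".
func (s *backendStore) remove(agentID string, c *conn) (notFirst bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	conns := s.backends[agentID]
	for i, existing := range conns {
		if existing != c {
			continue
		}
		// Copy into a fresh slice so the old backing array is not aliased.
		remaining := append(append([]*conn{}, conns[:i]...), conns[i+1:]...)
		if len(remaining) == 0 {
			delete(s.backends, agentID)
		} else {
			s.backends[agentID] = remaining
		}
		if i != 0 {
			log.Printf("removed connection that is not the first connection: %s, remaining=%d", c.name, len(remaining))
		}
		return i != 0
	}
	return false
}

func main() {
	s := newBackendStore()
	old, fresh := &conn{"old"}, &conn{"fresh"}
	// The agent reconnects before its old stream is cleaned up,
	// so two connections exist for the same agent ID.
	s.add("agent-1", old)
	s.add("agent-1", fresh)
	// The newer stream closes first and is removed from position 1.
	fmt.Println(s.remove("agent-1", fresh)) // true
	fmt.Println(len(s.backends["agent-1"])) // 1
}
```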

Versions:

Kubernetes version: v1.22.9
Konnectivity versions: v0.0.26, v0.0.32

Related issues: https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/276.

Reproduction: Our customer was able to reproduce it using https://github.com/kubernetes-sigs/apiserver-network-proxy/issues/276#issuecomment-967297381.

Attachments: logs and Grafana screenshots (OOMLogs.zip).
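Where the proxy-server pod is visible, one way to confirm that the restarts are OOM kills (rather than, say, failed liveness probes) is to check `status.containerStatuses[].lastState.terminated.reason` on the pod, which is set to `OOMKilled` in that case. The minimal Go sketch below reads a pod's JSON on stdin and reports such containers with their restart counts. The pod name and namespace in the usage comment are placeholders, and `oomKilled` is a hypothetical helper; the struct models only the fields it reads.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// pod models only the parts of a Pod object that this check reads.
type pod struct {
	Status struct {
		ContainerStatuses []struct {
			Name         string `json:"name"`
			RestartCount int    `json:"restartCount"`
			LastState    struct {
				Terminated *struct {
					Reason string `json:"reason"`
				} `json:"terminated"`
			} `json:"lastState"`
		} `json:"containerStatuses"`
	} `json:"status"`
}

// oomKilled returns "name (restarts: N)" for every container whose last
// termination reason was OOMKilled.
func oomKilled(podJSON []byte) ([]string, error) {
	var p pod
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return nil, err
	}
	var hits []string
	for _, cs := range p.Status.ContainerStatuses {
		if t := cs.LastState.Terminated; t != nil && t.Reason == "OOMKilled" {
			hits = append(hits, fmt.Sprintf("%s (restarts: %d)", cs.Name, cs.RestartCount))
		}
	}
	return hits, nil
}

func main() {
	// Usage (pod name and namespace are placeholders):
	//   kubectl get pod <proxy-server-pod> -n <namespace> -o json | go run .
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stdin:", err)
		os.Exit(1)
	}
	hits, err := oomKilled(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse pod JSON:", err)
		os.Exit(1)
	}
	if len(hits) == 0 {
		fmt.Println("no container was last terminated with reason OOMKilled")
		return
	}
	for _, h := range hits {
		fmt.Println("OOMKilled:", h)
	}
}
```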

PratikDeoghare, Jul 12 '22 08:07

@PratikDeoghare

Many memory issues were fixed in releases of the konnectivity-client library that is vendored into Kubernetes; v1.22.9 should already include those fixes.

ipochi, Jul 12 '22 14:07

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Oct 10 '22 15:10

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot, Nov 09 '22 16:11

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot, Dec 09 '22 16:12

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Dec 09 '22 16:12