
Unsafe assumptions and leaked goroutine in logging

Open howardjohn opened this issue 4 years ago • 8 comments

This code is problematic: https://github.com/kubernetes-sigs/controller-runtime/blob/498ee8ae7cc1dfc414f453f98cde3114116e5442/pkg/log/log.go#L53-L70

First, it leaks a goroutine for 30s just by importing the library. It's generally assumed that importing a package does not create goroutines. The real-world impact is that various projects (Istio, gRPC, etc.) have tests that assert no goroutines are leaking; these fail if controller-runtime is so much as imported.

Another issue is the code's assumption that if a logger wasn't set within the first 30 seconds of the binary's lifetime, it never will be. This does not seem safe to assert: there is certainly no reason I cannot set a logger 31 seconds, or 30 hours, into the process lifetime. The controller-runtime library can be, and is, used as a small piece of larger binaries; it's not always the sole thing running in the process.

howardjohn avatar Sep 03 '21 23:09 howardjohn

The controller-runtime library can be, and is, used as a small piece of larger binaries; it's not always the sole thing running in the process

While we want to improve the use cases around this, especially because the controller-runtime client is so much nicer to work with than client-go for many use cases, we do still largely optimize for writing operators with tools like Kubebuilder and Operator-SDK, where controller-runtime is indeed in overarching control of the universe. This code was added to fix some substantial memory-leak issues caused by dangling loggers. If you have suggestions on how to improve this without hurting the primary use cases, please do let us know.

coderanger avatar Sep 04 '21 00:09 coderanger

Hey @howardjohn ,

The reason this logic exists is that the DelegatingLogger ends up consuming a huge amount of memory if no Logger gets set: https://github.com/kubernetes-sigs/controller-runtime/issues/1122

The current logic is the best we could come up with to deal with that situation, and "leaking" a goroutine for 30 seconds/not allowing a logger to be set after that is definitely less of an issue than the memory consumption.

If you have any ideas on how to improve things here, they are very welcome.

alvaroaleman avatar Sep 05 '21 22:09 alvaroaleman

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 04 '21 23:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 03 '22 23:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Feb 02 '22 23:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 02 '22 23:02 k8s-ci-robot

I just found this weird goroutine in a debugging session.

If you have any ideas on how to improve things here, they are very welcome.

Why not use klog, like Kubernetes? That would be consistent and not a surprise at all. Most apps that care about logs and use Kubernetes libraries configure klog via klog.SetOutput().

/reopen
/remove-lifecycle rotten

ash2k avatar Aug 04 '22 11:08 ash2k

@ash2k: Reopened this issue.


k8s-ci-robot avatar Aug 04 '22 11:08 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 02 '22 11:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 02 '22 11:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 01 '23 12:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


k8s-ci-robot avatar Jan 01 '23 12:01 k8s-ci-robot

/reopen
/remove-lifecycle rotten

ash2k avatar Feb 28 '23 09:02 ash2k

@ash2k: Reopened this issue.


k8s-ci-robot avatar Feb 28 '23 09:02 k8s-ci-robot