controller-runtime icon indicating copy to clipboard operation
controller-runtime copied to clipboard

Properly exit when cert-watcher can no longer watch required file

Open ethernoy opened this issue 4 years ago • 12 comments

When the TLS assets are no longer available (missing drive for example) after a webhook server is started with such TLS assets, the cert-watcher throws the following error: https://github.com/kubernetes-sigs/controller-runtime/blob/4d10a0615b11507451ecb58bfd59f0f6ef313a29/pkg/certwatcher/certwatcher.go#L144

{"level":"error","ts":1636596243.6302137,"logger":"controller-runtime.certwatcher","msg":"error re-watching file","error":"no such file or directory","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/log.(*DelegatingLogger).Error\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:144\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/certwatcher.(*CertWatcher).handleEvent\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/internal/certwatcher/certwatcher.go:144\nsigs.k8s.io/controller-runtime/pkg/webhook/internal/certwatcher.(*CertWatcher).Watch\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/internal/certwatcher/certwatcher.go:102"}

After throwing the following error, the cert-watcher simply stops monitoring the path without further action. The last valid certificate persists in currentCert even after the path becomes available again. I wonder if it is better if the cert-watcher can either:

  • call os.exit after the path missing error occurs
  • keeps monitoring the path even if it is missing

Happy to do a PR if possible.

ethernoy avatar Nov 16 '21 10:11 ethernoy

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 14 '22 11:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Mar 16 '22 12:03 k8s-triage-robot

/remove-lifecycle rotten

ethernoy avatar Mar 20 '22 10:03 ethernoy

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 18 '22 10:06 k8s-triage-robot

This is one blog explaining k8s fsnotfy, looks cert-watcher can not handle secret mounted in container.

hzxuzhonghu avatar Jul 07 '22 09:07 hzxuzhonghu

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 06 '22 10:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Sep 05 '22 10:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 05 '22 10:09 k8s-ci-robot

/reopen

ichekrygin avatar Apr 01 '24 22:04 ichekrygin

@ichekrygin: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 01 '24 22:04 k8s-ci-robot

To be honest, I would be really surprised if the certwatcher "can not handle secret mounted in container.". As far as I'm aware that is the standard case with controller-runtime and folks have running this in production for a very long time (also in combination with certificates managed by cert-manager that are rotated every few weeks).

Maybe I'm missing something. Definitely fine to investigate further and if there is evidence that we have a problem fix it.

/remove-lifecycle rotten /reopen

sbueringer avatar Apr 08 '24 18:04 sbueringer

@sbueringer: Reopened this issue.

In response to this:

To be honest, I would be really surprised if the certwatcher "can not handle secret mounted in container.". As far as I'm aware that is the standard case with controller-runtime and folks have running this in production for a very long time (also in combination with certificates managed by cert-manager that are rotated every few weeks).

Maybe I'm missing something. Definitely fine to investigate further and if there is evidence that we have a problem fix it.

/remove-lifecycle rotten /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 08 '24 18:04 k8s-ci-robot