Installing the knative-serving yamls a second time without deleting the knative-serving namespace doesn't populate the webhook certs.

Open rachitchauhan43 opened this issue 3 years ago • 5 comments

What version of Knative?

1.4.0

Expected Behavior

This is what I am doing, and what I expect to happen:

  1. Create a new NS
  2. kubectl apply -f on {serving-crds.yaml, serving-core.yaml, net-istio.yaml}
  3. Everything works fine
  4. Now, do kubectl delete -f on serving-crds.yaml + serving-core.yaml + net-istio.yaml
  5. Once everything is cleaned up, I run `kubectl apply -f` again on these 3 yamls
  6. Everything should work fine

Actual Behavior

  1. Create a new NS
  2. kubectl apply -f on {serving-crds.yaml, serving-core.yaml, net-istio.yaml}
  3. Everything works fine
  4. Now, do kubectl delete -f on serving-crds.yaml + serving-core.yaml + net-istio.yaml
  5. Once everything is cleaned up, I run `kubectl apply -f` again on these 3 yamls
  6. But this time, webhooks won’t run as their certs are not populated

Unless the namespace is deleted completely, re-applying knative-serving a second time fails because the webhook certs are not populated.

Steps to Reproduce the Problem

  1. Create a new NS
  2. kubectl apply -f on {serving-crds.yaml, serving-core.yaml, net-istio.yaml}
  3. Everything works fine
  4. Now, do kubectl delete -f on serving-crds.yaml + serving-core.yaml + net-istio.yaml
  5. Once everything is cleaned up, I run `kubectl apply -f` again on these 3 yamls
  6. But this time, webhooks won’t run as their certs are not populated
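
The steps above could be scripted roughly as follows. This is only a sketch: `MANIFEST_DIR`, `NS`, and the function names are hypothetical, the three manifest files are assumed to be downloaded locally, and it assumes (as this report does) that the namespace survives the delete step.

```shell
# Hypothetical helper: MANIFEST_DIR and NS are illustrative names, not from the issue.
MANIFEST_DIR="${MANIFEST_DIR:-.}"
NS="${NS:-knative-serving}"
MANIFESTS="serving-crds.yaml serving-core.yaml net-istio.yaml"

# Run one kubectl verb (apply or delete) against each manifest in turn.
for_each_manifest() {
  for m in $MANIFESTS; do
    kubectl "$1" -f "$MANIFEST_DIR/$m" || return 1
  done
}

reproduce() {
  kubectl create namespace "$NS"         # step 1: create a new NS
  for_each_manifest apply  || return 1   # steps 2-3: first install
  for_each_manifest delete || return 1   # step 4: the report assumes the namespace is kept
  for_each_manifest apply                # step 5: second install, where the webhooks fail
}
```

Calling `reproduce` against a disposable cluster should walk through steps 1 to 5; step 6 is then observed by looking at the webhook pods.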

rachitchauhan43 avatar May 31 '22 18:05 rachitchauhan43

Is there a reason why you're unable to delete the namespace?

psschwei avatar Jun 02 '22 21:06 psschwei

@psschwei : At our org, k8s is a managed service run by a central team. It is possible to delete the namespace and re-create it, but that makes the whole process tedious, as I have to move out of the kustomize framework and use their own CLI/tools to do so.

rachitchauhan43 avatar Jun 03 '22 00:06 rachitchauhan43

Just to add a little more detail here: when the serving-core.yaml file is applied, it initially creates an empty secret for the webhook certs, then as part of its reconciliation loop the certs are populated into the secret once the leader election lease is acquired.

In the situation described in this issue (installing, deleting everything but the namespace, and then reinstalling), it looks like the leader lease is never acquired. As a result the certs never get populated into the secret, hence the failures being seen.

Would need to dig into it a bit more to determine if leader election failing in this scenario is expected or a bug...
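
One way to check for the state described here (an empty certs secret alongside leftover leases) might look like the sketch below. The function names and the `NS` variable are hypothetical, and the secret name is deliberately left as an argument because it is not given in this thread.

```shell
NS="${NS:-knative-serving}"   # assumed namespace, taken from the issue title

# Succeeds if the named secret has a non-empty .data field.
# $1: secret name (pick one from `kubectl get secrets -n "$NS"`).
secret_is_populated() {
  _data="$(kubectl get secret "$1" -n "$NS" -o jsonpath='{.data}')" || return 2
  case "$_data" in
    ''|'{}'|'map[]') return 1 ;;   # metadata only: certs were never written
    *)               return 0 ;;
  esac
}

# List the leases left in the namespace, to see which ones survived the delete.
show_leases() {
  kubectl get lease -n "$NS"
}
```

For example, `secret_is_populated <secret-name> || show_leases` prints the remaining leases only when the secret is still empty.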

psschwei avatar Jun 03 '22 21:06 psschwei

@psschwei : Can this issue be triaged for next release? Or do we know if this is the expected behavior?

rachitchauhan43 avatar Jun 22 '22 23:06 rachitchauhan43

We just ran into what is probably the same issue on Serving 1.3.2 and Operator 1.5.3 hosted in Azure (AKS). We had to perform a cluster certificate rotation. Afterwards all of the Knative Serving pods were in CrashLoopBackOff due to invalid certificates. We tried deleting all of the -certs secrets. They were recreated, but with metadata only. We waited for >5 minutes, which should be long enough to rule out any leader election related issue. Deleting the namespace was the only workaround that we could find.

tshak avatar Aug 24 '22 07:08 tshak

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Nov 23 '22 01:11 github-actions[bot]

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

knative-prow-robot avatar Dec 23 '22 02:12 knative-prow-robot

"it looks like the leader lease is never acquired": I found this too. The workaround is to delete the webhook leases: `kubectl get lease -n knative-serving | grep webhook | awk '{print $1}' | xargs kubectl delete lease -n knative-serving`

After that, it looks like the lease can be acquired again, but why does this happen, @psschwei?
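
The same workaround can be written as a small function that does nothing when no webhook lease exists. This is a sketch only: `delete_webhook_leases` and `NS` are hypothetical names, and it relies on `kubectl get -o name`, which prints each lease as `lease.coordination.k8s.io/<name>`.

```shell
NS="${NS:-knative-serving}"   # assumed namespace, as in the command above

# Delete every lease in the namespace whose name contains "webhook".
delete_webhook_leases() {
  kubectl get lease -n "$NS" -o name | grep webhook | while read -r lease; do
    # stdin is redirected so kubectl cannot consume the remaining lease names
    kubectl delete -n "$NS" "$lease" < /dev/null
  done
}
```

Unlike the `xargs` pipeline, the loop never invokes `kubectl delete` with an empty argument list.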

antiClocke avatar Dec 28 '22 05:12 antiClocke

/reopen

antiClocke avatar Dec 28 '22 05:12 antiClocke