Installing the knative-serving YAMLs a second time without deleting the knative-serving namespace doesn't populate the webhook certs.
What version of Knative?
1.4.0
Expected Behavior
This is what I am doing and what should happen:
- Create a new NS
- kubectl apply -f on {serving-crds.yaml, serving-core.yaml, net-istio.yaml}
- Everything works fine
- Now, do `kubectl delete -f` on serving-crds.yaml + serving-core.yaml + net-istio.yaml
- Once everything is cleaned up, I do `kubectl apply -f` again on these 3 YAMLs
- Everything should work fine
Actual Behavior
- Create a new NS
- kubectl apply -f on {serving-crds.yaml, serving-core.yaml, net-istio.yaml}
- Everything works fine
- Now, do `kubectl delete -f` on serving-crds.yaml + serving-core.yaml + net-istio.yaml
- Once everything is cleaned up, I do `kubectl apply -f` again on these 3 YAMLs
- But this time, webhooks won’t run as their certs are not populated
Without deleting the namespace completely, re-applying knative-serving a second time fails because the webhook certs are not populated.
Steps to Reproduce the Problem
- Create a new NS
- kubectl apply -f on {serving-crds.yaml, serving-core.yaml, net-istio.yaml}
- Everything works fine
- Now, do `kubectl delete -f` on serving-crds.yaml + serving-core.yaml + net-istio.yaml
- Once everything is cleaned up, I do `kubectl apply -f` again on these 3 YAMLs
- But this time, the webhooks won't run because their certs are not populated (see the command sketch below)
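For concreteness, a minimal sketch of that sequence, assuming the three release YAMLs have been downloaded locally and the namespace is `knative-serving` (adjust file paths/URLs to your setup):

```sh
# First install into a fresh namespace: everything works, certs get populated
kubectl create namespace knative-serving
kubectl apply -f serving-crds.yaml
kubectl apply -f serving-core.yaml
kubectl apply -f net-istio.yaml

# Delete the same manifests; in this report the knative-serving namespace
# is left in place rather than being deleted
kubectl delete -f net-istio.yaml
kubectl delete -f serving-core.yaml
kubectl delete -f serving-crds.yaml

# Re-apply into the surviving namespace: this time the webhooks won't run,
# because their cert secrets are never populated
kubectl apply -f serving-crds.yaml
kubectl apply -f serving-core.yaml
kubectl apply -f net-istio.yaml
```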
Is there a reason why you're unable to delete the namespace?
@psschwei : At our org, Kubernetes is a managed service run by a central team. It is possible to delete the namespace and re-create it, but that makes the whole process tedious because I have to step outside the kustomize framework and use their own CLI/tools to do so.
Just to add a little more detail here: when the serving-core.yaml file is applied, it initially creates an empty secret for the webhook certs; then, as part of its reconciliation loop, the certs are populated into the secret once the leader election lease is acquired.
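One way to see this, assuming a default install where the core webhook's secret is named `webhook-certs` in the `knative-serving` namespace (there are usually several `*-certs` secrets):

```sh
# List the cert secrets; an unpopulated one shows DATA = 0
kubectl get secrets -n knative-serving | grep -- '-certs'

# Inspect the core webhook secret directly; no output (no .data field)
# means the certs were never reconciled into it
kubectl get secret webhook-certs -n knative-serving -o jsonpath='{.data}'
```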
In the situation described in this issue (installing, deleting everything but the namespace, and then reinstalling), it looks like the leader lease is never acquired, and as a result the certs never get populated into the secret, hence the failures being seen.
would need to dig into it a bit more to determine if leader election failing in this scenario is expected or a bug...
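For anyone digging into this, one way to check whether the webhook's leader election ever completed is to look at the holders of the leases in the `knative-serving` namespace (a sketch, assuming a default install):

```sh
# Show the leader-election leases and who holds them; an empty HOLDER on the
# webhook-related leases suggests the leader lease was never acquired
kubectl get lease -n knative-serving \
  -o custom-columns=NAME:.metadata.name,HOLDER:.spec.holderIdentity
```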
@psschwei : Can this issue be triaged for the next release? Or do we know if this is the expected behavior?
We just ran into what is probably the same issue on Serving 1.3.2 and Operator 1.5.3 hosted in Azure (AKS). We had to perform a cluster certificate rotation. Afterwards all of the Knative Serving pods were in a CrashLoopBackOff due to invalid certificates. We tried deleting all of the -certs secrets. They were recreated, but with metadata only. We waited for >5 minutes, which should be long enough for any leader election related issue. Deleting the namespace was the only workaround that we could find.
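Roughly, the secret deletion looked like this (a sketch; secret names are assumed to end in `-certs`, as in a default knative-serving install):

```sh
# Delete every cert secret in knative-serving; the controllers recreate them,
# but in this failure mode they come back with metadata only (no cert data)
kubectl get secrets -n knative-serving -o name \
  | grep -- '-certs$' \
  | xargs kubectl delete -n knative-serving
```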
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
This issue or pull request is stale because it has been open for 90 days with no activity.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
/lifecycle stale
" it looks like the lender lease is never acquired" i also found this,and should do this kubectl get lease -n knative-serving |grep webhook | awk '{print $1}' |xargs kubectl delete lease -n knative-serving
it looks like the lease can be acquired again, but why it happen @psschwei
/reopen
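If it helps anyone else, a sketch of how to verify the lease-deletion workaround (deployment and secret names assumed from a default install; adjust if yours differ):

```sh
# Optionally restart the webhook after deleting the leases so it re-runs
# leader election and re-reconciles the cert secret promptly
kubectl rollout restart deployment/webhook -n knative-serving
kubectl rollout status deployment/webhook -n knative-serving

# The cert secret should now contain data instead of being empty
kubectl get secret webhook-certs -n knative-serving -o jsonpath='{.data}'
```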