external-dns
Add flag for setting the cache sync timeout
What would you like to be added:
Flag for setting a custom cache sync timeout.
Why is this needed:
We have a few clusters with a large number of pods, deployments, and/or CRs, and the default 60-second timeout for the local cache sync is not enough, so the external-dns pod ends up in a crash loop.
Example logs:
time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Service: context deadline exceeded"
or
time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"
We are experiencing the same thing on several of our clusters. +1 to this option.
I would say it's likely a misconfiguration in your cluster, but I also don't mind the change if it is done by using context properly instead of passing an additional parameter through all the layers. Misconfiguration, because of this log line:
time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"
This looks like naming via pods, which should not be used in production clusters, and hopefully nobody relies on it there. It's great to have in some Raspberry Pi cases, but other than that I would suggest omitting it.
It's great that you mention this, because my source configuration looks like this:
--source=service
--source=istio-gateway
--source=istio-virtualservice
--source=ingress
pod is not listed as a source. Why would external-dns be creating a watch on that resource?
I would blame Istio for all the problems. I haven't checked it, but ingress and service are safe.
We're only listing ingress and service as sources and still seeing the exact same issue.
Interesting; then I'll have to dig into it more.
The service source looks to list pods in a couple of situations:
- Node port targets with local traffic policy
- Headless endpoints
https://github.com/kubernetes-sigs/external-dns/blob/master/source/service.go
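The two cases above can be sketched as a small predicate. This is a hedged reconstruction for illustration only; the real logic lives in source/service.go linked above, and the field names here mirror the Kubernetes Service spec rather than the upstream code:

```go
package main

import "fmt"

// service is a simplified view of the Service fields that matter here.
type service struct {
	Type                  string // "ClusterIP", "NodePort", ...
	ClusterIP             string // "None" for headless services
	ExternalTrafficPolicy string // "Local" or "Cluster"
}

// needsPodListing reports whether resolving DNS targets for svc would
// require listing Pods, per the two cases named in the thread:
// headless services and NodePort services with local traffic policy.
func needsPodListing(svc service) bool {
	headless := svc.ClusterIP == "None"
	localNodePort := svc.Type == "NodePort" && svc.ExternalTrafficPolicy == "Local"
	return headless || localNodePort
}

func main() {
	fmt.Println(needsPodListing(service{Type: "ClusterIP", ClusterIP: "None"}))
	fmt.Println(needsPodListing(service{Type: "NodePort", ExternalTrafficPolicy: "Local"}))
	fmt.Println(needsPodListing(service{Type: "ClusterIP", ClusterIP: "10.0.0.1"}))
}
```

So even with only service configured as a source, either of these service shapes would explain a Pod watch appearing in the logs.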
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
We use https://github.com/kubernetes/client-go/blob/master/informers/factory.go#L115, and the options are defined in https://github.com/kubernetes/client-go/blob/master/informers/factory.go#L54.
https://github.com/kubernetes-sigs/controller-runtime/pull/1247/files added a cache sync timeout option, but it is exposed only on controller.Controller.
I don't see a way to increase the cache sync timeout through the informer factory itself.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.