
Add flag for setting the cache sync timeout

Open · idgenchev opened this issue 2 years ago • 13 comments

What would you like to be added:

Flag for setting a custom cache sync timeout.

Why is this needed:

We have a few clusters with a lot of pods, deployments, and/or CRs, and the default 60-second timeout for the local cache sync is not enough, leaving the external-dns pod in a crash loop.

Example logs:

time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Service: context deadline exceeded"

or

time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

idgenchev avatar Sep 06 '22 12:09 idgenchev

We are experiencing the same thing on several of our clusters. +1 to this option.

skizot722 avatar Oct 03 '22 21:10 skizot722

I would say this is likely a misconfiguration in your cluster, but I also don't mind the change if it's done by using context properly instead of threading an extra parameter through everything. Misconfiguration, because of:

time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

This looks like the pod source (creating DNS names for pods), which should not be used in production clusters and which hopefully nobody relies on in production. It's nice to have for some Raspberry Pi setups, but beyond that I would suggest omitting it.
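To illustrate the context-based approach: the caller sets the deadline once, and every layer below just accepts a ctx, so no extra timeout parameter has to be threaded through. A minimal sketch with a hypothetical waitForSync helper, not external-dns code:

    package main

    import (
        "context"
        "fmt"

        "k8s.io/client-go/tools/cache"
    )

    // waitForSync blocks until all given informers have synced or the
    // context's deadline passes, whichever comes first.
    func waitForSync(ctx context.Context, synced ...cache.InformerSynced) error {
        if !cache.WaitForCacheSync(ctx.Done(), synced...) {
            return fmt.Errorf("failed to sync caches: %w", ctx.Err())
        }
        return nil
    }

A single top-level context.WithTimeout would then carry the configured timeout to every such call without any new parameters.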

szuecs avatar Oct 05 '22 19:10 szuecs

This looks like the pod source (creating DNS names for pods), which should not be used in production clusters and which hopefully nobody relies on in production. It's nice to have for some Raspberry Pi setups, but beyond that I would suggest omitting it.

It's great that you mention this, because my source configuration looks like this:

      --source=service
      --source=istio-gateway
      --source=istio-virtualservice
      --source=ingress

pod is not listed as a source. Why would external-dns be creating a watch on that resource?

skizot722 avatar Oct 05 '22 19:10 skizot722

I would blame istio for all the problems. I haven't checked, but ingress and service are safe.

szuecs avatar Oct 06 '22 11:10 szuecs

I would blame istio for all the problems. I haven't checked, but ingress and service are safe.

We're only listing ingress and service as sources and still seeing the exact same issue.

idgenchev avatar Oct 07 '22 11:10 idgenchev

Interesting. Then I'll have to dig into it more.

szuecs avatar Oct 08 '22 17:10 szuecs

The service source appears to list pods in a couple of situations (sketched below):

  1. Node port targets with local traffic policy
  2. Headless endpoints

https://github.com/kubernetes-sigs/external-dns/blob/master/source/service.go
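A hedged sketch of those two conditions; needsPodData is a hypothetical helper, not the actual service source code:

    package main

    import corev1 "k8s.io/api/core/v1"

    // needsPodData reports whether handling this Service requires pod
    // information, per the two cases above.
    func needsPodData(svc *corev1.Service) bool {
        // Headless services resolve directly to pod IPs, so endpoints
        // (and therefore pods) must be inspected.
        headless := svc.Spec.ClusterIP == corev1.ClusterIPNone
        // With externalTrafficPolicy: Local, NodePort traffic is served
        // only by nodes running a ready pod, so pod placement matters.
        localNodePort := svc.Spec.Type == corev1.ServiceTypeNodePort &&
            svc.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal
        return headless || localNodePort
    }

If the service source registers a pod informer for these cases, that would explain the *v1.Pod sync failures even when pod is never configured as a source.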

tstraley avatar Nov 02 '22 21:11 tstraley

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 31 '23 21:01 k8s-triage-robot

/remove-lifecycle stale

JRemitz avatar Feb 22 '23 06:02 JRemitz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 23 '23 06:05 k8s-triage-robot

We use https://github.com/kubernetes/client-go/blob/master/informers/factory.go#L115, and the available options are defined in https://github.com/kubernetes/client-go/blob/master/informers/factory.go#L54.

https://github.com/kubernetes-sigs/controller-runtime/pull/1247/files was done, but it exposes the setting only to controller.Controller.

I don't see a way to increase the cache sync timeout.
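For illustration: the options at that line cover namespace scoping, list tweaks, and custom resync, but none of them bound the sync wait; the deadline comes entirely from the stop channel handed to WaitForCacheSync. A sketch using real client-go options (the namespace and label selector values are made up):

    package main

    import (
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
    )

    func startInformers(client kubernetes.Interface) {
        factory := informers.NewSharedInformerFactoryWithOptions(
            client,
            0, // default resync period
            informers.WithNamespace("production"), // made-up namespace
            informers.WithTweakListOptions(func(o *metav1.ListOptions) {
                o.LabelSelector = "app=web" // made-up selector
            }),
        )
        factory.Core().V1().Services().Informer()

        // Whatever deadline applies to the cache sync comes from this
        // channel (or a context's Done()), not from any factory option;
        // a flag would therefore have to change what the caller passes here.
        stopCh := make(chan struct{})
        time.AfterFunc(60*time.Second, func() { close(stopCh) })
        factory.Start(stopCh)
        factory.WaitForCacheSync(stopCh)
    }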

szuecs avatar May 26 '23 14:05 szuecs

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 21 '24 09:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 20 '24 09:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 21 '24 10:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 21 '24 10:03 k8s-ci-robot