
Add flag for setting the cache sync timeout

Open · idgenchev opened this issue 2 years ago • 13 comments

What would you like to be added:

Flag for setting a custom cache sync timeout.

Why is this needed:

We have a few clusters with a lot of pods, deployments, and/or CRs, and the default 60-second timeout for the local cache sync is not enough, leaving the external-dns pod in a crash loop.

Example logs:

time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Service: context deadline exceeded"

or

time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

idgenchev avatar Sep 06 '22 12:09 idgenchev

We are experiencing the same thing on several of our clusters. +1 to this option.

skizot722 avatar Oct 03 '22 21:10 skizot722

I would say this is likely a misconfiguration in your cluster, but I also don't mind the change if it's done by using context properly instead of threading an extra parameter through everything. Misconfiguration, because of:

time="2022-09-06T12:07:37Z" level=fatal msg="failed to sync *v1.Pod: context deadline exceeded"

This looks like the pod source (creating DNS names for pods), which should not be used in production clusters and which hopefully nobody relies on in production. It's nice to have for some Raspberry Pi setups, but beyond that I would suggest omitting it.
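To illustrate the context-based approach: the caller sets the deadline once, and every layer below just accepts a ctx, so no extra timeout parameter has to be threaded through. A minimal sketch with a hypothetical waitForSync helper, not external-dns code:

    package main

    import (
        "context"
        "fmt"

        "k8s.io/client-go/tools/cache"
    )

    // waitForSync blocks until all given informers have synced or the
    // context's deadline passes, whichever comes first.
    func waitForSync(ctx context.Context, synced ...cache.InformerSynced) error {
        if !cache.WaitForCacheSync(ctx.Done(), synced...) {
            return fmt.Errorf("failed to sync caches: %w", ctx.Err())
        }
        return nil
    }

A single top-level context.WithTimeout would then carry the configured timeout to every such call without any new parameters.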

szuecs avatar Oct 05 '22 19:10 szuecs

This looks like the pod source (creating DNS names for pods), which should not be used in production clusters and which hopefully nobody relies on in production. It's nice to have for some Raspberry Pi setups, but beyond that I would suggest omitting it.

It's great that you mention this, because my source configuration looks like this:

      --source=service
      --source=istio-gateway
      --source=istio-virtualservice
      --source=ingress

pod is not listed as a source. Why would external-dns be creating a watch on that resource?

skizot722 avatar Oct 05 '22 19:10 skizot722

I would blame istio for all the problems. I haven't checked, but ingress and service are safe.

szuecs avatar Oct 06 '22 11:10 szuecs

I would blame istio for all the problems. I haven't checked, but ingress and service are safe.

We're only listing ingress and service as sources and still seeing the exact same issue.

idgenchev avatar Oct 07 '22 11:10 idgenchev

Interesting. Then I'll have to dig into it more.

szuecs avatar Oct 08 '22 17:10 szuecs

The service source appears to list pods in a couple of situations (sketched below):

  1. Node port targets with local traffic policy
  2. Headless endpoints

https://github.com/kubernetes-sigs/external-dns/blob/master/source/service.go
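A hedged sketch of those two conditions; needsPodData is a hypothetical helper, not the actual service source code:

    package main

    import corev1 "k8s.io/api/core/v1"

    // needsPodData reports whether handling this Service requires pod
    // information, per the two cases above.
    func needsPodData(svc *corev1.Service) bool {
        // Headless services resolve directly to pod IPs, so endpoints
        // (and therefore pods) must be inspected.
        headless := svc.Spec.ClusterIP == corev1.ClusterIPNone
        // With externalTrafficPolicy: Local, NodePort traffic is served
        // only by nodes running a ready pod, so pod placement matters.
        localNodePort := svc.Spec.Type == corev1.ServiceTypeNodePort &&
            svc.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal
        return headless || localNodePort
    }

If the service source registers a pod informer for these cases, that would explain the *v1.Pod sync failures even when pod is never configured as a source.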

tstraley avatar Nov 02 '22 21:11 tstraley

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 31 '23 21:01 k8s-triage-robot

/remove-lifecycle stale

JRemitz avatar Feb 22 '23 06:02 JRemitz

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 23 '23 06:05 k8s-triage-robot

We use https://github.com/kubernetes/client-go/blob/master/informers/factory.go#L115, and the available options are defined in https://github.com/kubernetes/client-go/blob/master/informers/factory.go#L54.

https://github.com/kubernetes-sigs/controller-runtime/pull/1247/files was done, but it exposes the setting only to controller.Controller.

I don't see a way to increase the cache sync timeout.
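For illustration: the options at that line cover namespace scoping, list tweaks, and custom resync, but none of them bound the sync wait; the deadline comes entirely from the stop channel handed to WaitForCacheSync. A sketch using real client-go options (the namespace and label selector values are made up):

    package main

    import (
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
    )

    func startInformers(client kubernetes.Interface) {
        factory := informers.NewSharedInformerFactoryWithOptions(
            client,
            0, // default resync period
            informers.WithNamespace("production"), // made-up namespace
            informers.WithTweakListOptions(func(o *metav1.ListOptions) {
                o.LabelSelector = "app=web" // made-up selector
            }),
        )
        factory.Core().V1().Services().Informer()

        // Whatever deadline applies to the cache sync comes from this
        // channel (or a context's Done()), not from any factory option;
        // a flag would therefore have to change what the caller passes here.
        stopCh := make(chan struct{})
        time.AfterFunc(60*time.Second, func() { close(stopCh) })
        factory.Start(stopCh)
        factory.WaitForCacheSync(stopCh)
    }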

szuecs avatar May 26 '23 14:05 szuecs

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 21 '24 09:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Feb 20 '24 09:02 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Mar 21 '24 10:03 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 21 '24 10:03 k8s-ci-robot