external-dns
Cloudflare 5XX responses crash pod
What happened: Cloudflare temporarily returned 5XXs for zone lookup calls, which caused the pods to crash.
What you expected to happen: Log an error without crashing and try again later.
How to reproduce it (as minimally and precisely as possible): Likely can't without mocking the API responses from Cloudflare.
Anything else we need to know?:
Here are some relevant logs
Environment:
- External-DNS version (use external-dns --version): 0.14.2
- DNS provider: cloudflare
- Others:
It is also crashing on rate limits for zone lookups.
zone fd823e13e494d5430aea9dfd4311a6e1 lookup failed, exceeded available rate limit retries
Failed to do run once: exceeded available rate limit retries
Experiencing these crashes as well.
/help
Although not directly related, https://github.com/kubernetes-sigs/external-dns/pull/4573 is an example solution for another provider.
Ideally this would require a global rate limiter.
@ivankatliarchuk: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/assign
Relates https://github.com/kubernetes-sigs/external-dns/issues/5225
Hi,
I am also experiencing this issue with the cloudflare provider under rate limiting. The fatal error message is Failed to do run once: exceeded available rate limit retries, and it crashes the pod.
I have read #5225 about soft errors, and was surprised that this is not handled as a soft error, since there is a line of code that checks explicitly for it:
https://github.com/kubernetes-sigs/external-dns/blob/017f7687ca393138b565e51757a6fa8010902066/provider/cloudflare/cloudflare.go#L278
This function, internal to https://github.com/cloudflare/cloudflare-go, checks that the error type is ErrorTypeRateLimit ErrorType = "rate_limit".
However, the error I get is generated by this line:
https://github.com/cloudflare/cloudflare-go/blob/57714bfbdeea095ec27f4b9bafe65bda4f178d96/cloudflare.go#L266
and is returned earlier than the point where these static error types are used:
https://github.com/cloudflare/cloudflare-go/blob/57714bfbdeea095ec27f4b9bafe65bda4f178d96/cloudflare.go#L286
I think the problem is upstream in https://github.com/cloudflare/cloudflare-go: this kind of error should be treated as a rate limit error, not a generic one.
I will open an issue on their side, but in the meantime I don't know whether we can have a solution in this repo, except perhaps accepting more errors as "soft errors" on our side.
I have submitted a PR to fix this issue for rate limiting errors: https://github.com/kubernetes-sigs/external-dns/pull/5524