
Hcloud manager errors: Couldn't reconcile node routes error listing routes context deadline exceeded

Open mmpetarpeshev opened this issue 2 years ago • 16 comments

We are using the hcloud manager in a cluster deployed on Hetzner VMs. The hcloud manager is deployed with network support. After a few days it started hitting the Hetzner Cloud API limits and logging the following errors:

E0830 16:20:57.819595 1 route_controller.go:118] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/hcloudRouteToRoute: hcops/AllServersCache.ByPrivateIP: 192.168.1.7 hcops/AllServersCache.getCache: Get "https://api.hetzner.cloud/v1/servers?page=1&per_page=50": context deadline exceeded

The hcloud manager is deployed with kubespray addons and without any specific configuration. It doesn't seem to affect the cluster for now, but judging from the logs it looks like an issue, and it affects our terraform commands even though they use different API keys for the Hetzner Cloud API.

mmpetarpeshev avatar Aug 30 '22 17:08 mmpetarpeshev

We have had the same issues recently and constantly hit the Hetzner cloud's rate limit probably due to retries.

Also, the documentation says the rate limit is per project, not per API key, and the support team refused to increase the rate limit :-(

ym avatar Sep 04 '22 11:09 ym

I've often run into the same issues: Hetzner API rate limits are too strict and too low. I very often see rate limit errors, too.

Sad that Hetzner is not willing to increase the rate limits, as it cannot be used for serious setups if it hits these API rate limits ...


talex-de avatar Sep 04 '22 14:09 talex-de

@LKaemmerling Thanks for linking the fix to the issue. Do you know when it will be released?

mmpetarpeshev avatar Sep 06 '22 14:09 mmpetarpeshev

We'll release it this week, maybe tomorrow! I'll keep you up to date 👌🏼

4ND3R50N avatar Sep 06 '22 14:09 4ND3R50N

After the release, the hcloud manager still hits the API limit:

E0918 04:11:36.921648 1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/hcloudRouteToRoute: hcops/AllServersCache.ByPrivateIP: hcops/AllServersCache.getCache: Get "https://api.hetzner.cloud/v1/servers?page=1&per_page=50": context deadline exceeded

Honestly, that's not serious. How are we supposed to deploy production workloads on Hetzner in that case?

mmpetarpeshev avatar Sep 18 '22 07:09 mmpetarpeshev

Please reopen; we can't use our terraform because the hcloud manager always hits the API limit.

mmpetarpeshev avatar Sep 23 '22 20:09 mmpetarpeshev

> Please reopen; we can't use our terraform because the hcloud manager always hits the API limit.

@mmpetarpeshev Sorry for the late reply. We will ofc take care of this! Two questions here:

  1. Do you use the newest version? (v1.13.0)
  2. If yes, is the error still the same?

I just want to make sure that it's only the API limits hitting you. The "context deadline exceeded" error message was kind of vague. The newest version should print a more specific error (besides the deadline exceeded).

4ND3R50N avatar Sep 26 '22 06:09 4ND3R50N

Hi @4ND3R50N , thanks for taking care of that.

  1. We are using the Docker image tag that I pulled a few days ago. I will check the version from the logs later today.
  2. The error message was a little bit different, I think; something like:

route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)

I will check everything later today and provide an update.

mmpetarpeshev avatar Sep 26 '22 08:09 mmpetarpeshev

@mmpetarpeshev

Ok, good news, so it's the API limit. Don't worry, this is an internal mechanism to prevent spam. I will talk to some colleagues about how we are going to proceed with those cases, since you're not the only one having trouble with it.

Waiting for your final update; I will also keep you up to date :-)

4ND3R50N avatar Sep 26 '22 08:09 4ND3R50N

I checked the logs and there is the line: "Hetzner Cloud k8s cloud controller v1.9.1 started". I tried with the latest Docker image tag and with v1.13.0, and with both the Helm deployment and Ansible (i.e. DaemonSet or Deployment). I'm not sure that is the correct version, since you said v1.13.0; from what I saw, the Docker image's latest tag was updated a few days ago.

mmpetarpeshev avatar Sep 26 '22 18:09 mmpetarpeshev

> @mmpetarpeshev
>
> Ok, good news, so it's the API limit. Don't worry, this is an internal mechanism to prevent spam. I will talk to some colleagues about how we are going to proceed with those cases, since you're not the only one having trouble with it.
>
> Waiting for your final update; I will also keep you up to date :-)

Hi, is there any ETA for this issue? We're still constantly hitting it even after upgrading to v1.13.1.


E0930 08:01:14.976446       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:15.832617       1 node_controller.go:364] Failed to update node addresses for node "us-east1-prd-worker-13": failed to get node address from cloud provider that matches ip: 10.241.0.28
E0930 08:01:15.932972       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:18.931466       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:19.040918       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)
E0930 08:01:19.677038       1 route_controller.go:119] Couldn't reconcile node routes: error listing routes: hcloud/ListRoutes: hcloud/reloadNetwork: limit of 3600 requests per hour reached (rate_limit_exceeded)

Tried to ask support to increase the API limit temporarily but they said no.

ym avatar Sep 30 '22 08:09 ym

> Tried to ask support to increase the API limit temporarily but they said no.

@ym can you please send the ticket ID to us, or reply to this ticket with an explicit mention of my name?

LKaemmerling avatar Sep 30 '22 08:09 LKaemmerling

@LKaemmerling

Thanks, the ticket ID is #2022083103009613

ym avatar Sep 30 '22 08:09 ym

@ym you will get an answer :)

We want to debug this even further. With one of the last releases, we got a contribution that added metrics to all API calls (https://github.com/hetznercloud/hcloud-cloud-controller-manager/pull/303). You should be able to see how often specific endpoints were called by looking at the metrics of the CCM. Can you maybe send us a screenshot from your Grafana dashboard or, if possible, send us access to this dashboard via mail to lukas.kaemmerling(at)hetzner-cloud.de?
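
For anyone without a Grafana dashboard at hand, the per-endpoint call counts can also be tallied from the raw Prometheus text output of the metrics endpoint. The following is a minimal sketch under stated assumptions: the metric name `hypothetical_api_requests_total` and the label `endpoint` are placeholders, since the actual names added by the PR are not shown in this thread, and `callsPerEndpoint` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// endpointLabel extracts the value of an assumed "endpoint" label.
var endpointLabel = regexp.MustCompile(`endpoint="([^"]*)"`)

// callsPerEndpoint sums the samples of metricName per endpoint label value.
// Both the metric name and the label name are assumptions for illustration.
func callsPerEndpoint(metricsText, metricName string) map[string]float64 {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(metricsText))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comments (# HELP / # TYPE) and unrelated metrics.
		if !strings.HasPrefix(line, metricName+"{") {
			continue
		}
		end := strings.LastIndex(line, "}")
		m := endpointLabel.FindStringSubmatch(line)
		if end < 0 || m == nil {
			continue
		}
		// The sample value is the first field after the label set.
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		totals[m[1]] += v
	}
	return totals
}

func main() {
	sample := `# TYPE hypothetical_api_requests_total counter
hypothetical_api_requests_total{endpoint="/servers",code="200"} 3000
hypothetical_api_requests_total{endpoint="/servers",code="429"} 500
hypothetical_api_requests_total{endpoint="/networks/id",code="200"} 120
`
	totals := callsPerEndpoint(sample, "hypothetical_api_requests_total")
	fmt.Println(totals["/servers"]) // prints "3500"
}
```

The sample values are made up. Comparing such totals between two scrapes taken an hour apart would show which endpoint accounts for most of the hourly budget.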

LKaemmerling avatar Sep 30 '22 08:09 LKaemmerling

@ym okay you won't get a mail :D i have the honor to say that your limit was just increased :)

LKaemmerling avatar Sep 30 '22 08:09 LKaemmerling

@ym And we apologize for the trouble, because

> Sad that Hetzner is not willing to increase the rate limits

we do increase API limits for various use cases. In this case the request was unfortunately not forwarded to the responsible department. We have already contacted support to refresh their knowledge of the proper workflow for these requests.

Kjarrigan avatar Sep 30 '22 08:09 Kjarrigan

I'm also currently running into rate limit issues.. Are there any plans to maybe increase limits for endpoints used by this CCM?

Especially when doing maintenance on your cluster (adding nodes, testing nodes, removing nodes, ... ) you'll be rate limited very fast. It's quite annoying tbh.

maaft avatar Nov 01 '22 12:11 maaft

I'm struggling with that every day. If I hadn't invested so much time deploying k8s and all our apps on Hetzner, the first thing I would do is move out. Sorry guys, but that's absolute amateur work here, the worst API service I have ever seen.

mmpetarpeshev avatar Nov 01 '22 12:11 mmpetarpeshev

@maaft @mmpetarpeshev could you please try to do what I requested here: https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/308#issuecomment-1263282005

We need to understand what your cluster is doing :)

LKaemmerling avatar Nov 01 '22 15:11 LKaemmerling

Thanks @LKaemmerling, I will try to get these metrics in the next few days and provide them to you.

mmpetarpeshev avatar Nov 01 '22 16:11 mmpetarpeshev

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions[bot] avatar Jan 01 '23 12:01 github-actions[bot]