Add route_controller to linode CCM

Open rahulait opened this issue 2 years ago • 0 comments

General:

This PR reintroduces changes which were reverted in https://github.com/linode/linode-cloud-controller-manager/pull/196

Additional changes added to PR:

If we are running outside of a VPC, we don't fetch InstanceConfigs
If we are running within a VPC, we fetch InstanceConfigs only for instances within that VPC
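
The two cases above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `API` interface, its method names, the simplified `InstanceConfig` type, and the convention that `vpcID == 0` means "not in a VPC" are all assumptions made for the example, and `fakeAPI` is an in-memory stand-in that only counts config lookups.

```go
package main

import (
	"context"
	"fmt"
)

// InstanceConfig is a simplified, hypothetical stand-in for the linodego type.
type InstanceConfig struct {
	ID int
}

// API is a hypothetical, narrowed view of the Linode client for this sketch.
type API interface {
	ListInstanceIDs(ctx context.Context) ([]int, error)                        // GET /v4/linode/instances
	VPCInstanceIDs(ctx context.Context, vpcID int) ([]int, error)              // GET /v4/vpcs/:id
	ListInstanceConfigs(ctx context.Context, id int) ([]InstanceConfig, error) // GET /v4/linode/instances/:id/configs
}

// refresh lists all instances and, only when vpcID is non-zero (assumed to
// mean "running inside a VPC"), fetches configs for instances in that VPC.
func refresh(ctx context.Context, api API, vpcID int) ([]int, map[int][]InstanceConfig, error) {
	ids, err := api.ListInstanceIDs(ctx)
	if err != nil {
		return nil, nil, fmt.Errorf("listing instances: %w", err)
	}
	configs := map[int][]InstanceConfig{}
	if vpcID == 0 {
		// Outside a VPC: skip the per-instance config calls entirely.
		return ids, configs, nil
	}
	vpcIDs, err := api.VPCInstanceIDs(ctx, vpcID)
	if err != nil {
		return nil, nil, fmt.Errorf("getting VPC %d: %w", vpcID, err)
	}
	for _, id := range vpcIDs {
		cfgs, err := api.ListInstanceConfigs(ctx, id)
		if err != nil {
			return nil, nil, fmt.Errorf("listing configs for instance %d: %w", id, err)
		}
		configs[id] = cfgs
	}
	return ids, configs, nil
}

// fakeAPI is an in-memory API that treats every instance as a VPC member
// and counts how many config lookups were made.
type fakeAPI struct {
	ids         []int
	configCalls int
}

func (f *fakeAPI) ListInstanceIDs(context.Context) ([]int, error)     { return f.ids, nil }
func (f *fakeAPI) VPCInstanceIDs(context.Context, int) ([]int, error) { return f.ids, nil }
func (f *fakeAPI) ListInstanceConfigs(_ context.Context, id int) ([]InstanceConfig, error) {
	f.configCalls++
	return []InstanceConfig{{ID: id}}, nil
}

// countConfigCalls runs one refresh against a fakeAPI and reports how many
// config lookups it triggered.
func countConfigCalls(vpcID int, ids []int) (int, error) {
	api := &fakeAPI{ids: ids}
	_, _, err := refresh(context.Background(), api, vpcID)
	return api.configCalls, err
}

func main() {
	ids := []int{56323907, 56324011, 56323949}
	for _, vpcID := range []int{0, 41220} {
		calls, err := countConfigCalls(vpcID, ids)
		if err != nil {
			panic(err)
		}
		fmt.Printf("vpcID=%d: %d config calls\n", vpcID, calls)
	}
}
```

In this sketch the per-instance config lookups are the only calls that scale with the number of nodes, which is why skipping them outside a VPC matters.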

Note: In the future, we would like to use /v4/vpcs/:id/ips to list all VPC IPs, so that we can get every instance's IPs in just 2 calls rather than the existing 1+len(nodes) calls.

Calls made every 20 seconds in a 6-node cluster inside a VPC (CCM started with the linodego-debug flag set to true):

k logs --timestamps pod/ccm-linode-cbxwl -n kube-system -f | grep "/v4/"

2024-03-27T13:50:35.520144953Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T13:50:35.599625368Z GET  /v4/vpcs/41220  HTTP/1.1
2024-03-27T13:50:35.691136075Z GET  /v4/linode/instances/56323907/configs  HTTP/1.1
2024-03-27T13:50:35.696009496Z GET  /v4/linode/instances/56324011/configs  HTTP/1.1
2024-03-27T13:50:35.700781396Z GET  /v4/linode/instances/56323949/configs  HTTP/1.1
2024-03-27T13:50:35.701410791Z GET  /v4/linode/instances/56323970/configs  HTTP/1.1
2024-03-27T13:50:35.705268044Z GET  /v4/linode/instances/56323968/configs  HTTP/1.1
2024-03-27T13:50:35.727128327Z GET  /v4/linode/instances/56323974/configs  HTTP/1.1

2024-03-27T13:50:55.519257954Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T13:50:55.623009604Z GET  /v4/vpcs/41220  HTTP/1.1
2024-03-27T13:50:55.708633682Z GET  /v4/linode/instances/56324011/configs  HTTP/1.1
2024-03-27T13:50:55.721245148Z GET  /v4/linode/instances/56323970/configs  HTTP/1.1
2024-03-27T13:50:55.724830978Z GET  /v4/linode/instances/56323907/configs  HTTP/1.1
2024-03-27T13:50:55.737507344Z GET  /v4/linode/instances/56323974/configs  HTTP/1.1
2024-03-27T13:50:55.743463294Z GET  /v4/linode/instances/56323949/configs  HTTP/1.1
2024-03-27T13:50:55.751982925Z GET  /v4/linode/instances/56323968/configs  HTTP/1.1

When running outside of a VPC, route_controller is not enabled, so the instance cache is refreshed far less often (roughly every 5 minutes in the log below):

root@rah4-control-plane-4qjf5:~# k logs --timestamps pod/ccm-linode-v77kt -n kube-system -f | grep "/v4/"
2024-03-27T13:53:48.297701175Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T13:53:48.386284452Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T13:58:48.575691503Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T14:03:48.757060253Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T14:08:48.909781164Z GET  /v4/linode/instances  HTTP/1.1
2024-03-27T14:13:49.284562219Z GET  /v4/linode/instances  HTTP/1.1

Since route_controller is triggered every 10 seconds and the instance cache is valid for 15 seconds, every second iteration of route_controller (2*10 = 20 seconds) finds the instance cache expired and refreshes it. I would prefer to change route_controller's default triggering interval from 10 seconds to 60 seconds, as it is running too frequently and doing unnecessary work.

If a node that joins the cluster doesn't get its routes updated for up to 60 seconds, I think that is acceptable: its pods will still come up, they just won't be able to communicate with pods on other nodes for the first 60 seconds or so. Does that sound fine? I am open to discussion on what value we should set for route-reconciliation-period for route_controller.

[x] Have you removed all sensitive information, including but not limited to access keys and passwords?
[x] Have you checked to ensure there aren't other open or closed Pull Requests for the same bug/feature/question?

Pull Request Guidelines:

[x] Does your submission pass tests?
[x] Have you added tests?
[x] Are you addressing a single feature in this PR?
[x] Are your commits atomic, addressing one change per commit?
[x] Are you following the conventions of the language?
[x] Have you saved your large formatting changes for a different PR, so we can focus on your work?
[x] Have you explained your rationale for why this feature is needed?
[ ] Have you linked your PR to an open issue?

Mar 26 '24 14:03 rahulait