Add route_controller to linode CCM
General:
This PR reintroduces changes which were reverted in https://github.com/linode/linode-cloud-controller-manager/pull/196
Additional changes in this PR:
- If the CCM is running outside of a VPC, we don't fetch InstanceConfigs.
- If the CCM is running within a VPC, we fetch InstanceConfigs for the instances within that VPC.
Note: In the future, we would like to use /v4/vpcs/:id/ips to list all VPC IPs, so that we can get every instance's IPs in just 2 calls instead of the existing 1+len(nodes) calls.
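The proposed 2-call approach could look roughly like the following Go sketch: a single listing of `/v4/vpcs/:id/ips` is grouped by instance ID in memory instead of issuing one configs request per node. The `vpcIP` type and its field names are assumptions about the response shape, and `indexByLinode` / `callCounts` are hypothetical helpers; the call counts only restate the figures in the note.

```go
package main

import "fmt"

// vpcIP is a hypothetical record for one entry of a /v4/vpcs/:id/ips listing.
// Field names are assumptions for illustration only.
type vpcIP struct {
	LinodeID int
	Address  string
}

// indexByLinode groups the IPs from one list call by instance ID, replacing
// a per-node configs request with an in-memory lookup.
func indexByLinode(ips []vpcIP) map[int][]string {
	out := make(map[int][]string)
	for _, ip := range ips {
		out[ip.LinodeID] = append(out[ip.LinodeID], ip.Address)
	}
	return out
}

// callCounts restates the note: 1+len(nodes) calls today versus 2 proposed.
func callCounts(nodes int) (current, proposed int) {
	return 1 + nodes, 2
}

func main() {
	// Made-up sample data shaped like the assumed listing.
	ips := []vpcIP{
		{LinodeID: 101, Address: "10.0.0.2"},
		{LinodeID: 102, Address: "10.0.0.3"},
	}
	byLinode := indexByLinode(ips)
	fmt.Println(byLinode[101], byLinode[102])

	cur, prop := callCounts(6)
	fmt.Printf("6 nodes: %d calls today vs %d proposed\n", cur, prop)
}
```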
API calls made every 20 seconds in a 6-node cluster inside a VPC (CCM started with the linodego-debug flag set to true):
```
k logs --timestamps pod/ccm-linode-cbxwl -n kube-system -f | grep "/v4/"
2024-03-27T13:50:35.520144953Z GET /v4/linode/instances HTTP/1.1
2024-03-27T13:50:35.599625368Z GET /v4/vpcs/41220 HTTP/1.1
2024-03-27T13:50:35.691136075Z GET /v4/linode/instances/56323907/configs HTTP/1.1
2024-03-27T13:50:35.696009496Z GET /v4/linode/instances/56324011/configs HTTP/1.1
2024-03-27T13:50:35.700781396Z GET /v4/linode/instances/56323949/configs HTTP/1.1
2024-03-27T13:50:35.701410791Z GET /v4/linode/instances/56323970/configs HTTP/1.1
2024-03-27T13:50:35.705268044Z GET /v4/linode/instances/56323968/configs HTTP/1.1
2024-03-27T13:50:35.727128327Z GET /v4/linode/instances/56323974/configs HTTP/1.1
2024-03-27T13:50:55.519257954Z GET /v4/linode/instances HTTP/1.1
2024-03-27T13:50:55.623009604Z GET /v4/vpcs/41220 HTTP/1.1
2024-03-27T13:50:55.708633682Z GET /v4/linode/instances/56324011/configs HTTP/1.1
2024-03-27T13:50:55.721245148Z GET /v4/linode/instances/56323970/configs HTTP/1.1
2024-03-27T13:50:55.724830978Z GET /v4/linode/instances/56323907/configs HTTP/1.1
2024-03-27T13:50:55.737507344Z GET /v4/linode/instances/56323974/configs HTTP/1.1
2024-03-27T13:50:55.743463294Z GET /v4/linode/instances/56323949/configs HTTP/1.1
2024-03-27T13:50:55.751982925Z GET /v4/linode/instances/56323968/configs HTTP/1.1
```
When running outside of a VPC, route_controller is not enabled, so the instance cache is refreshed much less often (roughly every 5 minutes in the log below):
```
root@rah4-control-plane-4qjf5:~# k logs --timestamps pod/ccm-linode-v77kt -n kube-system -f | grep "/v4/"
2024-03-27T13:53:48.297701175Z GET /v4/linode/instances HTTP/1.1
2024-03-27T13:53:48.386284452Z GET /v4/linode/instances HTTP/1.1
2024-03-27T13:58:48.575691503Z GET /v4/linode/instances HTTP/1.1
2024-03-27T14:03:48.757060253Z GET /v4/linode/instances HTTP/1.1
2024-03-27T14:08:48.909781164Z GET /v4/linode/instances HTTP/1.1
2024-03-27T14:13:49.284562219Z GET /v4/linode/instances HTTP/1.1
```
Since route_controller is triggered every 10 seconds and the instance cache is valid for 15 seconds, every second iteration of route_controller (2 × 10 s = 20 s) finds the instance cache expired and refreshes it. I would prefer to change route_controller's default triggering interval from 10 seconds to 60 seconds, as it runs too frequently and does unnecessary work. Even if a node that joins the cluster doesn't get its routes updated for up to 60 seconds, its pods will still come up; they just won't be able to communicate with pods on other nodes for roughly the first 60 seconds, which I think should be acceptable. I am open to discussing this so we can agree on the value to use for route_controller's route-reconciliation-period.
- [x] Have you removed all sensitive information, including but not limited to access keys and passwords?
- [x] Have you checked to ensure there aren't other open or closed Pull Requests for the same bug/feature/question?
Pull Request Guidelines:
- [x] Does your submission pass tests?
- [x] Have you added tests?
- [x] Are you addressing a single feature in this PR?
- [x] Are your commits atomic, addressing one change per commit?
- [x] Are you following the conventions of the language?
- [x] Have you saved your large formatting changes for a different PR, so we can focus on your work?
- [x] Have you explained your rationale for why this feature is needed?
- [ ] Have you linked your PR to an open issue?