gloo
edge/regression: enable experimental k8s gateway controller
Description
Enable the K8s Gateway controllers to run without impacting existing Gloo Gateway functionality.
Context
https://github.com/solo-io/solo-projects/issues/5663#issuecomment-1958102824
pkg/utils/kubeutils
This is a net-new package that is not actually used yet. However, I found it useful in other work (https://github.com/solo-io/dev-portal/pull/2799), and there are a number of places where we use a port-forwarding utility that I figure we could progressively move away from.
StartFunc
Our setup logic inside of setup_syncer is quite complex. Many tasks are executed serially, and others are started asynchronously. We need a simpler mechanism to break these tasks into smaller units. As a result, we define a StartFunc as the minimal function that can be executed with the two relevant inputs that it needs to know about.
We only introduce a single implementation of this function to reduce the scope of changes in this PR (see below), but I hope that over time we can adopt this pattern more widely.
K8s Gateway Controller StartFunc
The core issue we hit in https://github.com/solo-io/solo-projects/issues/5663#issuecomment-1958102824 was that our call to start the controllers that support the K8s Gateway API was a blocking call. This prevented the setup loop from being re-run whenever a Setting changed.
The impact of this was that existing Edge regression tests (kube2e) would fail when the k8s gateway controller was running. This was because we would:
- Run the controllers, which would hit the controller.Start function and then block
- Modify a Setting
- The Setting would be updated in etcd
- We would expect that Setting to propagate to the code (BUT IT WOULD NOT)
- Assert the new behavior, and the test would fail
The fix was to execute this task asynchronously.
Slack conversation: https://solo-io-corp.slack.com/archives/C06C8RA01NF/p1708619742957649
Interesting decisions
Testing steps
- Enable the experimental API in the regression tests
Notes for reviewers
Some more work that will not be handled in this PR:
- There is more follow-up to be done on how we initialize the controller-runtime.Manager (waitForCacheSync)
- There is more follow-up to be done to reduce our logging
Checklist:
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] I have added tests that prove my fix is effective or that my feature works
Visit the preview URL for this PR (updated for commit 4cc9a63):
https://gloo-edge--pr9174-sam-combine-binaries-y383ijzv.web.app
(expires Mon, 04 Mar 2024 18:21:18 GMT)
Issues linked to changelog: https://github.com/solo-io/solo-projects/issues/5663
/kick https://storage.googleapis.com/solo-public-build-logs/logs.html?buildid=82a9c9df-c788-440d-93c2-95e58ea89080
I have not seen this failure before, but it seems to be related to status propagation on the Proxy.
The build-bot failure is a real one. There is some issue with status propagation in the e2e tests. This had passed previously, so it seems I broke something in a recent commit. I will investigate.
/kick https://storage.googleapis.com/solo-public-build-logs/logs.html?buildid=659a77d8-6363-4e6a-9950-3c5ee813c16f
Build-bot error, but no failure. I think this was due to a signaling issue. Retrying...