Add an `--enable-cached-compositions` feature flag
Description of your changes
Fixes https://github.com/crossplane/crossplane/issues/5338
When this flag is enabled we'll cache *Unstructured objects.
I have:
- [x] Read and followed Crossplane's contribution process.
- [x] Run
make reviewableto ensure this PR is ready for review. - [ ] ~Added or updated unit tests.~
- [x] Added or updated e2e tests.
- [ ] Linked a PR or a docs tracking issue to document this change.
- [ ] ~Added
backport release-x.ylabels to auto-backport this PR.~
Need help with this checklist? See the cheat sheet.
Looks like this fails E2Es pretty consistently. I'd vote we don't try squeeze it into 1.15.
Totally agree, @negz.
Hey @negz I tinkered with this PR and I feel like there might be a problem with cache expiration time with *Unstructured objects or maybe a type problem when we try to get a non *Unstructured type but try to get it from the *Unstructured cache reader. Here is why think like this:
- Github Actions for E2E tests passes the first test
TestCompositionMinimalbut fails the second oneTestCompositionPatchAndTransform. When I ran tests locally generally it passes the first but sometimes this is not the case. So it made me think that maybe the cache expires at an arbitrary time which causes this behavior. - When I tried to run tests one by one using GoLand's test running tool, tests before
TestCompositionFunctionspassed. I also ran tests consecutively usinggo testwith GoLand while running Crossplane as a process to ensure thatEnableAlphaCachedCompositionsisTrueand again the same happened. - When tests fail crossplane's logs almost everytime has a variety of this log so I thought maybe we have some problem with the cache reader types but I am not sure:
2024-04-29T00:12:06+03:00 DEBUG crossplane cannot get claim {"controller":
"offered/compositeresourcedefinition.apiextensions.crossplane.io",
"request": {"name":"xnopresources.nop.example.org"}, "uid": "1d0ca97c-d11f-4153-b360-7c7173172f2e", "version": "1126",
"name": "xnopresources.nop.example.org", "controller": "claim/xnopresources.nop.example.org", "request"":
{"name":"apiextensions-composition-functions","namespace":"default"}, "error": "NopResource.nop.example.org
\"apiextensions-composition-functions\" not found"}
I am still working on this PR to better understand and solve the issue.
Hi @negz I think I understood the problem that is causing our E2E tests to fail. The inconsistent behavior was not because of cache expiration but early read operation.
Investigating the problem first of all I looked at this blog, it was routing to a new solution that is merged to controller-runtime but this was deprecated. Then I looked at cluster-api project as a source and found out that they create different clients for normal and unstructured caching but this approach did not succeeded in our codebase unfortunately.
As in my last comment I looked for any expiration time or cache flush etc. inside controller-runtime codebase but I could not find anything about that. I also tried to increase memory but this was also unsuccesful. Since after we cant find a resource we do not requeue and status stays as 'not found'. We also do not reconcile again if we do not get any events i.e. status stays 'not found' and causing tests to fail. I created this https://github.com/crossplane/crossplane/pull/5644 draft PR that reads from the API server directly if we cant find the resource from the cache.
Per my comment at https://github.com/crossplane/crossplane/pull/5644#discussion_r1589840058 I realized this is way more subtle and complex than I had remembered.
Notably we can't enable caching of Unstructured resources without also finding some way to garbage collect the cache's informers when their GVK no longer exists (e.g. due to their CRD being deleted). If we don't, the informer will live forever complaining that it can't list its GVK.