vm-operator icon indicating copy to clipboard operation
vm-operator copied to clipboard

✨ Restrict cache for all ConfigMap/Secret objects

Open akutz opened this issue 1 year ago • 3 comments

What does this PR do, and why is it needed?

This patch updates the manager cache for VM Operator so that ConfigMap/Secret resources from the kube-system and VM Op pod namespaces are cached, but ConfigMap/Secret resources in any other namespace are not cached.

This patch means controllers that access ConfigMap/Secret resources in these namespaces no longer need to create separate caches. Instead, all controllers may use the manager client unless for some other reason.

Which issue(s) is/are addressed by this PR? (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes NA

Are there any special notes for your reviewer:

The tests in controllers/infra/secret/infra_secret_controller_intg_test.go related to checking the value of called validates the expected behavior.

Please add a release note if necessary:

Restrict caching of ConfigMap and Secret resources to kube-system and the VM Operator pod namespaces.

akutz avatar Sep 24 '24 15:09 akutz

@sbueringer Ugh. This is failing with:

$ go test -timeout 120s -run '^TestManager$' ./pkg/manager/test
Running Suite: Manager Suite - /Users/akutz/Projects/vmop/vmop/pkg/manager/test
===============================================================================
Random Seed: 1727195109

Will run 3 of 3 specs
------------------------------
• [FAILED] [0.022 seconds]
Integration tests Cache Getting objects ConfigMap [It] should return the object with a live read [envtest]
/Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:57

  Timeline >>
  STEP: Creating a temporary namespace @ 09/24/24 11:25:15.684
  [FAILED] in [It] - /Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:58 @ 09/24/24 11:25:15.702
  STEP: Destroying temporary namespace @ 09/24/24 11:25:15.702
  << Timeline

  [FAILED] Expected success, but got an error:
      <*errors.errorString | 0x14000e06050>: 
      unable to get: 2c57ba0a-9e3c-44eb-8721-b68a1bf4f65c/my-object-7kbwc because of unknown namespace for the cache
      {
          s: "unable to get: 2c57ba0a-9e3c-44eb-8721-b68a1bf4f65c/my-object-7kbwc because of unknown namespace for the cache",
      }
  In [It] at: /Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:58 @ 09/24/24 11:25:15.702
------------------------------
SSI0924 11:25:15.707798   99028 internal.go:538] "Stopping and waiting for non leader election runnables"
I0924 11:25:15.707837   99028 internal.go:542] "Stopping and waiting for leader election runnables"
I0924 11:25:15.707857   99028 internal.go:550] "Stopping and waiting for caches"
I0924 11:25:15.707954   99028 internal.go:554] "Stopping and waiting for webhooks"
I0924 11:25:15.707967   99028 internal.go:557] "Stopping and waiting for HTTP servers"
I0924 11:25:15.707977   99028 internal.go:561] "Wait completed, proceeding to shutdown the manager"


Summarizing 1 Failure:
  [FAIL] Integration tests Cache Getting objects ConfigMap [It] should return the object with a live read [envtest]
  /Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:58

Ran 1 of 3 Specs in 7.299 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestManager (7.30s)
FAIL
FAIL	github.com/vmware-tanzu/vm-operator/pkg/manager/test	8.053s
FAIL

It seems that controller-runtime does not fall-through to the API reader / bypass cache if any namespace mappings are present for an object.

Is there not any way we can utilize ByObject or Disable to:

  • Disable caching for ConfigMap and Secret resources for all namespaces except those we bless.
  • Do the above using the default manager client/cache.
  • Not need to set up special caches for controllers that need to watch/cache ConfigMap/Secret resources in the allowed namespaces.

Thanks!

akutz avatar Sep 24 '24 16:09 akutz

It seems that controller-runtime does not fall-through to the API reader / bypass cache if any namespace mappings are present for an object.

That is correct.

Some context:

  • Via cache.Options you can configure what exactly should be cached (including which namespaces for which gvk etc.)
  • Via client.Options you can configure for which requests the client should use the cache vs live clients:
    • Can be configured via either DisableFor (for entire GVKs, not per namespace), or Unstructured (but that's not your case)
    • This is implemented here: https://github.com/kubernetes-sigs/controller-runtime/blob/4a356a8b1a22aafe55d07d067e935dc64fb90919/pkg/client/client.go#L335
  • Once the client decided to either use the cache or the live clients this choice is final. There is no "fallback" e.g. if the Cache returns a NotFound. I assume this is because you don't want to end up spaming the apiserver when an object actually doesn't exist

Potential solutions

  • I think your cache config is correct and there is no way to configure it better. It's definitely not needed to set up separate caches for ConfigMaps and Secrets.
  • I think the key is in solving this at the "client-level". A few options:
    1. Set DisableFor for the mgr.GetClient() so the mgr.GetClient() doesn't use the cache for ConfigMaps and Secrets. Create an additional client for the cases where you want to cache. We did this here in CAPI: https://github.com/kubernetes-sigs/cluster-api/blob/3232abcf390f5364d4ebfc4ce7c2b5fac0281686/main.go#L389-L398
    2. The reverse, mgr.GetClient() uses the Cache for all ConfigMaps/Secrets. Then you have to explicitly use a live client (e.g. mgr.GetAPIReader()) in the cases where you want to use a live client for ConfigMaps/Secrets
    3. Build a Client that internally decides in which cases a live client or the cache should be used. This would be probably a thin wrapper around mgr.GetClient(). One easy way to do this is to use interceptor.NewClient, overwrite Get & List and add some if/else based on namespace.
    4. Ideal mid-term option would be to extend client.CacheOptions.DisableFor to allow more fine-granular configuration when the cache should be used

sbueringer avatar Sep 25 '24 11:09 sbueringer

First of all, thank you @sbueringer!

  • I think the key is in solving this at the "client-level". A few options:

    1. Set DisableFor for the mgr.GetClient() so the mgr.GetClient() doesn't use the cache for ConfigMaps and Secrets. Create an additional client for the cases where you want to cache. We did this here in CAPI: https://github.com/kubernetes-sigs/cluster-api/blob/3232abcf390f5364d4ebfc4ce7c2b5fac0281686/main.go#L389-L398
    2. The reverse, mgr.GetClient() uses the Cache for all ConfigMaps/Secrets. Then you have to explicitly use a live client (e.g. mgr.GetAPIReader()) in the cases where you want to use a live client for ConfigMaps/Secrets
    3. Build a Client that internally decides in which cases a live client or the cache should be used. This would be probably a thin wrapper around mgr.GetClient(). One easy way to do this is to use interceptor.NewClient, overwrite Get & List and add some if/else based on namespace.
    4. Ideal mid-term option would be to extend client.CacheOptions.DisableFor to allow more fine-granular configuration when the cache should be used

Ironically, this is exactly what we used to do prior to CR introducing the DisableFor feature. We had a client proxy that decided which resources used the cache versus uses the reader.

Okay, I guess we need to go back to that then. Thanks again @sbueringer !

akutz avatar Sep 25 '24 12:09 akutz