cluster-api icon indicating copy to clipboard operation
cluster-api copied to clipboard

ClusterCache behavior is not fully documented in GoDoc

Open tuminoid opened this issue 4 months ago • 9 comments

After replacing ClusterCacheTracker with ClusterCache in https://github.com/kubernetes-sigs/cluster-api/pull/11247, there isn't any documentation about ClusterCache in the CAPI book nor in repo.

This documentation gap became evident as we got a question in cluster-api-provider-metal3 that why CAPM3 is connecting to cluster-api-provider-proxmox clusters, if both are created in same namespace, but proxmox doesn't connect to CAPM3 clusters. After a bit of digging, we came to conclusion that CAPM3 implementing CAPI ClusterCache is behind this, and since Proxmox provider doesn't implement ClusterCache, it explains why the connection doesn't happen the other way around.

I'm opening this documentation issue for the lack of ClusterCache documentation for the provider developers. Documenting this default behavior plus watch filters, and their expected uses shortly would go a long way here.

/kind documentation

tuminoid avatar Aug 20 '25 10:08 tuminoid

After replacing ClusterCacheTracker with ClusterCache in https://github.com/kubernetes-sigs/cluster-api/pull/11247, there isn't any documentation about ClusterCache in the CAPI book nor in repo.

Just to clarify, there is a lot of godoc documentation in the ClusterCache package and on all related types/options. So it's not true that there is no documentation.

Also to clarify, ClusterCache is not a component that has to be used by providers, similar like other low-level components e.g. in client-go like informers. We only exported the ClusterCache package because it solves a frequent use case. There is no contract or something similar that requires providers to use ClusterCache (or even describes it as a standard practice for providers).

Adding documentation in the CAPI book about ClusterCache in general won't make up for provider documentation if providers want to document their behavior on this level of detail.

Documenting this default behavior

What is "this default behavior" referring to?

(I"m not saying we shouldn't document it, I just want to make sure we have a clear understanding of the component and what could/should be documented in the CAPI book)

sbueringer avatar Aug 20 '25 12:08 sbueringer

Thanks for the quick reply!

After replacing ClusterCacheTracker with ClusterCache in #11247, there isn't any documentation about ClusterCache in the CAPI book nor in repo.

Just to clarify, there is a lot of godoc documentation in the ClusterCache package and on all related types/options. So it's not true that there is no documentation. Referencing it here in case someone else hits this.

OK, this is on me, did not check Go docs as I was multi-tasking fire fighting at the same time. I assumed I would find something in the CAPI book for the provider developers section, or in the repo in markdown format.

Also to clarify, ClusterCache is not a component that has to be used by providers, similar like other low-level components e.g. in client-go like informers. We only exported the ClusterCache package because it solves a frequent use case. There is no contract or something similar that requires providers to use ClusterCache (or even describes it as a standard practice for providers).

Agreed, and this optionality played a part in the original question, causing two providers act differently in the same situation, causing further concern for the end-user what our provider was doing.

Adding documentation in the CAPI book about ClusterCache in general won't make up for provider documentation if providers want to document their behavior on this level of detail.

Documenting this default behavior

What is "this default behavior" referring to?

Default behavior as in CAPM3 implementing ClusterCache will happily connect to non-CAPM3 clusters, if they're created in the same namespace, raising question why CAPM3 is doing that and if that is a security concern that CAPM3 is doing something to the non-CAPM3 clusters.

(I"m not saying we shouldn't document it, I just want to make sure we have a clear understanding of the component and what could/should be documented in the CAPI book)

Basically, the above, followed by a best practice (if it actually is one) to set watch filters to avoid the above. The godoc does not detail any of this in any meaningful detail, btw. That said, I'm not sure if provider developers are just expected to "know this".

Like said, I can totally live without the "formal" docs, and go with the GoDocs, and maybe this could be clarified there instead?

tuminoid avatar Aug 20 '25 13:08 tuminoid

Default behavior as in CAPM3 implementing ClusterCache will happily connect to non-CAPM3 clusters, if they're created in the same namespace, raising question why CAPM3 is doing that and if that is a security concern that CAPM3 is doing something to the non-CAPM3 clusters.

Got it. Yeah that shouldn't happen :)

Basically, the above, followed by a best practice (if it actually is one) to set watch filters to avoid the above. The godoc does not detail any of this in any meaningful detail, btw. That said, I'm not sure if provider developers are just expected to "know this".

Honestly. I didn't even think about the use case of running management clusters with multiple infra providers and what happens if infra providers are using the ClusterCache when implementing the ClusterCache. I'm not sure if watch filters are working, or rather it depends on the labels on the Cluster object, I think. We probably should have a better way to filter the Clusters the ClusterCache is reconciling on (and thus connecting to)

Like said, I can totally live without the "formal" docs, and go with the GoDocs, and maybe this could be clarified there instead?

Let's figure out how to do this correctly. Then definitely document it in godocs and see if we want to add more docs to the book (definitely fine for me)

sbueringer avatar Aug 20 '25 14:08 sbueringer

/retitle ClusterCache behavior is not fully documented in GoDoc

Let's adjust the issue title to be more realistic based on the discussion.

tuminoid avatar Aug 20 '25 14:08 tuminoid

Just an idea for a proper option in ClusterCache.

We can add an option for a Filter func to clustercache.Options the func gets a *clusterv1.Cluster passed in and if the func returns false we skip reconciling the Cluster object.

Infra providers can then decide based on Cluster.spec.infrastructureRef if they want to reconcile the Cluster (i.e. if ClusterCache should create & cache a client for it)

sbueringer avatar Aug 20 '25 14:08 sbueringer

Oh, so you can't use the existing watch filter for that 😊 Yes, it sounds like exactly whats needed to avoid the current behavior and would be easy to implement on the provider side.

WDYT @adilGhaffarDev, @lentzi90 ?

tuminoid avatar Aug 20 '25 14:08 tuminoid

You can use WatchFilterValue, but that implies that all Clusters a provider cares about have a specific label and that's bad :)

I'm not a fan of WatchFilterValue, just added it to ClusterCache because we have it everywhere else

sbueringer avatar Aug 20 '25 15:08 sbueringer

A Filter func sounds great! We can then check the infrastructureRef to decide if we should reconcile it or not. 👍

lentzi90 avatar Aug 21 '25 05:08 lentzi90

Feel free to open a PR. We can figure out minor details on the PR then

sbueringer avatar Aug 21 '25 05:08 sbueringer