
Improve compatibility with kstatus: avoid "resource is ready" race

Open tmmorin opened this issue 1 year ago • 8 comments

What would you like to be added (User Story)?

There is a low-hanging fruit here: a small change would make CAPI resources play much nicer with tools that rely on kstatus to determine whether a CAPI resource is ready.

Detailed Description

CAPI custom resources can be considered "Ready" by the kstatus library before they actually are ready.

When a tool relying on kstatus is used (e.g. FluxCD), this opens the door to inconsistencies: the tool can wrongly conclude that something is ready and trigger things that depend on it too early.

This typically happens very shortly after resource creation, lasts only a very short time, and resolves on its own. But there is a race condition if the tool using kstatus evaluates the resource during that problematic time window.

The problematic window is when, for instance, a Cluster CR has no status yet, or has a status but no status.conditions yet, with only status.observedGeneration set to 1 (equal to metadata.generation). As soon as the resource is processed by its controller, the controller sets status.conditions to include a condition of type Ready with status False, and kstatus then reports a correct result (InProgress, which means "not ready yet").
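To make the race concrete, here is a minimal Go sketch of the kind of generic checks described above. It is a simplified model written for this issue, not the actual kstatus code: the `Status` and `Condition` types and the `computeStatus` and `int64Ptr` helpers are hypothetical, and only the observedGeneration and Ready-condition checks are modeled.

```go
package main

import "fmt"

// Condition mirrors the subset of a Kubernetes status condition used here.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// Status is a hypothetical, trimmed-down status block.
// ObservedGeneration is a pointer so "unset" can be told apart from 0.
type Status struct {
	ObservedGeneration *int64
	Conditions         []Condition
}

// computeStatus is a simplified model of the generic checks, not kstatus itself.
// It returns "InProgress" (not ready yet) or "Current" (ready).
func computeStatus(generation int64, st *Status) string {
	if st == nil {
		// No status at all: nothing says "not ready", so the result is Current.
		return "Current"
	}
	if st.ObservedGeneration != nil && *st.ObservedGeneration != generation {
		// The controller has not observed the latest spec yet.
		return "InProgress"
	}
	for _, c := range st.Conditions {
		if c.Type == "Ready" && c.Status != "True" {
			return "InProgress"
		}
	}
	return "Current"
}

func int64Ptr(v int64) *int64 { return &v }

func main() {
	// Race window: no status yet, or only observedGeneration == generation.
	fmt.Println(computeStatus(1, nil))                                      // Current
	fmt.Println(computeStatus(1, &Status{ObservedGeneration: int64Ptr(1)})) // Current

	// After the first reconcile: Ready=False is reported as InProgress.
	fmt.Println(computeStatus(1, &Status{
		ObservedGeneration: int64Ptr(1),
		Conditions:         []Condition{{Type: "Ready", Status: "False"}},
	})) // InProgress

	// With a default of -1, the generation mismatch alone yields InProgress.
	fmt.Println(computeStatus(1, &Status{ObservedGeneration: int64Ptr(-1)})) // InProgress
}
```

The first two calls correspond to the race window: nothing in the object says "not ready", so a generic checker falls through to Current.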

The typical solution to this issue is to ensure that the CRD defines a default of -1 for status.observedGeneration; this is sufficient for the kstatus library to ignore the rest and conclude that the resource isn't ready yet, because -1 can never equal metadata.generation.

Quoting @stefanprodan (FluxCD dev):

They should set this to make ClusterAPI compatible: // +kubebuilder:default={"observedGeneration":-1}

Example here (FluxCD does this on their own CRDs): https://github.com/fluxcd/source-controller/blob/a302c71c57e370403042a2e307e3f4446b539730/api/v1/gitrepository_types.go#L328
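For illustration, the marker could be placed on the status field of a root type roughly as follows. This is a sketch under assumptions: `Example` and `ExampleStatus` are hypothetical stand-ins rather than actual CAPI types, and the `metav1` embedded fields are left out so the snippet compiles without dependencies.

```go
package main

import "fmt"

// ExampleStatus is a hypothetical status type, not an actual CAPI type.
type ExampleStatus struct {
	// ObservedGeneration is the latest generation observed by the controller.
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`
}

// Example is a hypothetical custom resource root type.
// metav1.TypeMeta and metav1.ObjectMeta are omitted to keep the sketch dependency-free.
type Example struct {
	// The marker below asks controller-gen to emit a default of
	// {observedGeneration: -1} for .status in the generated CRD schema.
	// +kubebuilder:default={"observedGeneration":-1}
	Status ExampleStatus `json:"status,omitempty"`
}

func main() {
	// The default lives in the CRD schema and is applied by the API server;
	// the marker is only a comment to the Go compiler, so the zero value stays 0.
	var e Example
	fmt.Println(e.Status.ObservedGeneration) // prints 0
}
```

Because the default is carried by the generated CRD schema, the change should need no controller logic; the CRD manifests would have to be regenerated for it to take effect.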

Anything else you would like to add?

#5472 was opened a while ago and is strongly related, but it has a much more generic/wider scope than what I describe here, which focuses only on keeping the InProgress/ready state free of this resource-creation race condition.

Label(s) to be applied

/kind feature
/area api

tmmorin avatar Aug 24 '23 14:08 tmmorin