R4RFC: Allow resources to report status

Open jwntrs opened this issue 2 years ago • 4 comments

Description of problem

Currently a SupplyChain has no knowledge of the current status of any of its underlying resources. There have been many attempts to document how we could let resources report this information, but most of these solutions also try to use this information to gate downstream behaviour. We are requesting an RFC that proposes a solution for how resources can report (and categorize) this information strictly for informational purposes. The goal would be to let templates indicate what state the resource might be in (ready, failing, etc) so that we can relay that information to the workload status.

Prior art

  • https://github.com/vmware-tanzu/cartographer/pull/556
  • https://github.com/vmware-tanzu/cartographer/issues/28
  • https://github.com/vmware-tanzu/cartographer/issues/162
  • https://github.com/vmware-tanzu/cartographer/issues/247

jwntrs avatar Mar 04 '22 21:03 jwntrs

Looking for stampedResource.status

emmjohnson avatar Mar 21 '22 15:03 emmjohnson

What do we want to put in the status? Have a template specify where to get the status for the object? Put the entire status block in? Determine if it is ready or failing?

emmjohnson avatar Mar 21 '22 15:03 emmjohnson

(from zoom conversation)

With regard to reporting whether something is good or not, Cluster API's MachineHealthCheck could loosely serve as inspiration (https://cluster-api.sigs.k8s.io/tasks/healthcheck.html#what-is-a-machinehealthcheck): there you have a CRD where you essentially get to say _"for machines labelled as such, this is how you know whether they're healthy or not"_.
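
To make the idea concrete, here is a rough sketch of a MachineHealthCheck-style rule applied to arbitrary stamped resources. `HealthRule`, `match_labels`, and `unhealthy_conditions` are hypothetical names for illustration, not Cartographer or Cluster API types, and the sketch leaves out any notion of timeouts.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class HealthRule:
    """Hypothetical rule: "for objects labelled like this, these conditions mean unhealthy"."""
    match_labels: Dict[str, str]
    unhealthy_conditions: List[Tuple[str, str]]  # (condition type, condition status) pairs

    def selects(self, obj: Dict[str, Any]) -> bool:
        # An object is selected when it carries every label named in the rule.
        labels = obj.get("metadata", {}).get("labels", {})
        return all(labels.get(k) == v for k, v in self.match_labels.items())

    def is_healthy(self, obj: Dict[str, Any]) -> Optional[bool]:
        # None means the rule does not apply to this object at all.
        if not self.selects(obj):
            return None
        conditions = obj.get("status", {}).get("conditions", [])
        observed = {(c.get("type"), c.get("status")) for c in conditions}
        # Unhealthy as soon as any listed (type, status) pair is observed.
        return not any(pair in observed for pair in self.unhealthy_conditions)


rule = HealthRule(
    match_labels={"role": "builder"},
    unhealthy_conditions=[("Ready", "False"), ("Ready", "Unknown")],
)
failing = {
    "metadata": {"labels": {"role": "builder"}},
    "status": {"conditions": [{"type": "Ready", "status": "False"}]},
}
print(rule.is_healthy(failing))
```

The appeal of this shape is that the knowledge of what "healthy" means lives next to the selector rather than inside the controller.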

cirocosta avatar Mar 21 '22 15:03 cirocosta

"status" is a bit overloaded. I don't think a stamped resource should be able to have its full/partial .status reflected blindly on the workload, but I do like the idea of each stamped resource having a notion of health. For many resources, this will be the Ready condition, which has a status field of (True, False, Unknown). There are a few considerations to take into account:

  • not all resources have status (e.g. ConfigMap)
  • not all resources with status have conditions (e.g. Service)
  • not all resources with conditions have a Ready condition (e.g. Deployment)
  • a condition status of True does not always mean the resource is healthy; for some conditions False is the good value (e.g. Node)
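
The considerations above can be sketched as a single evaluation function. This is only an illustration over plain dictionaries; `resource_health` and its parameters are hypothetical names, and the three result strings are an assumption rather than anything Cartographer defines.

```python
from typing import Any, Dict


def resource_health(obj: Dict[str, Any],
                    condition_type: str = "Ready",
                    healthy_status: str = "True") -> str:
    """Return "Healthy", "Unhealthy", or "Unknown" for a Kubernetes-style object."""
    status = obj.get("status")
    if not isinstance(status, dict):
        return "Unknown"  # no status at all (e.g. ConfigMap)
    conditions = status.get("conditions")
    if not isinstance(conditions, list):
        return "Unknown"  # status without conditions (e.g. Service)
    for cond in conditions:
        if cond.get("type") == condition_type:
            value = cond.get("status", "Unknown")
            if value == "Unknown":
                return "Unknown"
            # healthy_status lets the caller say that False is the good value.
            return "Healthy" if value == healthy_status else "Unhealthy"
    return "Unknown"  # the chosen condition type is absent (e.g. Deployment has no Ready)


deployment = {"status": {"conditions": [{"type": "Available", "status": "True"}]}}
node = {"status": {"conditions": [{"type": "MemoryPressure", "status": "False"}]}}
print(resource_health(deployment, condition_type="Available"))
print(resource_health(node, condition_type="MemoryPressure", healthy_status="False"))
```

Whoever writes the template would need to supply `condition_type` and `healthy_status`, since neither can be inferred from the object itself.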

Part of my (counter) proposal on RFC 18 was to include condition-like fields for each stamped resource: reason, message, status, and lastTransitionTime.
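
As a minimal sketch of what such a per-resource record might look like, the snippet below models the four fields and follows the usual Kubernetes convention that lastTransitionTime only moves when status changes. `ResourceCondition` and `update_condition` are hypothetical names, not part of RFC 18 or Cartographer.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ResourceCondition:
    """Hypothetical condition-like record kept for one stamped resource."""
    status: str  # "True", "False", or "Unknown"
    reason: str
    message: str
    last_transition_time: datetime


def update_condition(previous: Optional[ResourceCondition],
                     status: str, reason: str, message: str,
                     now: datetime) -> ResourceCondition:
    # Same status as before: refresh reason/message but keep the old timestamp.
    if previous is not None and previous.status == status:
        return replace(previous, reason=reason, message=message)
    # First observation or a status change: record a new transition time.
    return ResourceCondition(status, reason, message, now)


t0 = datetime(2022, 3, 21, 16, 0, tzinfo=timezone.utc)
t1 = datetime(2022, 3, 21, 16, 5, tzinfo=timezone.utc)
first = update_condition(None, "Unknown", "NoStatus", "resource reports no status", t0)
same = update_condition(first, "Unknown", "NoConditions", "status has no conditions", t1)
ready = update_condition(same, "True", "Ready", "", t1)
print(same.last_transition_time, ready.last_transition_time)
```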

scothis avatar Mar 21 '22 16:03 scothis