cartographer
Warn users when another actor is fighting Cartographer's definition of an object
Description of problem
If an object's spec is changed by the object's controller (or some other automated process on the cluster), Cartographer will fight to control the spec definition. This can prevent Carto from reading the object, balloon the object's generation, and deadlock the supply chain.
Cartographer creates objects on the cluster, updates them and reads them. Cartographer can only read if the object is in the expected shape, e.g. that the spec is as it was when Carto created it. Cartographer is resilient to changes made at submission time by webhooks. If any other entity (including the object's controller) changes the spec after submission time, that entity should be considered an enemy and users should be notified that such an enemy exists. An example would be kapp App, which will change the formatting of fields on the object during reconciliation.
Proposed solution
Given an object has been stamped out in the cluster by a supply chain
When that object's spec is changed by another entity (e.g. by a user using kubectl to update the object spec)
Then the workload status contains a warning that an actor in the cluster is changing an object in the supply chain
Implementation proposal
When Cartographer creates an object, it caches both the object submitted to the cluster (the intended object) and what is returned by the apiServer after any mutating webhooks have been applied (the returned object). On following reconciles, Carto decides whether to update. It creates a new intent and checks if it is the same as the old intent. If not, it submits the new intent. If they are the same, it checks if the object currently on the cluster is the same as the cached returned object. If not, it submits the new intent. This second check can be leveraged for this story. If the object has changed since the webhooks touched it, Carto knows that there is another actor that has changed the object. An error/warning should be raised.
In the diagram of the caching strategy below, the star marks the point where Carto can surmise that another actor has changed the object.
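The decision flow described in the implementation proposal can be sketched roughly as follows. This is a minimal illustration, not Cartographer's actual code: the `Object`, `cacheEntry`, and `Decision` types and the `decide` function are hypothetical names, objects are modelled as plain nested maps, and the handling of a missing cache entry (treat it as a first creation) is an assumption.

```go
package main

import "reflect"

// Object is a hypothetical stand-in for an unstructured Kubernetes object.
type Object map[string]interface{}

// cacheEntry holds the two objects cached when Carto submits to the cluster.
type cacheEntry struct {
	Intent   Object // the object as submitted (the intended object)
	Returned Object // the object as returned after mutating webhooks ran
}

// Decision reports whether to submit and whether another actor was detected.
type Decision struct {
	Submit        bool
	OutsideChange bool
}

// decide follows the two checks from the proposal: first compare intents,
// then compare the object on the cluster with the cached returned object.
func decide(cached *cacheEntry, newIntent, onCluster Object) Decision {
	if cached == nil {
		// Assumption: nothing cached means the object has not been created yet.
		return Decision{Submit: true}
	}
	if !reflect.DeepEqual(cached.Intent, newIntent) {
		// The intent changed: submit it, no outside actor implied.
		return Decision{Submit: true}
	}
	if !reflect.DeepEqual(cached.Returned, onCluster) {
		// Same intent, but the object changed after the webhooks touched it:
		// this is the point where a warning could be raised.
		return Decision{Submit: true, OutsideChange: true}
	}
	return Decision{}
}
```

A caller would presumably surface `OutsideChange` on the workload status rather than only resubmitting, since resubmission alone is what produces the fight described in this issue.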
Pre-IPM discussion
Have we considered declarative "ignores"?
ignore:
  - spec.syncInterval
template:
  spec:
    syncInterval: 1m
We always apply it if it is not currently on the object:
spec:
  syncInterval: 60s
But afterwards we ignore incoming changes (from the template). And we also warn that they differ.
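One way the "ignores" idea might behave is sketched below. Everything here is hypothetical: `lookup`, `set`, and `applyIgnores` are illustrative helpers, objects are plain nested maps, and paths are assumed to be simple dot-separated keys with no list indexing. For each ignored path, the template value is applied only when the field is absent from the object; otherwise the live value is kept and a warning is produced if the two differ.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// lookup returns the value at a dot-separated path, if present.
func lookup(obj map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// set writes value at a dot-separated path, creating intermediate maps.
func set(obj map[string]interface{}, path string, value interface{}) {
	keys := strings.Split(path, ".")
	cur := obj
	for _, key := range keys[:len(keys)-1] {
		next, ok := cur[key].(map[string]interface{})
		if !ok {
			next = map[string]interface{}{}
			cur[key] = next
		}
		cur = next
	}
	cur[keys[len(keys)-1]] = value
}

// applyIgnores adjusts desired (the stamped template) in place so that
// ignored fields already on the live object are left alone, and returns
// a warning for each ignored field whose template and live values differ.
func applyIgnores(desired, live map[string]interface{}, ignore []string) []string {
	var warnings []string
	for _, path := range ignore {
		have, onObject := lookup(live, path)
		if !onObject {
			continue // not on the object yet: the template value is applied
		}
		want, inTemplate := lookup(desired, path)
		if inTemplate && !reflect.DeepEqual(want, have) {
			warnings = append(warnings,
				fmt.Sprintf("%s: template has %v, object has %v", path, want, have))
		}
		set(desired, path, have) // keep the live value, ignore the template
	}
	return warnings
}
```

With the example above, a template value of `1m` would be applied on first creation; once the object reports `60s`, later reconciles would keep `60s` and emit one warning for `spec.syncInterval`.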
IPM discussion
Given another automated process, there is potential that a warning message would very quickly appear/disappear on the workload. How do we ensure the message is available to a user?
Some thoughts shared:
- Consider kubernetes events
- Aggregate an out-of-sync count in a status with a last-out-of-sync date
Blocking on the events RFC. That RFC should help flesh out some design.
@idoru do we have stamped object events that would make this obvious?
To wrap this up, add a note to the troubleshooting guide describing the symptoms: high resource usage, a small object age, and an observed generation that keeps climbing.