cartographer

Warn users when another actor is fighting Cartographer's definition of an object

Status: Open. waciumawanjohi opened this issue 3 years ago • 5 comments

Description of problem

If an object's spec is changed by the object's controller (or some other automated process on the cluster), Cartographer will fight it for control of the spec definition. This can prevent Carto from reading the object, cause the object's generation to balloon, and deadlock the supply chain.
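The generation blow-up can be illustrated with a toy model. This is a hypothetical simulation, not Cartographer code: it assumes, as for most Kubernetes resources, that the API server increments `metadata.generation` whenever the stored spec actually changes, and it uses `"1m"` and `"60s"` as stand-ins for Carto's value and the other actor's rewrite of the same field.

```go
package main

// object is a hypothetical, simplified model of one stamped object.
type object struct {
	spec       string
	generation int
}

// write stores a new spec value and bumps the generation only if the value
// changed, mimicking the API server's behavior for spec updates.
func (o *object) write(spec string) {
	if o.spec != spec {
		o.spec = spec
		o.generation++
	}
}

// simulateFight runs a number of reconcile rounds in which Cartographer
// re-applies its intended value and another actor immediately rewrites it.
// It returns the final generation and whether Cartographer ever found the
// spec in the shape it wrote (the precondition for reading the object).
func simulateFight(intended, rewritten string, rounds int) (generation int, cartoSawIntent bool) {
	o := &object{}
	for i := 0; i < rounds; i++ {
		o.write(intended)  // Cartographer applies its template
		o.write(rewritten) // the other actor reformats the field
		if o.spec == intended {
			cartoSawIntent = true
		}
	}
	return o.generation, cartoSawIntent
}
```

With a fight, `simulateFight("1m", "60s", 5)` ends at generation 10 and Carto never sees its own value; without one, `simulateFight("60s", "60s", 5)` stays at generation 1.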

Cartographer creates objects on the cluster, updates them and reads them. Cartographer can only read an object if it is in the expected shape, i.e. its spec is as it was when Carto created it. Cartographer is resilient to changes made by webhooks at submission time. If any other entity (including the object's controller) changes the spec after submission time, that entity should be considered an enemy, and users should be notified that such an enemy exists. One example is kapp App, which changes the formatting of fields on the object during reconciliation.

Proposed solution

Given an object has been stamped out in the cluster by a supply chain
When that object's spec is changed by another entity (e.g. a user running kubectl to update the object's spec)
Then the workload status contains a warning that an actor in the cluster is changing an object in the supply chain

Implementation proposal

When Cartographer creates an object, it caches both the object submitted to the cluster (the intended object, or intent) and the object returned by the API server after any mutating webhooks have been applied (the returned object). On subsequent reconciles, Carto decides whether to update: it builds a new intent and compares it with the cached intent. If they differ, it submits the new intent. If they are the same, it compares the object currently on the cluster with the cached returned object, and if those differ, it submits the new intent. This second check can be leveraged for this story: if the object has changed since the webhooks touched it, Carto knows that another actor has changed the object, and an error/warning should be raised.
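The two-step check described above might look like the following sketch. `cacheEntry`, `decision` and `decide` are hypothetical names, not Cartographer's actual types, and the sketch assumes the live object and the cached returned object have both been reduced to the fields Carto compares (e.g. the spec), since server-managed metadata such as `resourceVersion` would otherwise always differ.

```go
package main

import "reflect"

// cacheEntry is a hypothetical record of what Cartographer submitted (the
// intent) and what the API server returned after mutating webhooks ran.
type cacheEntry struct {
	intent   map[string]interface{}
	returned map[string]interface{}
}

// decision is the outcome of one reconcile for a stamped object.
type decision struct {
	submit        bool // submit the new intent to the API server
	foreignChange bool // another actor modified the object after submission
}

// decide compares intents first, then compares the live object with the
// cached returned object. Only drift under an unchanged intent is flagged.
func decide(newIntent, live map[string]interface{}, cached *cacheEntry) decision {
	if cached == nil || !reflect.DeepEqual(newIntent, cached.intent) {
		// Nothing cached yet, or the template/inputs changed: submit.
		return decision{submit: true}
	}
	if !reflect.DeepEqual(live, cached.returned) {
		// Same intent, but the object changed after the webhooks touched
		// it: this corresponds to the starred point in the diagram.
		return decision{submit: true, foreignChange: true}
	}
	return decision{}
}
```

Only the second branch sets `foreignChange`: a changed intent is Carto's own doing, whereas drift under an unchanged intent points to another actor.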

In the diagram of the caching strategy below, the star marks the point where Carto can surmise that another actor has changed the object.

[Diagram: Kontinue Object Caching (2)]

waciumawanjohi, Feb 04 '22 19:02

Pre-IPM discussion

Have we considered declarative "ignores"? For example:

ignore:
  - spec.syncInterval
template:
  spec:
    syncInterval: 1m

We always apply it if it is not currently on the object:

spec:
  syncInterval: 60s

but afterwards we ignore incoming changes from the template, **and we also warn that they differ**.

squeedee, Feb 11 '22 19:02
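The declarative "ignore" proposal above could be sketched for a single field as follows. The `ignore` key exists only as a proposal in this discussion; `applyIgnored` and `sameDuration` are hypothetical helpers, and the live object is keyed by dotted path purely for simplicity.

```go
package main

import (
	"fmt"
	"time"
)

// applyIgnored applies the template value when the field is absent on the
// live object; otherwise it keeps the live value and returns a warning if
// the two differ. It returns the value to keep and a (possibly empty) warning.
func applyIgnored(field, templateValue string, live map[string]string) (string, string) {
	current, ok := live[field]
	if !ok {
		return templateValue, ""
	}
	if current != templateValue {
		return current, fmt.Sprintf("ignored field %s differs: template=%q cluster=%q", field, templateValue, current)
	}
	return current, ""
}

// sameDuration reports whether two duration strings are semantically equal,
// e.g. "1m" and "60s", which a plain string comparison treats as different.
func sameDuration(a, b string) bool {
	da, errA := time.ParseDuration(a)
	db, errB := time.ParseDuration(b)
	return errA == nil && errB == nil && da == db
}
```

`sameDuration("1m", "60s")` returns true, which suggests that in the example above the warning would be triggered by formatting alone rather than by a real disagreement.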

IPM discussion

If the other actor is an automated process, a warning message could appear and disappear on the workload very quickly. How do we ensure the message is actually available to a user?

Some thoughts shared:

  • Consider Kubernetes events
  • Aggregate an out-of-sync count in a status, with a last-out-of-sync date

zrob, Feb 14 '22 17:02

Blocked on the events RFC, which should help flesh out some of the design.

emmjohnson, Mar 21 '22 16:03

@idoru do we have stamped object events that would make this obvious?

karayim, Oct 07 '22 18:10

To wrap this up, add a note to the troubleshooting guide describing the symptoms: high resource usage, a small object age, and an observed generation that keeps climbing.

karayim, Oct 10 '22 18:10