metacontroller icon indicating copy to clipboard operation
metacontroller copied to clipboard

Meta controller doesn't include current status in webhook payload

Open luisdavim opened this issue 5 years ago • 8 comments

Hi,

I was hoping to be able to keep some state within the status part of the resource and retrieving it within the hook calback with something like:

cnameRef := gjson.Get(string(body), "parent.status.cnameRef").String()

but there's no status under parent is this by design?

luisdavim avatar Mar 26 '19 15:03 luisdavim

After some more investigation it's even stranger, the first two requests don't have a state under parent but the following ones do, not having state on the first one is expected but on the next on I was relying on this flag to be set in the state and it's not there so I try to generate a new ref, because the object being referenced already exists it fails and I end up with an empty ref so my status has the correct ref for a few seconds and then it's wiped out because I got a request from metacontroller without state after the state was already initialised.

luisdavim avatar Mar 26 '19 16:03 luisdavim

Probably, because of your metacontroller built without concurrency in mind, two or more request can be received at the same time to webhook handler about updates, so, maybe in one case you return one status and in another alternative status, then you need to make sure, that status is set based on what exists and not expected. Think of it as finite state machine. Correct me if I am wrong. Earlier that solves me a lot of problems. Also recommend you to not use any kind of random generated values if possible, otherwise make sure you know what you are doing.

moredure avatar Mar 26 '19 22:03 moredure

But why would I get two sync requests for the same resource at the same time? The problem here is that I only get tha ref when I create a resource in an external system that's why I need to keep it.

luisdavim avatar Mar 26 '19 23:03 luisdavim

Metacontroller as I know, worked in multithreaded way (5 worker goroutines, it's the last number I remembered), so this situation is possible if two+ changes made quickly, handler can processed it almost simultaneously thats why, possibly, race condition took place, status updated some kind of incorrect, but that is on your shoulders, to solve such kind of race conditions. I am not a maintainer, but that is what I came up working with metacontroller.

moredure avatar Mar 27 '19 20:03 moredure

Yeah but I still think this is a bug, in this case I can ensure there where no concurrent changes, it's a Dev environment and the only action was creating one single instance of the CRD, also I'm seeing duplicate requests from metacontroller from the periodic rsync. Without any action on the custom resource. Every 60s, the configured refresh interval I get 2 requests from metacontroller and there's only one resource

luisdavim avatar Mar 28 '19 19:03 luisdavim

BTW I did manage to overcome the issue on the hook side but this could still become a scalability issue if metacontroller is doing more requests than it should.

luisdavim avatar Mar 28 '19 19:03 luisdavim

so this situation is possible if two+ changes made quickly, handler can processed it almost simultaneously

Although there are multiple worker threads, the queue is deduplicated based only on the name of the parent object the needs to be synced (not based on what changed), so Metacontroller will not send concurrent hook requests about the same parent.

For the original issue: In general, when using k8s watches/informers, as Metacontroller does, you cannot be sure that on the next sync you will observe the changes you made on the last one. You might sync again from the local cache (because some child changed, for example) before your local cache receives the update (it is not write-through). Core controllers are designed to always look at the state they see and monotonically make progress towards the desired state, which means the lag doesn't matter.

It can be difficult to be disciplined about that, though, and sometimes it requires creativity to design around that constraint. It's probably worth thinking about whether Metacontroller can help alleviate that.

For this particular case, the pattern core controllers use would look like:

  1. Generate the reference in a way that's deterministic from the contents of the sync hook (parent/child specs).
  2. Try to create the reference in a way that's idempotent (it's a no-op if it already exists).
  3. If the no-op creation returns an error that is specifically "already exists", consider that as success: the thing we want to exist, exists. Report success in status.

enisoc avatar Apr 03 '19 00:04 enisoc

I think I may be hitting this same issue: #201

I would add an additional perspective is that this doesn't work well for managing external resources for a couple of reasons:

  • If an external resource already exists it is one of two outcomes:
    • It was created by this K8s resource -> success
    • It was created by another resource in K8s (different K8s name/namespace, same external resource identifier) -> duplicate, failure
  • Assuming you don't solve for the first case, and you let multiple K8s resource point to the same external resource (success case), you get into a scenario where cleaning them up properly in the finalizer is challenging/impossible. Do you only clean it up if both resources in K8s are deleted? Or on the first one? etc.

kelly-sm avatar May 26 '20 21:05 kelly-sm