One-off run?
Question
DISCLAIMER: I want to start by saying that I'm aware that this is a bit of a crazy question so feel free to tell me as much.
So I created an application using kopf and it works perfectly. However, I have some users who could benefit from it running the handlers for all available events at the moment the operator starts and then dying once it's done going through them, i.e. once it reaches an idle state.
I realize that this "run once and die" mode of operation goes against the concept of how an operator and control loops work, but I figured I'd ask if there's some way in kopf to accomplish such a thing.
-- If possible, explain what other ways did you try to solve the problem?
I briefly considered just spawning a timer on a separate thread that will kill the operator after some `SECONDS_TO_DIE` value, but that sounds horribly hacky.
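For illustration only, that hack would look roughly like this (a sketch; `SECONDS_TO_DIE` and the SIGTERM wiring via a startup handler are just my assumption of how it could be done):

```python
# Rough sketch of the "kill after a timeout" hack (not what I actually want).
# A daemon timer sends SIGTERM to the process; kopf treats SIGTERM as a graceful shutdown.
import os
import signal
import threading

import kopf

SECONDS_TO_DIE = 60  # made-up value

@kopf.on.startup()
def start_shutdown_timer(**_):
    timer = threading.Timer(SECONDS_TO_DIE, lambda: os.kill(os.getpid(), signal.SIGTERM))
    timer.daemon = True
    timer.start()
```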
Checklist
- [x] I have read the documentation and searched there for the problem
- [x] I have searched in the GitHub Issues for similar questions
Keywords
- single run
- one off
- execution duration
Maybe kopf could be run as a CronJob or Job? :))
> Maybe kopf could be run as a CronJob or Job? :))
Hahaha, you joke about that but...
The issue is that the Helm chart the users of my app use spawns the app as an `initContainer`... so I actually need it to do its job and die. Otherwise, the `Pod` that relies on the `initContainer` will obviously never start.
Not sure I understand the relation between the Helm chart and your app. Is the chart external and does it include your Docker image? Did you create a chart that users utilize wrong? etc. etc.
> Is the chart external and does it include your Docker image?
Exactly this. It's actually the Grafana Helm chart.
It used another image, but that one has had issues for half a year, so I decided to rewrite it using kopf to make it more stable. I didn't realize they were also using it like this... long story. :(
Anyways, it sounds like I'm barking up the wrong tree here...
I have a few ideas for a workaround that involve dealing with the Helm chart directly (rather than trying to make kopf do this) but I figured I'd ask first in case there was some magical easy way to do what I need before fighting the Grafana Helm chart maintainers.
Hey, I understand this is not the proper place to discuss it, but what exact issue with the Grafana Helm chart do you experience?
Yeah, don't really want to derail this question on a tangent but here are the related issues if you're curious:
Hmm, isn't it better to write a custom mutating webhook that would update the Grafana pod when certain ConfigMaps are deleted/created/updated? AFAIK it's pure k8s YAML + a Docker image, no need for sidecars, an operator, watch events, reconciliation loops, etc. The init container just needs to pick up the existing ConfigMaps on pod start. Sorry, I may be missing some admission-controller understanding.
The biggest challenge is the definition of "idle". Changes can happen all the time, including the time when other changes are being processed — so, even a timeout on do-nothing would not help. But let's assume it is time-based.
I would go this way:
1. Write a Python script and embed the operator there (https://kopf.readthedocs.io/en/stable/embedding/). Run that script instead of `kopf run ...`.
2. Start the operator in a thread. Example: https://github.com/nolar/kopf/blob/1.29.2/examples/12-embedded/example.py
3. Pass it a `stop_flag=`. (I have to admit here, the docs are not good on embedding — some source code diving is needed: https://github.com/nolar/kopf/blob/1.29.2/kopf/reactor/running.py#L85-L86.) It can be of any of these types, sync or async: https://github.com/nolar/kopf/blob/1.29.2/kopf/structs/primitives.py#L13
4. In the main thread, check for the "idle" state somehow. Whenever you see that the "idle" state is reached, set/raise that event/future of `stop_flag`. The operator will gracefully terminate. Just join the thread with the operator.
It can be done the opposite way: run the operator in the main thread and its event loop, and start a side-thread to check for idleness and trigger the flag.
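A rough sketch of that structure, based on the embedded example linked above (`is_idle()` is a placeholder for whatever idleness detection your app can define):

```python
# Sketch only: run the operator in a side thread, stop it from the main thread on "idle".
import asyncio
import threading
import time

import kopf

# Import the module(s) with your @kopf.on... handlers here so they get registered.

def kopf_thread(ready_flag: threading.Event, stop_flag: threading.Event) -> None:
    # kopf.operator() is the embeddable coroutine behind `kopf run`.
    asyncio.run(kopf.operator(ready_flag=ready_flag, stop_flag=stop_flag))

def is_idle() -> bool:
    return False  # placeholder: e.g. "no handler activity for N seconds"

def main() -> None:
    ready_flag = threading.Event()
    stop_flag = threading.Event()
    thread = threading.Thread(target=kopf_thread, kwargs=dict(ready_flag=ready_flag, stop_flag=stop_flag))
    thread.start()
    ready_flag.wait()        # raised by the operator once it is up and watching
    while not is_idle():     # the main thread decides when "idle" has been reached
        time.sleep(1)
    stop_flag.set()          # the operator terminates gracefully on this flag
    thread.join()

if __name__ == '__main__':
    main()
```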
I have checked if it would be easy to implement something like `kopf run --once`, where it only lists the resources but does not watch them continuously. Turns out, it is somewhat complicated — too many tasks are designed to run continuously. I do not know when and how to stop them gracefully. Especially the central orchestrator of watch-streams.
So, I'll skip this feature for now. Maybe in the future. Please, leave this issue open (as a feature request).
PS: As a side note, a little hint for the app's code: labels/annotations filters can also be callbacks. To match the labels by either just presence or by having a specific value (if configured), you can use:
```python
import os

import kopf

# `get_required_env_var` and `resource_is_desired` are assumed to be the app's own helpers.
LABEL = get_required_env_var('LABEL')
LABEL_VALUE = os.getenv('LABEL_VALUE')

def label_is_satisfied(val, /, **_):
    # Match by mere presence of the label if no value is configured, else by the exact value.
    return LABEL_VALUE is None or val == LABEL_VALUE

@kopf.on.resume('', 'v1', 'configmaps', when=resource_is_desired, labels={LABEL: label_is_satisfied})
...
```
However, I'm not sure if I understood all the details properly — I didn't dive deep. Just assumed this logic.
Oh, I've missed a big part of the discussion while writing the answer :-)
I agree here: in some cases, a simple `for` cycle is easier than complicated operators. The easiest operator can be made even with a single `for` cycle + an API client + a cronjob/time.sleep: https://pykube.readthedocs.io/en/latest/howtos/write-an-operator.html
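For example, a minimal sketch of that pattern with pykube-ng; the label selector and the "reaction" are hypothetical, just to show the shape of a single-pass run:

```python
# One pass over the matching ConfigMaps, then exit; no watching, no reconciliation loop.
import pykube

api = pykube.HTTPClient(pykube.KubeConfig.from_env())

query = pykube.ConfigMap.objects(api).filter(
    namespace=pykube.all,                   # all namespaces
    selector={"grafana_dashboard": "1"},    # hypothetical label
)
for cm in query:
    print(f"would process {cm.namespace}/{cm.name}")  # do the actual work here
```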
As you properly mentioned, operators are mostly for continuous run and near-instant reactions.
But it depends on the task, and I am not much aware of it, so my outsider judgements can be wrong.
Yeah... I suspected that the definition of "idle" would be the crux of this issue (events are flowing all the time... when are we truly "done"?).
Thanks for the pointer about embedding, I hadn't considered that as an option for this issue. Although... it sounds like it might be a bit of a rabbit hole... Perhaps I could follow what you said about writing a simple Python script, and then run the operator in a thread if we need to "run forever" and spawn a different thread for the "one-off" use-case codepath... I'm not sure. I'll need to think a bit more about whether such a refactor would be worth it.
It's a shame this use-case came up tbh since kopf has been exceptional at "staying alive" ever since that other hint you gave me in https://github.com/nolar/kopf/issues/585. I've had it running for nearly 2 months and the handlers still respond as quickly as they did at the start!
P.S. Thanks for the `label_is_satisfied` hint!
Sorry for jumping in again, but let me ask something more. I assume that you are trying to retrieve the existing ConfigMaps on Grafana pod start, to perform its initial configuration. Have you been thinking about using Helm's `lookup` function? Unfortunately, it doesn't support filtering by label, only by namespace. But keeping in mind that Grafana resides in its own NS and all ConfigMaps are located in the same NS, iterating and filtering in the Helm template may be acceptable. Wdyt?
That's an interesting idea; unfortunately, the ConfigMap assumption doesn't hold true IRL.
At least in our company, the infra team deploys Grafana + Prometheus in a `monitoring` namespace. Meanwhile, other teams each have their own namespace: `auth`, `frontend`, `backend`, etc. They all have Grafana dashboard `ConfigMap`s in their own namespaces (they don't have RBAC permissions to deploy to `monitoring`).
So it's important for whatever mechanism is used to gather Grafana dashboards/datasources/notifiers to be able to collect `ConfigMap`s and `Secret`s across multiple namespaces. I'd imagine other Grafana users probably have similar needs.
Thanks for patience, sorry for jumping into conversation.