operator
operator copied to clipboard
Requesting clarification of docstring on event.defer()
Issue
It will be useful if the docstrings of event.defer()
explicitly emphasise
- That
event.defer()
should not be used to order or sequence the execution of a charms event handler methods. - That events triggered later in time after a deferred event may start and complete their execution even while the deferred event remains deferred.
These two points essential imply that a Charm has no means to enforce any ordering on the execution of event handlers beyond what is inherent in the Juju event model. This I think is a key concept and may be worth emphasising even in our sdk docs.
Even though the doc strings may be alluding to these facts, it is easily possible to be misled by a statement such as "any deferred events get processed before the event (or action) that caused the current invocation of the charm."
On second thoughts I would like to make a correction/qualification to my own statements above.
The statement I would like to qualify is "a Charm has no means to enforce any ordering on the execution of event handlers beyond what is inherent in the Juju event model." I think we should emphasise any "inherent" ordering of events by Juju is only valid if the none of the event handlers in a charm invokes a defer()
.
There is another subtlety about defer()
when used as an overarching pattern, that I think is worth mentioning in the docs:
[If many] startup event hooks use defer if a precondition is not met, [then] the charm may end up in stagnation until the next update-status. If the default update-status hook interval of 5min is kept by the user, then relying on update-status is not too bad, but if the user creates a controller or model with a default of, say, 60 min, then it may be too long. -- cos-config/12
Yeah - it's definitely worth mentioning that events deferred at certain app lifecycle times may not see any events other than update-status for a long time.
Having a charm's lifecycle progression depending on the periodic occurrence of update status event is a bit worrying though, not in the least because the periodicity can be rather long. However I agree given the current semantics of defer()
this could become a necessary evil for the following reasons.
- If any event in a charm invokes a
defer()
then any sub-sequent event whose execution is pre-conditioned upon the successful completion of the deferred event, may itself need to invoke adefer()
or risk out of order execution. - If an early event in the charm lifecycle (such as the install or start event) invokes a defer, then all handlers for all subsequent events in the charm lifecycle will also need to have a
defer()
statement after checking for the outcome status of the antecedent events they are preconditioned upon. - As a result, progress of the charms lifecycle will get locked into the update event's periodicity.
This would be the case for charms managing any application where there is a prescribed order of events that need to happen in deploying or managing the application workloads particularly because application workloads need not respond instantaneously to requested changes in behaviour.
As a general rule - you shouldn't assume anything has happened previously inside charm code. Always check/confirm your priors. Do you think pebble is running in the workload because pebble_ready ran previously? - probably should check container.can_connect anyway, etc.
Thinking about this ...
In machine charms, I might expect the following code in an install hook:
def _on_install(self, event):
try:
apt.install("some_package")
except AptError as e:
self.model.unit.status = BlockedStatus("Failed to install: {}".format(e))
The call to the apt library (in the operator-libs-linux) blocks while the package is being installed. Given that we are already blocking, it's not necessarily horrible practice to do something like:
while True:
started = self._is_my_service_started()
if started: break
time.sleep(0.1)
(I know that this contradicts some earlier statements on my part.)
The problem is that you have no chance to set your status to inform the operator that they need to wait, because the framework won't commit
your changes until it is done executing a hook. This is somewhat okay in install
, where the status is set to Maintenance
anyway. But it is very much not okay in a config-changed
handler, where the status might be Active
, but the charm is secretly in Maintenance
while it awaits the service being ready.
(It's not a great pattern in install, either. I really would prefer to call .defer, and set MaintenanceStatus("Waiting for service to start"
)
In sidecar charms, you can probably modify your container entry point so that it blocks pebble until the service is ready, which gives you a nice pebble-ready
event you can react to. But that's not a universal solution.
We really need workload events :-/
Maybe we need a way to "commit" a status change mid-hook... Has that been considered/discussed before?
Maybe we need a way to "commit" a status change mid-hook... Has that been considered/discussed before?
I don't think that this would play nicely with the Juju agent, which loads, runs, and then commits a hook event.
I think that it might be possible to .emit
a new event, rather than deferring, which would be interesting. I need to go and check my understanding of the internals, but I think that the following would work:
def _on_config_changed(self, event):
if not self._liveness_check():
status = MaintenanceStatus("Awaiting liveness check")
if self.model.unit.status == status:
# Sleep on the second and later times through.
time.sleep(1)
self.model.unit.status = status
self.on.config_changed.emit()
return
This will work if the Juju agent is processing Ops events as standalone hooks, which I think that it does (otherwise, deferring an install
hook and then attempting to execute it in a relation-changed
hook's context would be awkward).
Hmm... that doesn't seem like a very clean solution either. Perhaps it's worth making a separate issue to capture this idea of managing status changes generally...
The syntax of the above routine doesn't have to be awful for charm authors. I included everything inline for clarity. The final syntax might be:
if not self._liveness_check():
event.defer_with_retry(msg="Awaiting liveness check", retry_after=1)
return
The event object has access to framework.model, as well as its own name, so setting the status and emitting the event can all happen inside .defer_with_retry.
We could even wrap everything in the existing defer. The new function signature might look like:
def defer(
self,
message: Str = "Event {event_name} deferred",
status: StatusBase = MaintenanceStatus,
retry_after: Int = 0
):
Queuing the sleep off of the status message is a little bit delicate, but in most cases, the failure mode is just a sleep gets skipped. If we want to guard against the case where two deferred events clobber each others' status while pegging the CPU, we could hijack the snapshot
and restore
methods to track how many times the event has fired. (That would allow us to implement exponential backoff, too, which would be nice.)
Ultimately, the "correct" solution is to have workload events. If we can do something relatively lightweight that tides folks over until that feature is implemented, it might make sense to do so, however.
What do you mean by "workload events" - is this something that has been discussed before/elsewhere?
@balbirthomas Trying to close this issue... Let's go back to the event.defer() docstring. It currently is:
"""Defer the event to the future.
Deferring an event from a handler puts that handler into a queue, to be
called again the next time the charm is invoked. This invocation may be
the result of an action, or any event other than metric events. The
queue of events will be dispatched before the new event is processed.
From the above you may deduce, but it's important to point out:
* ``defer()`` does not interrupt the execution of the current event
handler. In almost all cases, a call to ``defer()`` should be followed
by an explicit ``return`` from the handler;
* the re-execution of the deferred event handler starts from the top of
the handler method (not where defer was called);
* only the handlers that actually called ``defer()`` are called again
(that is: despite talking about “deferring an event” it is actually
the handler/event combination that is deferred); and
* any deferred events get processed before the event (or action) that
caused the current invocation of the charm.
The general desire to call ``defer()`` happens when some precondition
isn't yet met. However, care should be exercised as to whether it is
better to defer this event so that you see it again, or whether it is
better to just wait for the event that indicates the precondition has
been met.
For example, if ``config-changed`` is fired, and you are waiting for
different config, there is no reason to defer the event because there
will be a *different* ``config-changed`` event when the config actually
changes, rather than checking to see if maybe config has changed prior
to every other event that occurs.
Similarly, if you need 2 events to occur before you are ready to
proceed (say event A and B). When you see event A, you could chose to
``defer()`` it because you haven't seen B yet. However, that leads to:
1. event A fires, calls defer()
2. event B fires, event A handler is called first, still hasn't seen B
happen, so is deferred again. Then B happens, which progresses since
it has seen A.
3. At some future time, event C happens, which also checks if A can
proceed.
"""
It has been rewritten a couple of times since this issue was opened. Do you think it's clear enough as it stands?
I think the updated docstring that Pietro copied above is thorough enough that we can mark this fixed.