Retry behaviour of multiple handlers for the same object

Open psontag opened this issue 3 years ago • 1 comments

Keywords

retry, error, multiple handlers

Problem

Long Story Short

We have two different handlers defined for the same object. When one of them fails, but the other succeeds, the one that failed will not be retried.

Details

We have the handlers below defined to watch for changes in the status.readyReplicas and status.unavailableReplicas fields of a Deployment. Based on these changes we then want to update another custom resource in the cluster.

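import datetime
from typing import Any, Dict

import kopf
# Assumption: ReadTimeoutError is urllib3's exception (its pool/url/message signature matches).
from urllib3.exceptions import ReadTimeoutError


# One function registered twice, once per watched field; `param` tells the calls apart.
@kopf.on.field(
    "apps/v1",
    "Deployment",
    field="status.readyReplicas",
    param="ready-replicas",
)
@kopf.on.field(
    "apps/v1",
    "Deployment",
    field="status.unavailableReplicas",
    param="unavailable-replicas",
)
def deployments(
    name: str,
    namespace: str,
    started: datetime.datetime,
    meta: kopf.Meta,
    status: kopf.Status,
    labels: Dict[str, str],
    param: str,
    logger: kopf.ObjectLogger,
    retry: int,
    runtime,
    **_: Any,
) -> None:
    if not retry and param == "unavailable-replicas":
        # Reproduce the error: fail only on the first attempt of this handler
        raise ReadTimeoutError(
            pool=None, url="my-url", message=f"I made this via {param}"
        )
    else:
        # patch another custom resource in the cluster
        ...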

Usually, the two handlers are executed shortly after one another. Sometimes we observe timeout errors for the patch operations on our custom resource in one of the handlers. Based on the kopf error handling docs, we expect this not to be a problem, because the handler should automatically be retried after 60s (the default delay). When this happens, we can also see that the annotations were updated with the error status for that specific handler:

"prefix/deployments.status.unavailableReplicas": '{
    "started": "2021-08-06T13:29:32.839372",
    "delayed": "2021-08-06T13:30:32.843198",
    "purpose": "create",
    "retries": 1,
    "success": false,
    "failure": false,
    "message": "I made this via unavailable-replicas",
}'
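
For anyone who wants to check such an annotation by hand, here is a minimal sketch that parses the value and reports whether a retry still looks scheduled. It uses a hand-cleaned copy of the value above (the trailing comma is dropped so that it is valid JSON). The helper name `describe_progress` is made up for illustration, and reading "success and failure both false, delayed set" as "retry pending" is our assumption about the fields, not something taken from the kopf docs.

```python
import json
from datetime import datetime

# Hand-cleaned copy of the annotation value shown above (valid JSON).
raw_value = """{
    "started": "2021-08-06T13:29:32.839372",
    "delayed": "2021-08-06T13:30:32.843198",
    "purpose": "create",
    "retries": 1,
    "success": false,
    "failure": false,
    "message": "I made this via unavailable-replicas"
}"""


def describe_progress(value: str) -> dict:
    """Summarise one handler's progress record (hypothetical helper)."""
    record = json.loads(value)
    started = datetime.fromisoformat(record["started"])
    delayed = (
        datetime.fromisoformat(record["delayed"]) if record.get("delayed") else None
    )
    return {
        # Assumption: neither succeeded nor permanently failed, and a
        # "delayed" timestamp is present, means a retry is still scheduled.
        "retry_pending": (
            not record["success"] and not record["failure"] and delayed is not None
        ),
        # Gap between the handler start and the scheduled retry time.
        "delay_seconds": (delayed - started).total_seconds() if delayed else None,
        "retries": record["retries"],
    }


summary = describe_progress(raw_value)
print(summary)
```

For the record above, the gap between "started" and "delayed" comes out at roughly 60 seconds, which matches the default delay we expected.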

The problem now occurs when the other handler runs through successfully. This seems to remove the error annotation from the first handler, so that it is no longer retried.
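
To make the question concrete, here is a toy model of what we seem to observe versus what we expected. It is purely illustrative: it is not kopf's code and says nothing about the actual cause. Both `finish_*` functions are hypothetical, and the key for the readyReplicas handler is assumed by analogy with the annotation key shown above.

```python
from typing import Dict

# Toy stand-in for the per-handler records kept in the object's annotations.
Annotations = Dict[str, dict]

FAILED = "prefix/deployments.status.unavailableReplicas"
SUCCEEDED = "prefix/deployments.status.readyReplicas"  # assumed key name


def finish_purging_all(annotations: Annotations, handler_key: str) -> Annotations:
    """Hypothetical: a success wipes every record (what we seem to observe)."""
    return {}


def finish_purging_own(annotations: Annotations, handler_key: str) -> Annotations:
    """Hypothetical: a success removes only its own record (what we expected)."""
    return {key: rec for key, rec in annotations.items() if key != handler_key}


state: Annotations = {
    FAILED: {"retries": 1, "success": False, "failure": False},
    SUCCEEDED: {"retries": 0, "success": False, "failure": False},
}

observed = finish_purging_all(state, SUCCEEDED)
expected = finish_purging_own(state, SUCCEEDED)

print(FAILED in observed)  # False: the retry record is gone, nothing to retry
print(FAILED in expected)  # True: the failed handler can still be retried
```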

Question: Is it expected that the error annotations of one handler are removed when another handler for the same object runs through successfully?

psontag, Aug 06 '21 14:08

Hey @nolar, can you confirm that this is the expected behaviour? If the problem is unclear or you need more information, please let me know.

psontag, Aug 16 '21 08:08