kubernetes-replicator pod crashes when updating secrets
Describe the bug The kubernetes-replicator pod crashes when replicating a single secret across 150+ namespaces.
To Reproduce Installed using Helm:
helm repo add mittwald https://helm.mittwald.de
helm upgrade --version v2.6.3 --install kubernetes-replicator mittwald/kubernetes-replicator --namespace kubernetes-replicator
Expected behavior The pod should not crash, so that all secrets are replicated to all namespaces.
Environment:
- Kubernetes version: 1.22
- kubernetes-replicator version: 2.6.3
Additional context Additional details about pod termination:
- Increased the pod's resource quotas (see the values sketch below), but the pod still crashed
- Increased the replica count to 3, but all 3 pods crashed
- Updated to the latest replicator version, 2.7.3; the pod still crashed
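For context, a rough sketch of what such overrides could look like when passed to the chart with helm upgrade ... -f values.yaml (the replicaCount and resources keys are assumed from common chart conventions and are not verified against the mittwald/kubernetes-replicator chart; the figures are illustrative):

replicaCount: 3            # the replica-count attempt described above
resources:
  requests:                # illustrative figures only, not the actual values used
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 512Mi

The container status recorded after the crash: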
terminated:
  exitCode: 2
  finishedAt: "2022-09-13T21:23:52Z"
  reason: Error
  startedAt: "2022-09-13T21:14:14Z"
name: kubernetes-replicator
ready: false
restartCount: 1
This is the message after the pod restart
kubernetes-replicator-768465d6d7-4mx78:kubernetes-replicator time="2022-09-13T21:24:09Z" level=error msg="could not replicate object to other namespaces" error="Replicated kubernetes-replicator/xyz.com.registry.creds to 70 out of 155 namespaces
There has not been any activity to this issue in the last 14 days. It will automatically be closed after 7 more days. Remove the stale label to prevent this.
Any update is appreciated
Apologies for the delay. Do you have any logs available from when the controller crashed? Those would help isolate the issue and determine whether this is the same issue as #214.
These are the logs from the pod:
level=error msg="could not replicate object to other namespaces" error="Replicated kubernetes-replicator/xxxxxxxx.xxxxx.creds to 157 out of 178 namespaces: 21 errors occurred:\n\t*
and the pod crashes with this error:
lastState:
  terminated:
    containerID: containerd://63cbde6dd2d3a1659ce116e83c9545312d1482024c1548704aab755f3dac6313
    exitCode: 2
    finishedAt: "2022-10-13T20:20:54Z"
    reason: Error
    startedAt: "2022-10-13T20:14:55Z"
After further debugging: the container in the pod is killed due to liveness probe failures. After changing periodSeconds from 10 to 60 on both the liveness and readiness probes, the container is no longer killed and there are no replication failures. That said, the container is killed only while the replicator is replicating the secrets and the probes are failing, so something is blocking the probe requests while secrets are being replicated.
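For reference, a rough sketch of what the adjusted probes look like on the Deployment after that change (the path and port names are placeholders, not values taken from the chart; only periodSeconds reflects the change described here):

livenessProbe:
  httpGet:
    path: /healthz      # placeholder path
    port: http          # placeholder port name
  periodSeconds: 60     # raised from 10
readinessProbe:
  httpGet:
    path: /healthz      # placeholder path
    port: http          # placeholder port name
  periodSeconds: 60     # raised from 10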
There has not been any activity to this issue in the last 14 days. It will automatically be closed after 7 more days. Remove the stale label to prevent this.
I am experiencing this same issue.
https://github.com/mittwald/kubernetes-replicator/blob/f3d5125b0065c0d2ee813437df0967f3d6aca535/liveness/handle.go#L26
It's coming from right here. If the status is not "synced" for all resources, the handler reports unhealthy. That shouldn't be the case for a liveness probe: a sync that has not yet completed doesn't mean the application should report itself as unhealthy.
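To illustrate the pattern being described, here is a minimal sketch, not the actual code behind the link above; the ReplicatorStatus interface, the endpoint path, and the port are hypothetical stand-ins:

package main

import (
	"fmt"
	"net/http"
)

// ReplicatorStatus is a hypothetical view of one replicator's sync state.
type ReplicatorStatus interface {
	Name() string
	Synced() bool
}

// livenessHandler fails the probe while any replicator has not finished
// its sync, which is the behavior being questioned above: a long-running
// replication pass causes the probe to fail and the kubelet to restart
// the container.
func livenessHandler(replicators []ReplicatorStatus) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, rep := range replicators {
			if !rep.Synced() {
				w.WriteHeader(http.StatusServiceUnavailable)
				fmt.Fprintf(w, "%s not synced yet\n", rep.Name())
				return
			}
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "OK")
	}
}

func main() {
	// In the real controller the replicator list would come from the running
	// controllers; it is left empty here only so the sketch compiles.
	http.HandleFunc("/healthz", livenessHandler(nil))
	// Port chosen arbitrarily for the sketch.
	_ = http.ListenAndServe(":9102", nil)
}

Gating liveness on sync completion like this means the container gets killed in the middle of a long replication run, which is consistent with the probe failures described above; a check of this kind seems better suited to a readiness probe, with liveness limited to confirming the process can still serve requests.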
We're still experiencing this issue; is there an update? I noticed the PR check failed while spinning up a KIND cluster. Can it be rerun?