Allow ignoring errors based on certain event messages?

Open mfn opened this issue 2 years ago • 0 comments

Hi,

one of the biggest annoyances I currently experience is that, because I run a lot of k8s CronJob resources, I get this notification multiple times a day:

There is an issue with container in a pod!
Name: name of pod
Container: container name
Reason: CreateContainerConfigError

However, the actual reason for this is:

[2022-08-02 00:00:00 +0000 UTC] Scheduled Successfully assigned namespace/pod to ip-123-456-…
[2022-08-02 00:00:04 +0000 UTC] Pulled Container image "<container>" already present on machine
[2022-08-02 00:00:02 +0000 UTC] Failed Error: failed to sync configmap cache: timed out waiting for the condition
[2022-08-02 00:00:07 +0000 UTC] Created Created container container

Failed Error: failed to sync configmap cache: timed out waiting for the condition

In all those cases, the container does get started (after what seems to be a 5s timeout). I have not observed a single case where the container failed to start.

I truly wish it were possible to ignore these errors; as you can imagine, they generate quite a lot of noise.
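
To make the request concrete, here is a rough sketch of the kind of filter I have in mind. It is illustrative Python only, not kwatch's actual code or configuration: `Event`, `IGNORE_MESSAGE_PATTERNS`, and `should_ignore` are hypothetical names, and I am assuming the pod's events are available as (reason, message) pairs like the ones listed above.

```python
import re
from typing import NamedTuple


class Event(NamedTuple):
    """One pod event, reduced to the two fields shown in the listing above."""
    reason: str   # e.g. "Failed"
    message: str  # e.g. "Error: failed to sync configmap cache: ..."


# Hypothetical, user-supplied ignore list (an assumed option, not an existing one).
IGNORE_MESSAGE_PATTERNS = [
    re.compile(r"failed to sync configmap cache: timed out waiting for the condition"),
]


def should_ignore(events, patterns=IGNORE_MESSAGE_PATTERNS):
    """Return True only if the pod has "Failed" events and every one of them
    matches an ignore pattern, so unrelated failures still raise an alert."""
    failed = [e for e in events if e.reason == "Failed"]
    return bool(failed) and all(
        any(p.search(e.message) for p in patterns) for e in failed
    )


# Events taken from the example above.
events = [
    Event("Pulled", 'Container image "<container>" already present on machine'),
    Event("Failed", "Error: failed to sync configmap cache: timed out waiting for the condition"),
    Event("Created", "Created container container"),
]

suppress_alert = should_ignore(events)
```

Requiring every "Failed" event to match is a deliberate choice in this sketch: a pod that hits the configmap cache timeout and also fails for some other reason would still be reported.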


As for the reason: I'm no expert, but it seems that at certain points too many jobs are scheduled at the same time, k8s throttles the requests internally, and then this happens. I'm fine with the internal throttling and would rather not manually pick apart the job schedules and move them to different minutes (it's easier to comprehend when they run on the full hour, etc.).

Is this something which can be considered?

mfn • Aug 02 '22 11:08