kwatch
Allow ignoring errors based on certain event messages?
Hi,
One of the biggest annoyances I currently experience is that, running a lot of k8s CronJob resources, I get this alert multiple times a day:
There is an issue with container in a pod! Name: name of pod Container: container name Reason: CreateContainerConfigError
However, the actual reason for this is:
[2022-08-02 00:00:00 +0000 UTC] Scheduled Successfully assigned namespace/pod to ip-123-456-…
[2022-08-02 00:00:04 +0000 UTC] Pulled Container image "<container>" already present on machine
[2022-08-02 00:00:02 +0000 UTC] Failed Error: failed to sync configmap cache: timed out waiting for the condition
[2022-08-02 00:00:07 +0000 UTC] Created Created container container
Failed Error: failed to sync configmap cache: timed out waiting for the condition
In all of these cases the container does start (after what seems to be a 5s timeout). In none of the observed cases did the container fail to start.
I truly wish it were possible to ignore these alerts; as you can imagine, they generate quite a lot of noise.
As for the reason, I'm no expert, but it seems that at certain points too many jobs are scheduled at the same time, k8s throttles the requests internally, and then this happens. I'm fine with the internal throttling and would rather not manually pick apart the job schedules and move them to different minutes (they are easier to comprehend when they run on the full hour, etc.).
Is this something that could be considered?
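To make the request concrete, here is a minimal sketch in Go of the kind of filter I have in mind. The names `ignoreMessagePatterns` and `shouldIgnore` are hypothetical and are not existing kwatch options or functions; the sketch only assumes the pod's event messages are available as plain strings before an alert is sent.

```go
package main

import (
	"fmt"
	"regexp"
)

// ignoreMessagePatterns is a hypothetical, user-supplied list of regular
// expressions. It is NOT an existing kwatch setting.
var ignoreMessagePatterns = []*regexp.Regexp{
	regexp.MustCompile(`failed to sync configmap cache: timed out waiting for the condition`),
}

// shouldIgnore reports whether at least one event message matches
// at least one of the given ignore patterns.
func shouldIgnore(eventMessages []string, patterns []*regexp.Regexp) bool {
	for _, msg := range eventMessages {
		for _, p := range patterns {
			if p.MatchString(msg) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Event messages taken from the example above.
	events := []string{
		`Container image "<container>" already present on machine`,
		"Error: failed to sync configmap cache: timed out waiting for the condition",
		"Created container container",
	}
	if shouldIgnore(events, ignoreMessagePatterns) {
		fmt.Println("alert suppressed")
	} else {
		fmt.Println("alert sent")
	}
}
```

Matching on any single event is the simplest rule, but it could also hide a real failure that happens to follow the same transient message, so a stricter rule may well be the better choice.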