
Add "sleep" feature to the Docker image as a separate binary or as an argument

Open KIVagant opened this issue 3 months ago • 2 comments

Is your feature request related to a problem? Please describe.

I'm trying to start using the exporter close to how it is described in this article. I launch the Exporter as a sidecar container to Nginx in a Kubernetes Pod.

But I have a problem. In my setup, Nginx is itself a sidecar to the backend container, and I use the preStop container lifecycle hook on it: a simple "exec" command that runs "sleep". This helps mitigate some 5xx errors for end users.

I tried to configure a similar preStop hook for the Exporter. Unfortunately, there's no binary in the Docker image that I can call to run the sleep X command. I wanted to use something like this:

lifecycle:
  preStop:
    exec:
      command:
        - sleep
        - "10"

As a result, Kubernetes may kill the Exporter container before the Nginx container, and a portion of very important metrics will never be exported to the monitoring system. Many corner cases appear when a Kubernetes Pod goes down and a new one starts as a replacement, and accurate monitoring is crucial for debugging them.

The Kubernetes developers introduced a feature gate called PodLifecycleSleepAction, which is described here; its goal is basically to replicate that sleep command. The problem is that the feature gate is currently in alpha and has only been available since Kubernetes 1.29. Cloud platforms such as AWS don't allow alpha feature gates in their Kubernetes implementations, so it can take a year or more for this feature to land in the EKS world.

Describe the solution you'd like

It would be very nice if a "sleep" feature were included in the Docker image of nginx-prometheus-exporter, either as a separate command or as part of the binary itself, i.e. an argument such as nginx-prometheus-exporter --sleep 10 that would simply sleep and then return exit code 0.

Describe alternatives you've considered

As mentioned above, PodLifecycleSleepAction is the best alternative. However, it will take too long for the feature to become available in production environments with long update lifecycles. In my case, we aren't even close to Kubernetes 1.29.

Additional context

KIVagant, Mar 08 '24 03:03