
Add support for `--stop-signal` `docker run` flag

Open sirocode opened this issue 8 years ago • 8 comments

Hi,

When running a CentOS 7-based container with systemd, there's a graceful systemd service shutdown issue.

If I run the container (not in ECS) with `docker run --stop-signal=$(kill -l RTMIN+3) ...`, it all works fine: systemd gets the signal and the service is shut down correctly inside the container.

It would be really great to have this feature implemented in ECS!

P.S. Or maybe there's a widely known workaround for this graceful shutdown issue. Please share!

sirocode avatar May 08 '17 12:05 sirocode

Hi @sirocode, thanks for the question. It isn't currently possible to set --stop-signal in ECS, but I've flagged this as a feature request and we will continue to track it.

In the meantime, you can increase ECS_CONTAINER_STOP_TIMEOUT (default is 30 seconds) in your /etc/ecs/ecs.config, which increases the time the agent waits after sending SIGTERM before it sends the subsequent SIGKILL. There is more information on that configuration option here.
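
For example, something along these lines on the container instance (a sketch; the value is a duration string such as `30s` or `2m` on current agents, so check the agent documentation for your version):

```
# /etc/ecs/ecs.config
# Give containers up to 2 minutes after SIGTERM before the agent sends SIGKILL.
ECS_CONTAINER_STOP_TIMEOUT=2m
```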

jhaynes avatar May 09 '17 23:05 jhaynes

Thanks, @jhaynes! It seems I've found a solution: the `STOPSIGNAL 37` Dockerfile directive.

This solution works fine on my local CentOS 7 host (with all updates up to the current date) with Docker 1.12.6, without using the `--stop-signal` option for `docker run`, but I'm still having issues when running the same Docker image in ECS (using the latest ECS-optimized AMI, which also has Docker 1.12.6). I keep getting "Failed to get D-Bus connection" on every `systemctl` command, and my own systemd service is not running.

I've already compared the `docker inspect` output of the container on my local machine and in ECS, but I can't find any major differences.

I've set the following options in ECS task definition:

  1. `-v /sys/fs/cgroup:/sys/fs/cgroup:ro` (set using the task definition's built-in volume and mount point settings)
  2. `--security-opt label:seccomp:unconfined` (not sure about the `label:` prefix, but I cannot get it working without it)
  3. `ECS_SELINUX_CAPABLE=true` in /etc/ecs/ecs.config

... but I still cannot run my CentOS 7 systemd image in ECS.

Since I cannot run my image with the `STOPSIGNAL 37` directive in ECS, I cannot say whether ECS inherits this Dockerfile directive or not ;)

If someone has a working example of a CentOS 7 systemd-based image running in ECS with `STOPSIGNAL 37` enabled, please share your setup (Dockerfile and task definition).
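
For reference, the kind of Dockerfile I mean is roughly along these lines (a sketch following the commonly used centos/systemd pattern, with `STOPSIGNAL` added; not a verified working ECS setup):

```dockerfile
FROM centos:7
ENV container docker

# systemd's orderly-shutdown signal is SIGRTMIN+3, i.e. signal 37 on glibc/Linux.
STOPSIGNAL 37

# Trim units that don't apply inside a container (same cleanup as the
# upstream centos/systemd image).
RUN (cd /lib/systemd/system/sysinit.target.wants/ && \
     for i in *; do [ "$i" = systemd-tmpfiles-setup.service ] || rm -f "$i"; done) && \
    rm -f /lib/systemd/system/multi-user.target.wants/* \
          /etc/systemd/system/*.wants/* \
          /lib/systemd/system/local-fs.target.wants/* \
          /lib/systemd/system/sockets.target.wants/*udev* \
          /lib/systemd/system/sockets.target.wants/*initctl* \
          /lib/systemd/system/basic.target.wants/* \
          /lib/systemd/system/anaconda.target.wants/*

VOLUME ["/sys/fs/cgroup"]
CMD ["/usr/sbin/init"]
```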

sirocode avatar May 11 '17 06:05 sirocode

I'd like to add another use case for this:

I'm building apps with .NET 5.0, and graceful shutdown behaves differently depending on whether the app gets a SIGINT or a SIGTERM.

If it gets SIGINT, my logs get flushed and the core threads shut down gracefully. With SIGTERM, it shuts down running threads gracefully but my logging doesn't get flushed.

The SIGTERM behaviour is what I'm currently seeing with ECS: my immediate tasks complete, but the logs don't get flushed.

It'd be handy to be able to configure SIGINT instead, so I get the logs (and we can tell whether it got a chance to complete the cleanup and handover routine or not).

GregHNZ avatar Jul 11 '21 20:07 GregHNZ

If ECS just inherited the Dockerfile `STOPSIGNAL` directive, it would be good enough.

Right now it seems we can't have a graceful shutdown with nginx using ECS. That's bad, since nginx is pretty popular.

This article mentions that an option to use the Dockerfile `STOPSIGNAL` directive will be added to ECS in the future; when is this future?

In my company we recently shifted our workload to ECS, and we really need nginx with graceful shutdown. We'd appreciate consideration of this matter.

antoniodesenvolvedor avatar Jul 18 '23 14:07 antoniodesenvolvedor

@antoniodesenvolvedor we have the same issue with Nginx, and I believe the two choices are upgrading to Nginx Plus to customize how Nginx responds to stop signals, or creating a bash wrapper that starts Nginx and translates the stop signal from ECS into something that Nginx expects.

We haven't done either yet but I'm curious if you've done any work to solve this.
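
Roughly what I have in mind for the wrapper (a sketch, assuming nginx runs in the foreground and this script is the image's ENTRYPOINT; untested):

```sh
#!/bin/sh
# Hypothetical ENTRYPOINT wrapper: translate the SIGTERM that ECS sends on
# task stop into the SIGQUIT that nginx treats as a graceful shutdown.

nginx -g 'daemon off;' &
pid=$!

# Forward stop signals to nginx as SIGQUIT so it finishes in-flight requests.
trap 'kill -QUIT "$pid" 2>/dev/null' TERM INT

wait "$pid"
status=$?

# If wait was interrupted by the trapped signal (status > 128), wait again
# for nginx itself to finish shutting down and use its real exit code.
if [ "$status" -gt 128 ]; then
  wait "$pid"
  status=$?
fi
exit "$status"
```

The same trap-and-forward idea should apply to any process that needs a non-default stop signal, as long as the wrapper is the process that actually receives the signal ECS sends.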

davidvasandani avatar Aug 30 '23 21:08 davidvasandani

> @antoniodesenvolvedor we have the same issue with Nginx, and I believe the two choices are upgrading to Nginx Plus to customize how Nginx responds to stop signals, or creating a bash wrapper that starts Nginx and translates the stop signal from ECS into something that Nginx expects.
>
> We haven't done either yet but I'm curious if you've done any work to solve this.

@davidvasandani we also haven't done either yet; I am curious to know how this bash wrapper would work. For now my team won't use nginx, but if the bash wrapper works we might consider resuming the configuration of nginx with our workload.

antoniodesenvolvedor avatar Aug 31 '23 21:08 antoniodesenvolvedor

EKS reads the STOPSIGNAL from the image config, but FWIW k8s does not provide a custom stop-signal override either:

https://github.com/kubernetes/kubernetes/issues/30051

joebowbeer avatar Aug 31 '23 22:08 joebowbeer

Celery has a similar use case: on SIGTERM it waits for all jobs to finish, so when you try to stop a container with a long-running job, the container ends up killed with SIGKILL and its job gets stuck in its visibility timeout (when using SQS as the queue provider). However, when SIGQUIT is sent, Celery immediately stops all jobs and returns the tasks to the queue, so they can be picked up right away by another worker. So with autoscaling in place, it makes the most sense to use SIGQUIT for Celery instead of the default SIGTERM.

pbudzon avatar Jul 01 '24 15:07 pbudzon