
Ensure WM docker images - and databases - properly handle signals

Open • amaltaro opened this issue 9 months ago • 15 comments

Impact of the new feature: WMCore (can easily cover all of DMWM though)

Is your feature request related to a problem? Please describe. As we commission and plan to move our services to a container solution, and triggered by a question of "docker stop" vs "docker kill", I came across a post explaining the importance of propagating and handling process signals, see: https://vsupalov.com/docker-compose-stop-slow/#:~:text=Docker%20has%20limited%20patience%20when,for%2010%20seconds%20at%20least.

Describe the solution you'd like: As described in the link above, we need to verify three things:

  1. In the Dockerfile: we need to use the "exec form" for the ENTRYPOINT
  2. In the ENTRYPOINT script: execute the application/process with the exec command.
  3. In the ENTRYPOINT script: we might have to trap signals

to ensure that our applications - and the database images - are able to gracefully stop their processes (see the sketch below).
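A minimal sketch covering all three points; the file names and the manage commands are assumptions taken from this thread, not a verified WMAgent setup:

# Dockerfile (sketch): point 1, the "exec form", so run.sh becomes
# PID 1 and receives the SIGTERM that "docker stop" sends:
ENTRYPOINT ["/bin/bash", "/data/run.sh"]

#!/bin/bash
# /data/run.sh (sketch)

# point 3: trap SIGTERM/SIGINT and shut the components down gracefully
graceful_stop() {
    manage stop-agent
    exit 0
}
trap graceful_stop SIGTERM SIGINT

manage start-agent

# point 2: a single-process service would simply "exec" its process
# here; with many daemonized components, PID 1 is instead kept alive
# with a background sleep plus "wait", because "wait" (unlike a
# foreground sleep) returns as soon as a trapped signal arrives
sleep infinity &
wait $!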

It might be that it is all already implemented, but we need to cross-check these aspects and make changes whenever necessary.

Describe alternatives you've considered: In addition, note that stopping the WMAgent container might take more than 10 seconds (given that it has to gracefully stop almost 20 components), so we should probably consider something like docker stop wmagent --time 30, to give it more time before Docker sends a SIGKILL.
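For example (the SIGTERM-then-SIGKILL sequence and the 10-second default are standard Docker behavior; the container name is an assumption):

# give the agent 30 seconds to react to SIGTERM before docker
# escalates to SIGKILL (the default grace period is 10 seconds)
docker stop --time 30 wmagent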

Additional context: None

amaltaro avatar May 08 '24 13:05 amaltaro

@belforte @novicecpp @mapellidario this might be interesting/useful to you as well.

amaltaro avatar May 08 '24 13:05 amaltaro

hi @amaltaro

In the ENTRYPOINT script: we might have to trap signals

At the current stage it is fairly easy for us to trap those signals: our entry point is well defined and it is easy to add such a signal-handling mechanism. But the question is what we are going to do after catching the signal. It is like Python exception handling: catching has no effect if we do not take the proper actions. Even worse, if we catch the signal but fail to properly exit the main process, we will struggle to kill those containers in real life.

The only thing that comes to mind as worth executing before killing the container is: manage stop-agent.

But after that you cannot kill the entry point process so easily. Here is a solid example:

  • Our entry point is run.sh
  • If we trap a SIGTERM signal (SIGKILL cannot be trapped) and upon that we execute: manage agent-stop, then it is our obligation to kill run.sh
  • Here is how the process order looks inside our containers:
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ ps auxf  
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
cmst1      57541  0.0  0.0   6168  2488 pts/1    Ss   15:14   0:00 bash
cmst1      57661  0.0  0.0   8632  1656 pts/1    R+   15:16   0:00  \_ ps auxf
cmst1       2866  0.0  0.7 299432 55396 ?        Sl   May07   0:06 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
cmst1       2857  0.0  0.7 299808 56116 ?        Sl   May07   0:12 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
...
cmst1        596  0.0  0.0   6292  2576 pts/0    Ss+  May07   0:00 bash
cmst1          1  0.0  0.0   6044  2044 ?        Ss   May07   0:00 /bin/bash ./run.sh 2>&1
cmst1        595  0.0  0.0   4268   624 ?        S    May07   0:00 sleep infinity

  • Unfortunately, executing the proper agent shutdown sequence and then trying to kill PID 1 (which should be the entry point) accomplishes nothing:
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ kill -9 1 
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ ps auxf  
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
cmst1      57541  0.0  0.0   6168  2488 pts/1    Ss   15:14   0:00 bash
cmst1      57661  0.0  0.0   8632  1656 pts/1    R+   15:16   0:00  \_ ps auxf
cmst1       2866  0.0  0.7 299432 55396 ?        Sl   May07   0:06 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
cmst1       2857  0.0  0.7 299808 56116 ?        Sl   May07   0:12 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
...

NOTE: we may need to investigate deeper on that (a likely explanation: the kernel does not deliver signals, even SIGKILL, to PID 1 of a PID namespace from within that namespace unless PID 1 has installed a handler for them)

  • The only explanation I can offer is the sleep infinity process running at the end of run.sh; this one is a fork, with a completely different PID, in this case 595

  • So the only way to kill the entry point is by killing this exact process... OK, it is doable (manually), just like this:

(WMAgent-2.3.3) [cmst1@vocms0260:current]$ kill 595
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ 
[cmst1@vocms0260 WMCoreDev.d]$ 
  • Indeed, the container immediately dies. But this means that in order to properly kill that process, we have to know its exact PID at the time we set the trap (the signal-handling mechanism). And since it is the last thing to run inside run.sh, getting that number would be quite challenging.

There are tricks we can use, of course... not just a single one... many. But we need to pay attention to those details and justify whether it is worth going into those complications.
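One such trick, sketched here as an assumption rather than a tested recipe: fork the keep-alive process before setting the trap, so its PID is already known to the handler:

# end of run.sh (sketch): remember the keep-alive PID so the trap
# handler can kill it, letting run.sh (PID 1) exit on its own
sleep infinity &
keepalive_pid=$!
trap 'manage agent-stop; kill "$keepalive_pid"' SIGTERM
wait "$keepalive_pid"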

todor-ivanov avatar May 08 '24 15:05 todor-ivanov

I can't keep up to date personally with everything here. Too much for my old age. But IIRC @novicecpp has already dealt with this in our new post-RPM containers. I would be really delighted if you could find common ground.

belforte avatar May 08 '24 15:05 belforte

In my view the proper solution is to get rid of the sleep, which was introduced because we redirect logs; instead, run the python command directly and let it produce stdout/stderr. Done this way, you gain the ability to kill run.sh, which will not be blocked and will kill its python child; the logs will be preserved in k8s/docker, and there is no messing around with forking. This is the common way all other containers run: they run whatever the command/executable is and let it run with stdout/stderr, such that it can be both inspected and killed. What we have is a hack to preserve the RPM-based procedure within the container, i.e. redirect logs, log into the container and re-start services, etc.
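A generic sketch of that common pattern; the service name and flags are hypothetical, since wmcoreD currently daemonizes its components rather than staying in the foreground:

#!/bin/sh
# run.sh following the usual container convention: "exec" replaces the
# shell, the service becomes PID 1, docker delivers SIGTERM to it
# directly, and its stdout/stderr end up in "docker logs"
exec /usr/local/bin/my-service --foreground --config /etc/my-service/config.py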

vkuznet avatar May 08 '24 16:05 vkuznet

What we have is a hack to preserve RPM based procedure within container, i.e. redirect logs, log into container and re-start service, etc.

Very unfair comment! As you probably know, the agent actually runs 15+ components, and making all of them spit logs out to stdout is the same as having no logs at all (the volume is too large, making it useless).

amaltaro avatar May 08 '24 17:05 amaltaro

Alan, there is nothing unfair in a comment per se. It is true that we transitioned from a previous deployment procedure. It is also true that our services were not originally designed to run in containers and had a different deployment model, based on host VMs and RPMs. If we were able to do it from scratch, we would need to run each individual component in its own container or sidecar, separating as much as possible in order to gain control of the runtime and logs in the new environment.

vkuznet avatar May 08 '24 17:05 vkuznet

IF we ever get to the point that each component runs in its own container, there will be a lot of refactoring anyway. That would be the ideal moment to consider it. BTW, note that there is no kubernetes at all involved in this discussion, nor do we plan on moving to kubernetes.

Anyhow, @belforte if you can share any potential solutions that you/CRAB team has already found, please do share with us.

amaltaro avatar May 08 '24 17:05 amaltaro

CRAB story in a nutshell: we currently run 4 services as containers: TaskWorker, PublisherForOldASOwithFTS, PublisherForASOwithRucio, Monitor, and use the same container image with different startup scripts, just for convenience.

PID 1 is something like /bin/sh -c sh /data/run.sh and an env. var. is passed at docker run time to tell it which service to run.
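A sketch of that dispatch; the variable name and the start.sh paths are assumptions modeled on the stop.sh path mentioned below:

#!/bin/sh
# /data/run.sh (sketch): one image for all four services, with the
# service selected at "docker run -e SERVICE=TaskWorker ..." time
case "$SERVICE" in
    TaskWorker) sh /data/srv/TaskManager/start.sh ;;
    Monitor)    sh /data/srv/Monitor/start.sh ;;
    # ... Publisher variants ...
esac
sleep infinity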

That's easy and nice. We have not put anything in place to stop things gracefully at docker stop time. Currently, if we want to stop a service for some reason, we log into the container and execute stop.sh for a graceful termination. But one could certainly issue docker exec -ti TaskWorker /data/srv/TaskManager/stop.sh. That's e.g. how we deploy a new version via Jenkins (Wa is changing that to a GitLab pipeline, but I think it is the same concept).

I am not sure what problem you want to solve. I do not think that you need to trap signals from outside the container. And since you are in control, you do not need to issue docker kill or docker stop if you do not want to. Or do you want to use some container management tool? With one container per agent it seems a bit overkill. Of course, if the VM reboots or its hypervisor somehow goes belly-up, there will be no graceful stop, but that is extremely rare and hopefully the application is not too stateful.

K8s and "Lassi's application manager" are a different story; we are using what Valentin set up a few years ago when he moved CMSWEB to K8s. I know that Wa has changed things a bit for the RPM-free version, but if I try to summarize I will make mistakes.

belforte avatar May 08 '24 19:05 belforte

@belforte, the new PyPI image is still not handling termination properly. Sorry, I just discovered this when Alan tagged us on this issue.

I use tini as PID 1 to execute /bin/sh -c sh /data/run.sh, where it runs sleep forever. tini does not propagate the signal to the main python processes; it only helps the container shut down faster, because tini gets terminated (bash as PID 1 does not) and the OS then reaps the other processes inside the container with SIGKILL. We still need to trap the signal in /data/run.sh to terminate the python process gracefully.
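A sketch of what that could look like; the start.sh/stop.sh paths follow the pattern mentioned in this thread, the rest is assumed:

# Dockerfile (sketch): tini as PID 1 forwards SIGTERM to its child
ENTRYPOINT ["/usr/bin/tini", "--", "/bin/sh", "/data/run.sh"]

# /data/run.sh (sketch): handle the forwarded SIGTERM ourselves,
# stopping the python process gracefully before exiting
trap 'sh /data/srv/TaskManager/stop.sh; exit 0' TERM
sh /data/srv/TaskManager/start.sh
sleep infinity &
wait $!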

Being able to start, stop, and run the latest version from git is very convenient for us to test/debug inside the container on the fly. So the double fork and sleep-forever loop are still needed, unless we discover a better way to test/debug and decide it is worth changing this behavior.

novicecpp avatar May 13 '24 09:05 novicecpp

@novicecpp but we can still do docker exec -ti TaskWorker /data/srv/TaskManager/stop.sh, right? I have not understood yet which exact problem Alan (or we) need to solve. Maybe we can talk inside CRAB first.

belforte avatar May 13 '24 09:05 belforte

Thanks for the hint Stefano, indeed that is something we can do for WMAgent as well. I'll give it a try in parallel to an idea or two that also come to mind.

todor-ivanov avatar May 13 '24 11:05 todor-ivanov

@khurtado if you don't have alternative to propose, I can start testing @todor-ivanov 's suggestion

anpicci avatar May 21 '24 13:05 anpicci

@anpicci Yes please, sounds good to me.

khurtado avatar May 21 '24 14:05 khurtado

@anpicci I see you have many other tests in your pipeline. As this might require modifications and further debugging, I would highly recommend leaving this one to @khurtado (and we/you can help on demand).

amaltaro avatar May 21 '24 14:05 amaltaro

@amaltaro stepping back for now, and possibly helping if necessary, is fine with me. How does that sound @khurtado?

anpicci avatar May 21 '24 14:05 anpicci

@anpicci Sounds good to me.

I created another version based on Todor's solution.

https://github.com/dmwm/CMSKubernetes/pull/1488

Whenever you can, could you test it?

khurtado avatar May 23 '24 14:05 khurtado

Still pending final validation of https://github.com/dmwm/CMSKubernetes/pull/1491, reopening.

amaltaro avatar May 24 '24 19:05 amaltaro