WMCore
Ensure WM docker images - and databases - properly handle signals
Impact of the new feature: WMCore (it can easily cover all of DMWM, though)
Is your feature request related to a problem? Please describe: As we commission and plan to move our services to a container solution, and triggered by a question about "docker stop" vs "docker kill", I came across a post explaining the importance of propagating and handling process signals; see: https://vsupalov.com/docker-compose-stop-slow/#:~:text=Docker%20has%20limited%20patience%20when,for%2010%20seconds%20at%20least.
Describe the solution you'd like: As described in the link above, we need to verify three things:
- In the Dockerfile: use the "exec form" for the ENTRYPOINT.
- In the ENTRYPOINT script: execute the application/process with the `exec` command.
- In the ENTRYPOINT script: we might have to trap signals, to ensure that our applications (and the database images) are able to gracefully stop their processes.
It might be that this is all already implemented, but we need to cross-check these aspects and make changes wherever necessary (a minimal sketch of the pattern follows below).
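For reference, here is a rough sketch of what the first two points look like for a simple, single-process service; the names are illustrative and do not refer to an existing WMCore script. In the Dockerfile, the exec form is `ENTRYPOINT ["./entrypoint.sh"]` rather than `ENTRYPOINT ./entrypoint.sh`, so that no intermediate `/bin/sh -c` process sits between Docker and our script and swallows the signals sent by `docker stop`:
```bash
#!/bin/bash
# entrypoint.sh -- illustrative sketch only, not an existing WMCore script.
# "exec" replaces the shell with the service process, so the service itself
# becomes PID 1 and receives the SIGTERM sent by "docker stop" directly.
exec my-service --config /etc/my-service/config.py
```
For WMAgent, where the entrypoint has to stay alive after starting almost 20 components, the third point (a signal trap) applies instead; a sketch of that is further down in this thread.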
Describe alternatives you've considered:
In addition, note that stopping the WMAgent container might take more than 10 seconds (given that it has to gracefully stop almost 20 components), so we should probably consider something like `docker stop wmagent --time 30`, to give it more time before Docker sends a SIGKILL.
Additional context: None
@belforte @novicecpp @mapellidario this might be interesting/useful to you as well.
Hi @amaltaro,
> In the ENTRYPOINT script: we might have to trap signals
At the current stage it is fairly easy for us to trap those signals: our entry point is well defined and it is easy to add such a signal-handling mechanism there. But the question is: what do we do with it after catching the signal? It behaves like Python exception handling: catching alone has no effect if we do not take the proper actions. Even worse, if we catch the signal but fail to properly exit the main process, we will be struggling to kill those containers in real life.
The only thing that comes to mind as worth executing before killing the container is: `manage stop-agent`. But after that you cannot kill the entry point process so easily. Here is a concrete example:
- Our entry point is `run.sh`.
- If we trap a signal (say, the SIGTERM sent by `docker stop`; SIGKILL itself cannot be trapped) and upon that we execute `manage stop-agent`, then it is our obligation to also kill `run.sh`.
- Here is how the process tree looks inside our containers:
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
cmst1 57541 0.0 0.0 6168 2488 pts/1 Ss 15:14 0:00 bash
cmst1 57661 0.0 0.0 8632 1656 pts/1 R+ 15:16 0:00 \_ ps auxf
cmst1 2866 0.0 0.7 299432 55396 ? Sl May07 0:06 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
cmst1 2857 0.0 0.7 299808 56116 ? Sl May07 0:12 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
...
cmst1 596 0.0 0.0 6292 2576 pts/0 Ss+ May07 0:00 bash
cmst1 1 0.0 0.0 6044 2044 ? Ss May07 0:00 /bin/bash ./run.sh 2>&1
cmst1 595 0.0 0.0 4268 624 ? S May07 0:00 sleep infinity
- Unfortunately, executing the proper agent shutdown sequence and then trying to kill PID 1 (which should be the entry point) accomplishes nothing:
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ kill -9 1
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
cmst1 57541 0.0 0.0 6168 2488 pts/1 Ss 15:14 0:00 bash
cmst1 57661 0.0 0.0 8632 1656 pts/1 R+ 15:16 0:00 \_ ps auxf
cmst1 2866 0.0 0.7 299432 55396 ? Sl May07 0:06 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
cmst1 2857 0.0 0.7 299808 56116 ? Sl May07 0:12 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.3/config/config.py
...
NOTE: we may need to investigate this more deeply (see the quick check sketched right after this list).
- The only explanation I can offer is the `sleep infinity` process running at the end of `run.sh`; it is a fork with a completely different PID, in this case 595.
- So the only way to kill the entry point is by killing this exact process. It is doable (manually), just like this:
(WMAgent-2.3.3) [cmst1@vocms0260:current]$ kill 595
(WMAgent-2.3.3) [cmst1@vocms0260:current]$
[cmst1@vocms0260 WMCoreDev.d]$
- Indeed, the container immediately dies. But this means that in order to properly kill that process, we have to know its exact PID at the time we set the trap (the signal-handling mechanism), and since the sleep is the last thing to run inside `run.sh`, getting that number is quite challenging.
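Regarding the NOTE above about `kill -9 1` having no effect: the kernel does not deliver signals to PID 1 from processes inside its own PID namespace unless PID 1 has installed a handler for them, and SIGKILL/SIGSTOP sent from inside the namespace are ignored unconditionally. A quick, container-agnostic way to check what PID 1 would actually react to:
```bash
# Show which signals PID 1 blocks (SigBlk), ignores (SigIgn) and catches
# (SigCgt); the values are hexadecimal bitmasks. A plain bash PID 1 with no
# traps catches essentially nothing, so signals sent to it from inside the
# container are simply discarded by the kernel.
grep -E 'SigBlk|SigIgn|SigCgt' /proc/1/status
```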
There are tricks we can use, of course; not just a single one, but many (one is sketched below). But we need to pay attention to those details and justify whether it is worth going into those complications.
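For example, here is a minimal sketch of one such trick, assuming the entrypoint ends with the `sleep infinity` shown above and that `manage stop-agent` is the right shutdown command: background the sleep, record its PID right there, and let the trap handler use it.
```bash
#!/bin/bash
# Illustrative tail end of run.sh, not the actual script.

graceful_shutdown() {
    manage stop-agent            # gracefully stop all agent components
    kill "$SLEEP_PID" 2>/dev/null
    exit 0
}
trap graceful_shutdown SIGTERM SIGINT   # SIGKILL cannot be trapped

sleep infinity &
SLEEP_PID=$!
# "wait" is interruptible by signals, unlike a foreground "sleep infinity",
# so the handler runs as soon as "docker stop" delivers SIGTERM to PID 1.
wait "$SLEEP_PID"
```
Combined with the `docker stop --time 30` suggested in the issue description, this should give the components enough time to shut down before Docker escalates to SIGKILL.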
I can't keep up to date personally with everything here; too much for my old age. But IIRC @novicecpp has already dealt with this in our new post-RPM containers. I would be really delighted if you could find common ground.
In my view the proper solution is to get rid of the sleep, which was introduced because we redirect logs; instead, actually run the python command and let it produce stdout/stderr. Done this way, you keep the ability to kill `run.sh`, which will not be blocked and will kill its python child; the logs are preserved by k8s/docker; and there is no messing around with forking. This is the common way all other containers run: they run whatever the command/executable is and let it write to stdout/stderr, so that it can be both inspected and killed. What we have is a hack to preserve the RPM-based procedure within the container, i.e. redirect logs, log into the container and re-start services, etc.
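A rough sketch of that pattern, with a placeholder command since it assumes the component can run in the foreground (i.e. without daemonizing itself):
```bash
#!/bin/bash
# run.sh under the "no sleep" model (illustrative only).
# The component is exec'ed in the foreground: it becomes PID 1, its
# stdout/stderr are captured by "docker logs", and it receives the SIGTERM
# from "docker stop" directly.
# "component-foreground-command" is a placeholder, not an existing command.
exec component-foreground-command --config=/data/srv/wmagent/2.3.3/config/config.py
```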
> What we have is a hack to preserve the RPM-based procedure within the container, i.e. redirect logs, log into the container and re-start services, etc.
Very unfair comment! As you probably know, the agent actually runs 15+ components, and making all of them spit their logs out to stdout is the same as having no logs at all (the volume is too large to be useful).
Alan, there is nothing unfair in the comment per se. It is true that we transitioned from a previous RPM-based deployment procedure. It is also true that our services were not originally designed to run in containers and had a different deployment model, based on host VMs and RPMs. If we were able to do it from scratch, we would run each individual component in its own container or sidecar, separating things as much as possible in order to gain control of the runtime and the logs in the new environment.
If we ever get to the point that each component runs in its own container, there will be a lot of refactoring anyway; that would be the ideal moment to consider it. BTW, note that there is no Kubernetes at all involved in this discussion, nor do we plan on moving to Kubernetes.
Anyhow, @belforte if you can share any potential solutions that you/CRAB team has already found, please do share with us.
CRAB story in a nutshell: we currently run 4 services as containers (TaskWorker, PublisherForOldASOwithFTS, PublisherForASOwithRucio, Monitor) and use the same container image with different startup scripts, just for convenience.
PID 1 is something like `/bin/sh -c sh /data/run.sh`, and an env. var. is passed at `docker run` time to tell it which service to run.
That's easy and nice. We have not put anything in place to stop things gracefully at `docker stop` time. Currently, if we want to stop a service for some reason, we log into the container and execute `stop.sh` for a graceful termination. But one could certainly issue `docker exec -ti TaskWorker /data/srv/TaskManager/stop.sh`. That's e.g. how we deploy a new version via Jenkins (Wa is changing that to a GitLab pipeline, but I think it is the same concept).
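For the record, a hedged sketch of the invocations described above; the `SERVICE` variable name and the image name are illustrative, only the `stop.sh` path comes from the comment itself:
```bash
# Start a CRAB service container; an environment variable (name illustrative)
# tells the shared image which service its run.sh should launch.
docker run -d --name TaskWorker -e SERVICE=TaskWorker crab-taskworker-image
# Graceful stop without any signal handling: run the stop script in place.
docker exec -ti TaskWorker /data/srv/TaskManager/stop.sh
```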
I am not sure what problem you want to solve. I do not think that you need to trap signals from outside the container. And since you are in control, you do not need to issue `docker kill` or `docker stop` if you do not want to.
Or do you want to use some container management tool? With one container for the agent, that seems a bit of an overkill.
Of course, if the VM reboots or its hypervisor somehow goes belly-up, there will be no graceful stop, but that is extremely rare and hopefully the application is not too stateful.
K8s and "Lassi's application manager" are a different story; we are using what Valentin set up a few years ago when he moved CMSWEB to K8s. I know that Wa has changed things a bit for the RPM-free version, but if I try to summarize I will make mistakes.
@belforte, the new PyPI image still does not handle termination properly. Sorry, I only discovered this when Alan tagged us on this issue.
I use tini as PID 1 to execute `/bin/sh -c sh /data/run.sh`, where it runs sleep forever. tini does not propagate the signal to the main python processes; it only helps the container shut down faster, because tini itself gets terminated (bash as PID 1 does not) and the OS then reaps the other processes inside the container with SIGKILL. We still need to trap the signal in `/data/run.sh` to terminate the python process gracefully.
Starting, stopping, and running the latest version from git are very convenient for us to test/debug inside the container on the fly. So the double fork and the sleep-forever loop are still needed for us, unless we discover a better way to test/debug and decide it is worth changing this behavior.
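A side note on the tini behaviour described above: tini has a `-g` option that forwards signals to its child's whole process group rather than only to the direct child, which might reduce the reliance on the SIGKILL reaping, e.g.:
```bash
# Illustrative container entry command: tini as PID 1 with process-group
# forwarding, so the SIGTERM from "docker stop" reaches the shell and its
# descendants, not only the immediate "sh" child.
exec tini -g -- sh /data/run.sh
```
This only helps if the python process handles SIGTERM cleanly on its own; otherwise the trap in `/data/run.sh` is still the way to get a graceful shutdown.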
@novicecpp but we can still do `docker exec -ti TaskWorker /data/srv/TaskManager/stop.sh`, right? I have not yet understood which exact problem Alan (or we) need to solve. Maybe we can talk inside CRAB first.
Thanks for the hint, Stefano; indeed that is something we can do for WMAgent as well. I'll give it a try, in parallel to an idea or two that also come to mind.
@khurtado if you don't have an alternative to propose, I can start testing @todor-ivanov's suggestion.
@anpicci Yes please, sounds good to me.
@anpicci I see you have many other tests in your pipeline. As this might require modifications and further debugging, I would highly recommend leaving this one to @khurtado (and we/you can help him on demand).
@amaltaro stepping back for now, and possibly helping if necessary, is fine with me. How does that sound @khurtado?
@anpicci Sounds good to me.
I created another version based on Todor's solution.
https://github.com/dmwm/CMSKubernetes/pull/1488
Whenever you can, could you test it?
Still pending final validation of https://github.com/dmwm/CMSKubernetes/pull/1491, reopening.