s6-overlay
V3 finish script not executed
I am trying to set up s6 to check preconditions (e.g. whether a port or socket is connectable); if they are not met, the container should fail and stop. To do so I would like to define a service, ideally a oneshot that retries n times. What I have tried so far is a longrun service with a finish script using s6-permafailon. Somehow that is never triggered. So either this is a bug or it is not quite clear how to set up finish scripts. To reproduce the issue, see the Dockerfile:
FROM alpine
# configs
# ==> s6-rc.d/myapp/finish <==
# #!/bin/execlineb -P
# #did it fail 5 times in the last 2 seconds with an exit code between 1 and 100
# s6-permafailon 2 5 1-100
# ==> s6-rc.d/myapp/run <==
# #!/command/execlineb -P
# #check if can connect to rabbit mq
# socat -u /dev/stdin tcp-connect:localhost:5672
# ==> s6-rc.d/myapp/type <==
# longrun
ARG S6_OVERLAY_VERSION=3.1.0.1
RUN apk update && apk add xz socat
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-x86_64.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64.tar.xz
#create service
RUN mkdir -p /etc/s6-overlay/s6-rc.d/myapp/
WORKDIR /etc/s6-overlay/s6-rc.d/myapp/
RUN echo -e '#!/bin/execlineb -P\n#did it fail 5 times in the last 2 seconds with an exit code between 1 and 100\ns6-permafailon 2 5 1-100' > finish
RUN echo -e '#!/command/execlineb -P\n#check if can connect to rabbit mq\nsocat -u /dev/stdin tcp-connect:localhost:5672\n' > run
RUN echo -e 'longrun' > type
#add to bundle
RUN mkdir -p /etc/s6-overlay/s6-rc.d/user/contents.d
WORKDIR /etc/s6-overlay/s6-rc.d/user/contents.d
RUN touch myapp
WORKDIR /
ENTRYPOINT ["/init"]
- I'm on vacation until the end of the month, so I don't have the infrastructure to reproduce at the moment - and please don't expect fast answers.
- What exactly is never triggered? Is the service started, can you see it in the s6-rc logs?
- Please note that version 3.1.2.1 is out and fixes some bugs.
- s6-overlay installs s6-networking, so you don't have to install socat for your test.
s6-tcpclient -4 localhost 5672 true
should perform the exact same test.
- Note that for the container to stop automatically, the CMD should fail. A failing supervised service, even in permanent failure mode, will not trigger a container shutdown. If you want a container shutdown, you need to
  - either have your CMD exit
  - or, if you have no CMD, write the container exit code you want to /run/s6-linux-init-container-results/exitcode, then call halt.
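The second option above can be sketched in plain sh. This is only an illustration, not something taken from the s6-overlay sources: the function name stop_container and the RESULTS_FILE and HALT_CMD overrides are hypothetical, and their defaults are the two paths quoted in this thread.

```shell
#!/bin/sh
# Sketch: record the desired container exit code, then ask s6-overlay to halt.
# RESULTS_FILE and HALT_CMD are hypothetical overrides, present only so the
# function can be exercised outside a container; the defaults are the paths
# mentioned in this thread.
stop_container() {
    code="$1"
    results_file="${RESULTS_FILE:-/run/s6-linux-init-container-results/exitcode}"
    halt_cmd="${HALT_CMD:-/run/s6/basedir/bin/halt}"

    # Write the exit code first; if that fails, do not halt.
    printf '%s\n' "$code" > "$results_file" || return 1

    # Trigger the container shutdown.
    "$halt_cmd"
}
```

Inside a container, calling stop_container 1 from a script would presumably make the container exit with code 1 once shutdown completes.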
Thank you for your prompt answer! Now the findings in the order of your comments:
- No need to hurry
- The service was started but the finish script was never executed.
- I now use 3.1.2.1.
- Thank you for the hint, but I will also have to check whether unix sockets are connectable (e.g. from rsyslogd), so that's why I generally use socat.
- Thanks for the hint, I managed to halt and update the exit code of the container.
Now the only thing left is that, even though the service command fails, s6-rc reports that the service successfully started. I thought that if a service depends on another one and the dependency is not met, it won't start. I think I am missing something. Here is the most recent config:
==> s6-rc.d/user/contents.d/myapp <==
==> s6-rc.d/user/contents.d/secondary <==
==> s6-rc.d/myapp/type <==
longrun
==> s6-rc.d/myapp/run <==
#!/command/execlineb -P
#check if can connect to rabbit mq
socat -u /dev/null tcp-connect:localhost:5672
==> s6-rc.d/myapp/finish <==
#!/command/execlineb -S0
#read the failure count
backtick -D0 -E failcnt {
pipeline { s6-svdt /run/service/myapp/ }
awk "BEGIN{cnt=0}{if($3!=0)cnt++}END{print cnt}"
}
# s6-echo "myapp fail count $failcnt"
if -X { test $failcnt -gt 5 }
foreground { redirfd -w 1 /run/s6-linux-init-container-results/exitcode echo 0 }
/run/s6/basedir/bin/halt
==> s6-rc.d/secondary/type <==
oneshot
==> s6-rc.d/secondary/dependencies.d/myapp <==
==> s6-rc.d/secondary/up <==
And here the console output:
s6-rc: info: service myapp: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service myapp successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
2022/09/16 10:05:01 socat[32] E connect(6, AF=2 127.0.0.1:5672, 16): Connection refused
s6-rc: info: service legacy-services successfully started
/ # 2022/09/16 10:05:02 socat[66] E connect(6, AF=2 127.0.0.1:5672, 16): Connection refused
2022/09/16 10:05:03 socat[71] E connect(6, AF=2 127.0.0.1:5672, 16): Connection refused
2022/09/16 10:05:04 socat[76] E connect(6, AF=2 127.0.0.1:5672, 16): Connection refused
2022/09/16 10:05:05 socat[81] E connect(6, AF=2 127.0.0.1:5672, 16): Connection refused
2022/09/16 10:05:06 socat[86] E connect(6, AF=2 127.0.0.1:5672, 16): Connection refused
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service myapp: stopping
s6-rc: info: service myapp successfully stopped
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
I have the same issue: the down script of a oneshot service is not executed.
@scuzhanglei You'll have to provide more context if you want help. I doubt you're having the exact same issue, given what I've outlined above; please open another issue with details of your problem.
@okaerin Sorry for not getting back to you sooner. So the thing with longruns is, once they're started and have reached readiness at least once, s6-rc considers that starting the service has been successful; if the service dies later on, it's a temporary error, the supervisor is supposed to restart it. And when the service doesn't define readiness (there's no notification-fd file in your service definition directory), it is considered ready as soon as it starts. So, in your case, myapp is considered successfully started as soon as the run script is executed, even if it fails later.
To prevent that, you should make sure myapp is only ready after socat successfully establishes a connection. I don't know how to do that with socat, so here is how I would do it:
==> s6-rc.d/myapp/notification-fd <==
3
==> s6-rc.d/myapp/run <==
#!/command/execlineb -P
redirfd -r 0 /dev/null
redirfd -w 1 /dev/null
s6-tcpclient -DRHl0 localhost 5672
if { fdmove 1 3 s6-echo }
fdclose 3
s6-ioconnect -67
The fdmove 1 3 s6-echo line does it: once s6-tcpclient has established a connection to localhost:5672, it writes a line to fd 3, signaling the supervisor that the service is ready. Then fd 3 is closed and s6-ioconnect maintains a connection between /dev/null and localhost:5672.
Note that all this is a pretty expensive way to check for RabbitMQ readiness. If RabbitMQ is started in this container, then you should write a readiness script for it instead. If it is started in another container, then you should probably have a policy that says this container will not start before RabbitMQ is ready, and have a readiness checker outside of this container.
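For readers more comfortable with sh than execline, the readiness notification described above can be sketched as follows. This is an assumption-laden illustration rather than the approach recommended in this thread: the function name notify_when_ready and the RETRY_DELAY variable are hypothetical, and fd 3 is used only because the notification-fd file above contains 3.

```shell
#!/bin/sh
# Sketch: retry a check command up to max_tries times; on the first success,
# write a newline to fd 3 (the notification fd) and return 0.
# If every attempt fails, return 1 without notifying readiness.
notify_when_ready() {
    max_tries="$1"
    shift
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        if "$@"; then
            # A newline on the notification fd signals readiness.
            echo >&3
            return 0
        fi
        i=$((i + 1))
        # RETRY_DELAY (seconds) is a hypothetical knob; default is 1.
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

In a run script this might be called as notify_when_ready 5 s6-tcpclient -4 localhost 5672 true, followed by closing fd 3 and exec'ing the long-lived program. Unlike the execline version, this sketch does not keep a connection open after the check succeeds.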
Thanks for your reply; more details are below. cloud-hypervisor is a longrun service, and virt-prerunner is a oneshot service that depends on cloud-hypervisor. I want the container to exit if any error happens in virt-prerunner. To do this, I added a down script for virt-prerunner that stops the container via /run/s6/basedir/bin/halt, but as far as I can see the down script is not executed: whether virt-prerunner stops with an error or not, the container keeps running.
longrun service
/ # cat /etc/s6-overlay/s6-rc.d/cloud-hypervisor/type
longrun
/ # cat /etc/s6-overlay/s6-rc.d/cloud-hypervisor/run
#!/usr/bin/execlineb -P
cloud-hypervisor --api-socket /var/run/ch.sock
/ # cat /etc/s6-overlay/s6-rc.d/cloud-hypervisor/finish
#!/bin/sh
if test "$1" -eq 256 ; then
e=$((128 + $2))
else
e="$1"
fi
echo "$e" > /run/s6-linux-init-container-results/exitcode
/run/s6/basedir/bin/halt
oneshot service
/ # cat /etc/s6-overlay/s6-rc.d/virt-prerunner/type
oneshot
/ # cat /etc/s6-overlay/s6-rc.d/virt-prerunner/up
none-exists-command
/ # cat /etc/s6-overlay/s6-rc.d/virt-prerunner/down
/run/s6/basedir/bin/halt
/ # cat /etc/s6-overlay/s6-rc.d/virt-prerunner/dependencies.d/cloud-hypervisor
log
# s6-rc: info: service cloud-hypervisor: starting
# s6-rc: info: service s6rc-oneshot-runner: starting
# s6-rc: info: service cloud-hypervisor successfully started
# s6-rc: info: service s6rc-oneshot-runner successfully started
# s6-rc: info: service fix-attrs: starting
# s6-rc: info: service virt-prerunner: starting
# s6-rc-oneshot-run: fatal: unable to exec none-exists-command: No such file or directory
# s6-rc: warning: unable to start service virt-prerunner: command exited 127
# s6-rc: info: service fix-attrs successfully started
# s6-rc: info: service legacy-cont-init: starting
# s6-rc: info: service legacy-cont-init successfully started
When the up script for virt-prerunner fails, the service is not started, so it's normal that the down script is never executed. down scripts are only executed when a service that has been started is being stopped.
As an aside, do not call /run/s6/basedir/bin/halt in a down script. When down scripts are run, it means that the container is already in the process of stopping.
ENV S6_BEHAVIOUR_IF_STAGE2_FAILS=2
thanks, it works now.
@skarnet thanks, it works with the readiness notification