illumos-joyent
Docker containers that use s6-overlay broke?
Now that docker v2 images are supported, I started playing around with them. But it looks like s6-overlay based images such as plexinc/pms-docker don't work.
Steps to reproduce:
imgadm import plexinc/pms-docker:plexpass
vmadm create -f plex_docker.json
{
"alias": "artemis-docker",
"hostname": "artemis-docker.example.org",
"image_uuid": "3e63d007-621c-3313-10a2-5b7eeb208abe",
"nics": [
{
"nic_tag": "trunk",
"primary": true,
"mtu": 1500,
"vlan_id": 10,
"ips": [ "10.xx.xx.98/24" ],
"gateways": [ "10.xx.xx.1" ]
}
],
"brand": "lx",
"docker": "true",
"kernel_version": "3.13.0",
"max_physical_memory": 2048,
"maintain_resolvers": true,
"resolvers": [
"10.xx.xx.1"
],
"quota": 15,
"internal_metadata": {
"docker:env": "[\"HOME=/config\", \"TZ=Europe/Brussels\"]",
"docker:entrypoint": "[\"/init\"]"
}
}
docker.log
[s6-init] making user provided files available at /var/run/s6/etc...
exited 0.
[s6-init] ensuring user provided files have correct perms...
exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 40-plex-first-run: executing...
Creating pref shell
Attempting to obtain server token from claim token
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100     1  100     1    0     0      2      0 --:--:-- --:--:-- --:--:--     2
Plex Media Server first run setup complete
[cont-init.d] 40-plex-first-run: exited 0.
[cont-init.d] 50-plex-update: executing...
Attempting to upgrade to: 1.12.0.4829-6de959918
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   179  100   179    0     0    489      0 --:--:-- --:--:-- --:--:--   490
100   179  100   179    0     0    489      0 --:--:-- --:--:-- --:--:--   489
 11  103M   11 12.4M    0     0  9323k      0  0:00:11  0:00:01  0:00:10 9323k
 33  103M   33 35.2M    0     0  14.9M      0  0:00:06  0:00:02  0:00:04 22.8M
 55  103M   55 57.7M    0     0  17.1M      0  0:00:06  0:00:03  0:00:03 22.6M
 75  103M   75 78.1M    0     0  17.9M      0  0:00:05  0:00:04  0:00:01 21.9M
 96  103M   96  100M    0     0  18.7M      0  0:00:05  0:00:05 --:--:-- 22.0M
100  103M  100  103M    0     0  18.8M      0  0:00:05  0:00:05 --:--:-- 22.1M
Selecting previously unselected package plexmediaserver.
(Reading database ... 7548 files and directories currently installed.)
Preparing to unpack /tmp/plexmediaserver.deb ...
Unpacking plexmediaserver (1.12.0.4829-6de959918) ...
Setting up plexmediaserver (1.12.0.4829-6de959918) ...
##################################################################
# NOTE: Your system does not have udev installed. Without udev #
# you won't be able to use DVBLogic's TVButler for DVR #
# or for LiveTV #
# #
# Please install udev and reinstall Plex Media Server to #
# to enable TV Butler support in Plex Media Server. #
# #
# To install udev run: sudo apt-get install udev #
# #
##################################################################
Processing triggers for systemd (229-4ubuntu21.1) ...
[cont-init.d] 50-plex-update: exited 0.
[cont-init.d] done.
[services.d] starting services
The services never get started. When poking around inside the zone I noticed this:
root@artemis-docker:~# ps -xfv
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
1 ? S 0:00 0 0 7200 2476 0.1 /bin/sh /init
83446 ? S 0:00 0 0 2900 1344 0.0 s6-svscan -t0 /var/run/s6/services
83466 ? S 0:00 0 0 2868 1300 0.0 \_ foreground if /etc/s6/init/init-stage2-redirfd foreground if if s6-echo -n -- [s6-init] making user provided files available at /var/ru
83471 ? S 0:00 0 0 2868 1300 0.0 | \_ if /etc/s6/init/init-stage2-redirfd foreground if if s6-echo -n -- [s6-init] making user provided files available at /var/run/s6/etc...
83472 ? S 0:00 0 0 2868 1300 0.0 | \_ foreground if if s6-echo -n -- [s6-init] making user provided files available at /var/run/s6/etc... foreground backtick -n S6_RUNT
83477 ? S 0:00 0 0 2864 1296 0.0 | \_ if if -t s6-test -d /var/run/s6/etc/services.d if s6-echo [services.d] starting services if pipeline s6-ls -0 -- /var/run/s
83714 ? S 0:00 0 0 2864 1296 0.0 | \_ if pipeline s6-ls -0 -- /var/run/s6/etc/services.d forstdin -0 -p -- i importas -u i i if s6-test -d /var/run/s6/etc/service
83718 ? R 1:03 0 0 2876 1304 0.0 | \_ forstdin -0 -p -- i importas -u i i if s6-test -d /var/run/s6/etc/services.d/${i} s6-hiercopy /var/run/s6/etc/services.d/${i} /var/run/s6/servi
83720 ? Z 0:00 0 0 0 0 0.0 | \_ [s6-hiercopy] <defunct>
83719 ? Z 0:00 0 0 0 0 0.0 | \_ [s6-ls] <defunct>
83467 ? S 0:00 0 0 2868 1300 0.0 \_ s6-supervise s6-fdholderd
84124 pts/5 Ss 0:00 0 0 70440 3548 0.1 /bin/login -h zone:global -f
84133 pts/5 S 0:00 0 0 20980 3768 0.1 \_ -bash
84147 pts/5 R 0:00 0 0 28300 3084 0.1 \_ ps -xfv
For some reason s6-ls and s6-hiercopy seem to fail at boot, and no services get started.
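To pick the defunct entries out of a long listing like the one above, a small filter can help. This is only a sketch: `filter_defunct` is a name made up here, and it assumes `ps` output in which STAT is the third column, as in the `ps -xfv` listing above.

```shell
# Print the PIDs of zombie (defunct) processes found in ps output on stdin.
# Assumes the STAT column is the third whitespace-separated field.
filter_defunct() {
    awk '$3 ~ /^Z/ { print $1 }'
}

# Example use inside the zone (not run here):
#   ps -ax | filter_defunct
```

For the listing above, this would print the PIDs of the `[s6-hiercopy]` and `[s6-ls]` entries.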
I think the image (imgadm import) downloaded correctly. At a guess it could be the network settings, as I think the plex server downloads bits on startup.
Can you zlogin into the zone and try to curl something? E.g.
curl http://www.google.com
same issue
[root@assg15-labor /zones/template]# cat 10.ADMIN-lx-docker-plex.json
{
"brand": "lx",
"kernel_version": "3.16.0",
"image_uuid": "4312dc68-0c0c-b559-702d-c13ace5171b4",
"autoboot": true,
"alias": "ADMIN-lx-docker-plex",
"hostname": "ADMIN-lx-docker-plex",
"delegate_dataset": true,
"dns_domain": "test.local",
"resolvers": [
"8.8.8.8",
"8.8.4.4"
],
"max_physical_memory": 4096,
"max_swap": 4096,
"tmpfs": 4096,
"quota": 25,
"cpu_cap": 100,
"cpu_shares": 100,
"max_lwps": 2000,
"nics": [
{
"nic_tag": "admin",
"ip": "1xx.xxx..xxx.xxx",
"netmask": "255.255.255.0",
"gateway": "1xx.xxx.xxx.20",
"primary": true
}
],
"docker": "true",
"internal_metadata": {
"docker:entrypoint": "[\"/init\"]",
"docker:cmd": "[\"/healthcheck.sh || exit 1\"]",
"docker:env": "[\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\", \"TERM=xterm\", \"LANG=en_US.UTF-8\", \"LC_ALL=C.UTF-8\", \"CHANGE_CONFIG_DIR_OWNERSHIP=true\", \"HOME=/config\"]",
"docker:workingdir": "/data",
"docker:workdir": "/data",
"docker:tty": true,
"docker:attach_stdin": true,
"docker:attach_stdout": true,
"docker:attach_stderr": true,
"docker:open_stdin": true
}
}
[root@assg15-labor /zones/template]#
root@ADMIN-lx-docker-plex:/# ps -ax
PID TTY STAT TIME COMMAND
12439 pts/16 R 0:00 ps -ax
12390 ? S 0:00 if pipeline s6-ls -0 -- /var/run/s6/etc/services.d forstdin -0 -p -- i importas -u i i if s6-test -d /var/run/s6/etc/services.d/${i} s6-hiercopy /var/run/s6/etc/serv
12192 ? Ssl 0:00 ipmgmtd
12411 pts/16 S 0:00 -bash
12394 ? Z 0:00 [s6-ls] <defunct>
12402 pts/16 Ss 0:00 /bin/login -h zone:global -f
12395 ? Z 0:00 [s6-hiercopy] <defunct>
12393 ? R 2:04 forstdin -0 -p -- i importas -u i i if s6-test -d /var/run/s6/etc/services.d/${i} s6-hiercopy /var/run/s6/etc/services.d/${i} /var/run/s6/services/${i}
12237 ? S 0:00 if /etc/s6/init/init-stage2-redirfd foreground if if s6-echo -n -- [s6-init] making user provided files available at /var/run/s6/etc... foreground backtick -n
12243 ? S 0:00 if if -t s6-test -d /var/run/s6/etc/services.d if s6-echo [services.d] starting services if pipeline s6-ls -0 -- /var/run/s6/etc/services.d forstdin -0 -p
1 ? S 0:00 s6-svscan -t0 /var/run/s6/services
12238 ? S 0:00 foreground if if s6-echo -n -- [s6-init] making user provided files available at /var/run/s6/etc... foreground backtick -n S6_RUNTIME_PROFILE printcontenv S6_R
12233 ? S 0:00 s6-supervise s6-fdholderd
12232 ? S 0:00 foreground if /etc/s6/init/init-stage2-redirfd foreground if if s6-echo -n -- [s6-init] making user provided files available at /var/run/s6/etc... foregrou
root@ADMIN-lx-docker-plex:/# ./healthcheck.sh
curl: (7) Couldn't connect to server
root@ADMIN-lx-docker-plex:/# cat /healthcheck.sh
#!/bin/sh -e
TARGET=localhost
CURL_OPTS="--connect-timeout 15 --silent --show-error --fail"
curl ${CURL_OPTS} "http://${TARGET}:32400/identity" >/dev/null
root@ADMIN-lx-docker-plex:/#
The network works fine, and the plex service can be started manually with a lot of fiddling.
You're right, a recent change (i.e. newer SmartOS platform) must have broken this.
It ran fine on the 201706 platform that I used for testing, but the latest (20180323T002504Z) doesn't work correctly and shows the same issue you reported.
I poked at them with truss but did not get anywhere, and they don't drop cores as far as I can tell. So I was not able to gather much more info. Maybe some dtrace could help, but I'm not good with that.
Ok so it's not limited to just the plex docker image, I found another one that uses s6 that also has the problem, emby/embyserver:latest.
If the additional evidence is useful here, diginc/pi-hole (running a combination of dnsmasq and a couple of other services, also using s6) has been broken as well after a platform upgrade sometime around the new year. I’ve been too busy to diagnose the issue further since starting the daemons manually has provided a work-around, although I’d be happy to pull logs if it’d be helpful.
On Mar 25, 2018, at 10:05 AM, Jorge Schrauwen wrote:
Ok so it's not limited to just the plex docker image, I found another one that uses s6 that also has the problem, emby/embyserver:latest.
@jjelinek I did some investigation (platform bisecting), and commit b036e0fd (https://smartos.org/bugview/OS-6467) seems to be the root cause of this issue.
I tested a platform build without that change:
sdcadm platform install -C experimental 0c35c502-3a3f-498a-b734-316a6af675bd
and all works correctly again.
I don't understand all of what's occurring in the LX vm (in the plex init setup), but the issue seems related to fork/exec process cleanup: child processes get stuck in a "defunct" state, and the parent is stuck waiting for them to finish, which never happens.
Note that forstdin seems to be stuck in the "sigsuspend" call:
# ptree -z 87e1160c-7a74-e576-be3a-fb644b6bd57c
22523 zsched
22599 s6-svscan -t0 /var/run/s6/services
22677 foreground if /etc/s6/init/init-stage2-redirfd foreground if if
22682 if /etc/s6/init/init-stage2-redirfd foreground if if s6-echo -n
22683 foreground if if s6-echo -n -- [s6-init] making user provided fi
22688 if if -t s6-test -d /var/run/s6/etc/services.d if s6-echo [servi
22850 if pipeline s6-ls -0 -- /var/run/s6/etc/services.d forstdin -0 -p
22853 forstdin -0 -p -- i importas -u i i if s6-test -d /var/run/s6/etc/services.d
22854 <defunct>
22855 <defunct>
22678 s6-supervise s6-fdholderd
# pstack 22853
22853: forstdin -0 -p -- i importas -u i i if s6-test -d /var/run/s6/etc/s
0000000000000000 sigsuspend (7fffef10ebc0)
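The pattern described above, a parent that starts one child per input item and then blocks until all of them have been reaped, can be sketched in plain shell. This is an illustration only, not the s6/execline code: `spawn_and_reap` is a hypothetical name, the children do nothing useful, and whether such a minimal version would hit the same hang on an affected platform is untested.

```shell
# Start one background child per input line, then block until all have exited.
# Loosely mirrors a parallel "forstdin -p"-style loop; purely illustrative.
spawn_and_reap() {
    count=0
    while IFS= read -r item; do
        ( : "$item" ) &          # child: a no-op that receives the item
        count=$((count + 1))
    done
    wait                         # returns only once every child has been reaped
    echo "$count"                # number of children that were started
}

# Example use (not run here):
#   ls /var/run/s6/etc/services.d | spawn_and_reap
```

On a healthy system `wait` returns almost immediately. In the trace above, the equivalent step never completes even though both children are already defunct.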
OS-6898 seemed related, but a PI with that commit present has the same symptom.
Reminder for myself to look at this again on a release PI next time I am booted on one. This problem is gone for me running on a debug build I did myself that is completely up to date with the repos as of today.