MistContoller fails to start other components
There is some race condition, which makes the execution of the livepeer/catalyst container flaky. In result, sometimes, when catalyst is started, there are no logs at all and the only running process is MistController.
Steps to reproduce
Start multiple livepeer catalyst and find the ones with no logs
for i in $(seq 0 100); do CID=$(docker run -d --rm livepeer/catalyst); sleep 5; LOGS=$(docker logs $CID); [[ -z "$LOGS" ]] || docker kill $CID;done
This command will start 100 container one by one and kill all expect the ones that produce no logs. Then check the running containers and if there are none, then retry the command above. In my case, there was always at least one running container left.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
79e3ec38e0c6 livepeer/catalyst "/usr/bin/MistContro…" 8 minutes ago Up 8 minutes 1935/tcp, 4242/tcp, 8080/tcp, 8889/udp sleepy_poincare
2. Exec into the container and check the running processes
You should find that the only running process is MistController.
$ docker exec -it 79e3ec38e0c6 sh
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 16080 4340 ? Ssl 13:47 0:00 /usr/bin/MistController -c /etc/livepeer/catalyst.json
root 14 0.1 0.0 2060 496 pts/0 Ss 13:57 0:00 sh
root 20 0.0 0.0 5480 2384 pts/0 R+ 13:57 0:00 ps aux
Interesting! Can you get a gdb backtrace of the "stuck" controller?
There are always 2 MistController processes running. Here are backtraces.
# gdb MistController 26335
...
MistController: No such file or directory.
Attaching to process 26335
[New LWP 26368]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.
0x00007f0f39222c7f in __GI___wait4 (pid=7, stat_loc=0x7ffcc30467cc, options=0, usage=0x0)
at ../sysdeps/unix/sysv/linux/wait4.c:27
27 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
(gdb) backtrace
#0 0x00007f0f39222c7f in __GI___wait4 (pid=7, stat_loc=0x7ffcc30467cc, options=0, usage=0x0)
at ../sysdeps/unix/sysv/linux/wait4.c:27
#1 0x000055fb50a46b9a in main (argc=3, argv=0x7ffcc30469e8)
at /tmp/drone-q2IcpPwEcM8EB3pN/drone/src/src/controller/controller.cpp:677
# gdb MistController 26367
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
MistController: No such file or directory.
Attaching to process 26367
[New LWP 26369]
[New LWP 26370]
[New LWP 26371]
[New LWP 26605]
[New LWP 26606]
[New LWP 26607]
[New LWP 26609]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.
0x00007f0f3969b4ff in __libc_accept (fd=0, addr=..., addr@entry=..., len=len@entry=0x7ffcc3045b48)
at ../sysdeps/unix/sysv/linux/accept.c:26
26 ../sysdeps/unix/sysv/linux/accept.c: No such file or directory.
(gdb) backtrace
#0 0x00007f0f3969b4ff in __libc_accept (fd=0, addr=..., addr@entry=..., len=len@entry=0x7ffcc3045b48)
at ../sysdeps/unix/sysv/linux/accept.c:26
#1 0x000055fb50af2995 in Socket::Server::accept (this=this@entry=0x7ffcc3045e50,
nonblock=nonblock@entry=false) at /tmp/drone-q2IcpPwEcM8EB3pN/drone/src/lib/socket.cpp:1512
#2 0x000055fb50aa5199 in Util::Config::threadServer (this=this@entry=0x55fb50ed81a0 <Controller::conf>,
server_socket=..., callback=callback@entry=
0x55fb50a92db0 <Controller::handleAPIConnection(Socket::Connection&)>)
at /tmp/drone-q2IcpPwEcM8EB3pN/drone/src/lib/config.cpp:457
#3 0x000055fb50aa55e6 in Util::Config::serveThreadedSocket (this=0x55fb50ed81a0 <Controller::conf>,
callback=0x55fb50a92db0 <Controller::handleAPIConnection(Socket::Connection&)>)
at /tmp/drone-q2IcpPwEcM8EB3pN/drone/src/lib/config.cpp:521
#4 0x000055fb50a4c5e4 in main_loop (argc=<optimized out>, argv=<optimized out>)
at /tmp/drone-q2IcpPwEcM8EB3pN/drone/src/src/controller/controller.cpp:589
#5 0x000055fb50a46d05 in main (argc=3, argv=0x7ffcc30469e8)
at /tmp/drone-q2IcpPwEcM8EB3pN/drone/src/src/controller/controller.cpp:668
@Thulinma any news on that issue? Do you maybe know where it comes from?