
TubeSync keeps stopping after a few days

Open MatthK opened this issue 4 years ago • 16 comments

I am running the latest TubeSync docker container and it normally works fine. But whenever it runs for 1-2 days, it stops working.

The log file shows the following, but nothing else:

192.168.7.10 - - [26/Aug/2021:13:42:21 +0800] "GET /static/fonts/fontawesome/fa-regular-400.woff2 HTTP/1.1" 304 0 "http://192.168.7.14:4848/static/styles/tubesync.css" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0"
[2021-08-26 18:23:46 +0800] [224] [INFO] Handling signal: term
[2021-08-26 18:23:49 +0800] [252] [INFO] Worker exiting (pid: 252)
[2021-08-26 18:23:49 +0800] [253] [INFO] Worker exiting (pid: 253)
[2021-08-26 18:23:49 +0800] [251] [INFO] Worker exiting (pid: 251)

How can I prevent this from happening? All my other docker containers run smoothly for weeks without a hiccup.

And when I try to start it again, it doesn't start up anymore, with the log showing this:

Error: Already running on PID 224 (or pid file '/run/app/gunicorn.pid' is stale)
[2021-08-26 22:55:24 +0800] [579] [INFO] Starting gunicorn 20.1.0
Error: Already running on PID 224 (or pid file '/run/app/gunicorn.pid' is stale)
[2021-08-26 22:55:25 +0800] [584] [INFO] Starting gunicorn 20.1.0
Error: Already running on PID 224 (or pid file '/run/app/gunicorn.pid' is stale)
[2021-08-26 22:55:26 +0800] [589] [INFO] Starting gunicorn 20.1.0
Error: Already running on PID 224 (or pid file '/run/app/gunicorn.pid' is stale)
[2021-08-26 22:55:27 +0800] [594] [INFO] Starting gunicorn 20.1.0
Error: Already running on PID 224 (or pid file '/run/app/gunicorn.pid' is stale)

MatthK avatar Aug 26 '21 14:08 MatthK

Your log shows the worker process is being sent a SIGTERM signal, basically it's being told to die. Usually this means something external is killing it on purpose. Does your host run out of resources, or is some other limit being hit?

meeb avatar Aug 26 '21 15:08 meeb

Error: Already running on PID 224 (or pid file '/run/app/gunicorn.pid' is stale)

If you go in and delete that file, it should start, IMO.

gregzuro avatar Aug 26 '21 19:08 gregzuro

@meeb There are a few other docker containers running on that machine, but I have the impression that it is not that busy. How could I check or monitor this?

@gregzuro I will try the next time. I simply recreated the container.

MatthK avatar Aug 27 '21 02:08 MatthK

@gregzuro the correct way to resolve this would be to restart the container.

@MatthK check the dmesg of the host machine and see if the OOM kicked in and killed the process. The worker process does, for currently unavoidable reasons, allocate quite a lot of RAM.
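One hedged way to do that check (a sketch only; `scan_for_oom` is an illustrative helper name, not part of TubeSync) is to save the host's kernel ring buffer to a file and grep it for OOM-killer traces:

```shell
#!/bin/sh
# Sketch: grep a saved dmesg dump for OOM-killer traces.
# scan_for_oom is an illustrative helper name, not part of TubeSync.
scan_for_oom() {
    grep -i -E 'out of memory|oom-kill|killed process' "$1"
}

# On the host (not inside the container), capture the ring buffer first:
#   sudo dmesg -T > /tmp/dmesg.txt
# then look for kills:
#   scan_for_oom /tmp/dmesg.txt
```

`docker stats --no-stream` on the host is another quick way to see whether the container's memory usage is creeping toward a limit.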

meeb avatar Aug 27 '21 02:08 meeb

When I tried to restart the container, it would show the above logs. So it stayed in the "startup" state and never turned healthy.

I checked the dmesg. It "only" shows timestamps from 45039.xxxx to 56976.xxxx. Searching for anything like OOM or kill finds no match.

dmesg.log

MatthK avatar Aug 27 '21 02:08 MatthK

Do you run the container with a docker --memory= flag or anything similar?

meeb avatar Aug 27 '21 02:08 meeb

This is how I run it:

sudo docker run \
  -d \
  --name TubeSync \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TZ=Asia/Hong_Kong \
  -v /home/matth/TubeSync-config:/config \
  -v /mnt/movies/YouTube:/downloads \
  -p 4848:4848 \
  ghcr.io/meeb/tubesync:latest

MatthK avatar Aug 27 '21 02:08 MatthK

Can confirm that simply restarting the container does not fix the pid issue.


gregzuro avatar Aug 27 '21 04:08 gregzuro

@gregzuro that doesn't make a huge amount of sense, given the PID file doesn't exist until the container is started. Restarting the container does "fix" the issue, so if you still experience stale PID issues, that would be the gunicorn process dying unexpectedly for some reason and not clearing its PID file, at which point the S6 init would attempt to start it again and fail with the old PID file still present. I can add a check for the PID file when starting gunicorn, however it would probably be better to identify the root reason these processes are being terminated in the first place. As the error says, it may also be S6 trying to restart the process while it is still running fine, which would be odd.

@MatthK Thanks for the details. I still can't see anything wrong with your setup, but your initial error is quite clear that something is killing the worker process rather than it being an obvious issue in TubeSync itself. Either it's something specific to your setup, in which case it's going to be very hard to assist with, or it's a wider general problem, in which case I'll probably need more reports to find the cause.

I've only personally ever seen these errors when something like RAM limits are exhausted on a host, however I'll see if I can replicate this somehow locally to test with.

meeb avatar Aug 27 '21 05:08 meeb

I will monitor this a bit more closely and run dmesg shortly after it stops working. So hopefully there will be some information in it then.

MatthK avatar Aug 27 '21 05:08 MatthK

Sounds like a good plan. If you can find out where the SIGTERM signal is being sent from, we can find a way to fix it.

meeb avatar Aug 27 '21 05:08 meeb

@gregzuro that doesn't make a huge amount of sense, given the PID file doesn't exist until the container is started. Restarting the container does "fix" the issue, so if you still experience stale PID issues, that would be the gunicorn process dying unexpectedly for some reason and not clearing its PID file, at which point the S6 init would attempt to start it again and fail with the old PID file still present.

Yes. This is exactly what is happening.

If you create the container, the pid file is created when it first starts. If you then stop the container in a way that inhibits removal of the pid file, then start the container again, the old pid file is still there and prevents tubesync from coming up.

At least that's how it appears to happen for me. :)

gregzuro avatar Aug 27 '21 06:08 gregzuro

Ah, so you're preserving the container filesystem with a stop/start and the container is shutting down in a non-clean way? The PID file is removed on a clean shutdown, so it must have been a non-clean shutdown. That'd do it. OK, I'll add a check for the PID file on starting gunicorn.
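A minimal sketch of such a pre-start check (illustrative only, assuming a POSIX shell; this is not the actual change that landed in TubeSync's startup scripts):

```shell
#!/bin/sh
# Sketch: remove the gunicorn PID file if the PID it records no longer
# corresponds to a running process. kill -0 sends no signal; it only
# probes whether a signal could be delivered to that PID.
remove_stale_pidfile() {
    pidfile="$1"
    if [ -f "$pidfile" ] && ! kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        rm -f "$pidfile"
    fi
}

# Example (inside the container, before launching gunicorn):
#   remove_stale_pidfile /run/app/gunicorn.pid
```

A PID file pointing at a live but unrelated process (a recycled PID) would survive this check, which is why fixing the unclean shutdown itself is still the better outcome.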

meeb avatar Aug 27 '21 06:08 meeb

Cool.


gregzuro avatar Aug 27 '21 06:08 gregzuro

@gregzuro wait for the build to finish then give :latest a try.

meeb avatar Aug 27 '21 07:08 meeb

@meeb Looks good. Thanks!

gregzuro avatar Aug 27 '21 17:08 gregzuro

Pretty sure this is resolved by now. If it isn't, feel free to re-open the issue.

meeb avatar Jan 18 '23 08:01 meeb