docker-transmission-openvpn

Container exits with code 0 but it should be a non-zero code

Open theAkito opened this issue 1 year ago • 8 comments

Is there a pinned issue for this?

  • [X] I have read the pinned issues and could not find my issue

Is there an existing or similar issue/discussion for this?

  • [X] I have searched the existing issues
  • [X] I have searched the existing discussions

Is there any comment in the documentation for this?

  • [X] I have read the documentation, especially the FAQ and Troubleshooting parts

Is this related to a provider?

  • [X] I have checked the provider repo for issues
  • [X] My issue is NOT related to a provider

Are you using the latest release?

  • [X] I am using the latest release

Have you tried using the dev branch latest?

  • [X] I have tried using dev branch

Docker run config used

version: '3.7'
services:
  transmission-openvpn:
    container_name: transmission-openvpn
    cap_add:
      - NET_ADMIN
.....
.....
.....
  autoheal: # https://github.com/willfarrell/docker-autoheal
    image: willfarrell/autoheal
    environment: # https://github.com/willfarrell/docker-autoheal#env-defaults
      - AUTOHEAL_CONTAINER_LABEL=autoheal
      - AUTOHEAL_INTERVAL=5   # check every 5 seconds
      - AUTOHEAL_START_PERIOD=30   # wait 30 seconds before first health check
      - AUTOHEAL_DEFAULT_STOP_TIMEOUT=10   # Docker waits max 10 seconds (the Docker default) for a container to stop before killing during restarts (container overridable via label, see below)
      - DOCKER_SOCK=/var/run/docker.sock   # Unix socket for curl requests to Docker API
      - CURL_TIMEOUT=10     # --max-time seconds for curl requests to Docker API
      - WEBHOOK_URL=""    # post message to the webhook if a container was restarted (or restart failed)
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock'

Current Behavior

transmission-openvpn    | [Secure-Server] Inactivity timeout (--ping-restart), restarting
transmission-openvpn    | /etc/openvpn/tunnelDown.sh tun0 ***************************************** init
transmission-openvpn    | resolv.conf was restored
transmission-openvpn    | Sending kill signal to transmission-daemon
transmission-openvpn    | Waiting 5s for transmission-daemon to die
transmission-openvpn    | Successfuly closed transmission-daemon
.....
.....
.....
transmission-openvpn    | SIGTERM[soft,ping-restart] received, process exiting
transmission-openvpn exited with code 0

The container exits with code 0.

Expected Behavior

The container should exit with a non-zero code to indicate failure (i.e. unhealthiness), so that the autoheal functionality kicks in.

How have you tried to solve the problem?

This is an essential issue with the Docker image, so upstream needs to fix it.

Log output

transmission-openvpn    | [Secure-Server] Inactivity timeout (--ping-restart), restarting
transmission-openvpn    | /etc/openvpn/tunnelDown.sh tun0 ***************************************** init
transmission-openvpn    | resolv.conf was restored
transmission-openvpn    | Sending kill signal to transmission-daemon
transmission-openvpn    | Waiting 5s for transmission-daemon to die
transmission-openvpn    | Successfuly closed transmission-daemon
.....
.....
.....
transmission-openvpn    | SIGTERM[soft,ping-restart] received, process exiting
transmission-openvpn exited with code 0

HW/SW Environment

- OS: Debian GNU/Linux 11 (bullseye)
- Docker: 20.10.17

Anything else?

I could not find the script in this repository that catches and handles these errors. I can fix it if you show me where these OpenVPN-related errors are handled.

This is why I decided to use Autoheal.

  • https://github.com/haugene/docker-transmission-openvpn/blob/25b9724178f48227084f5a462b82b1fbc087498d/docs/faq.md#set-the-ping-exit-option-for-openvpn-and-restart-flag-in-docker
  • https://github.com/haugene/docker-transmission-openvpn/blob/25b9724178f48227084f5a462b82b1fbc087498d/docs/faq.md#use-a-third-party-tool-to-monitor-and-restart-the-container

theAkito, Sep 11 '22 19:09

this is where the sigterm is trapped https://github.com/haugene/docker-transmission-openvpn/blob/master/scripts/healthcheck.sh

pkishino, Sep 12 '22 01:09

this is where the sigterm is trapped https://github.com/haugene/docker-transmission-openvpn/blob/master/scripts/healthcheck.sh

That's not the right script. Not related to the problem, at all.

I finally found the problem.

https://github.com/haugene/docker-transmission-openvpn/blob/e6fd367db74075e2b507d420191a55a43b5e8d90/openvpn/modify-openvpn-config.sh#L47

ping-exit lets OpenVPN exit with a zero exit code on failure. This makes the container stop "properly", so it does not count as a failure.

I do not see an easy way to prevent this, except making significant changes or adding another tool just for checking this, which would be overkill.
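One possible shape for such a workaround is a small wrapper around the long-running process. The sketch below is not code from this repository: `run_expecting_no_exit` is a hypothetical helper name, and it assumes the wrapped command is meant to run forever, so that any exit, including a zero one, counts as a failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker-transmission-openvpn):
# run a command that is supposed to run forever and make sure the
# caller sees a non-zero status whenever it stops.
run_expecting_no_exit() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        # A "clean" exit of a long-running service is still a failure
        # from the supervisor's point of view, so map 0 to 1.
        return 1
    fi
    # Pass a genuine error status through unchanged.
    return "$rc"
}

# Usage sketch (assumption: the entrypoint starts OpenVPN through the
# wrapper instead of directly; the config path is a placeholder):
#   run_expecting_no_exit openvpn --config /path/to/config.ovpn
#   exit $?
```

A real entrypoint would also have to forward signals such as SIGTERM to the child process so that an intentional `docker stop` still shuts down cleanly; the sketch leaves that out.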

theAkito, Sep 14 '22 21:09

If you read the script you can disable each of the modifications if you want.

pkishino, Sep 15 '22 01:09

If you read the script you can disable each of the modifications

I don't think the modifications are bad. They are pretty good. It's just that ping-exit works the wrong way and there is no reasonable way to change that in OpenVPN.

So, to circumvent this, it would be necessary to put a lot of effort into solving such a seemingly small problem.

Which is why I'm not sure how this problem should be approached.

theAkito, Sep 15 '22 23:09

Sorry, I just reread the initial issue. But I’m still not sure exactly why you are trying to do this in such a way.. we have an auto heal script built in which sets the autoheal flag and reports the container as unhealthy.. that’s when you can use the third party container to Restart it… Soo I don’t see the issue here

pkishino, Sep 16 '22 00:09

Sorry, I just reread the initial issue. But I’m still not sure exactly why you are trying to do this in such a way.. we have an auto heal script built in which sets the autoheal flag and reports the container as unhealthy.. that’s when you can use the third party container to Restart it… Soo I don’t see the issue here

The issue is that any built-in healthcheck is useless in this situation, because the whole disconnect-and-exit process takes about 3 seconds. The container is long gone, with a zero exit code, before any healthcheck mechanism has had the chance to kick in. Because it exits with a zero exit code, Docker considers the container to have stopped "gracefully", so from the point of view of autoheal, Docker, or any other keepalive mechanism, there is nothing to fix or restart.
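To make the timing argument concrete, here is a rough back-of-the-envelope sketch. The function names are hypothetical, and the health check settings used are Docker's documented defaults (interval 30 s, 3 retries), which is an assumption: the image's actual settings are not shown in this thread and may differ. Only the roughly 3 second exit time comes from the logs above.

```shell
#!/bin/sh
# Best-case time (in seconds) until a Docker health check can flip a
# container to "unhealthy": the first failing probe runs the instant the
# tunnel drops, probes return immediately, and `retries` consecutive
# failures are needed, spaced `interval` seconds apart.
earliest_unhealthy() {
    interval=$1
    retries=$2
    echo $(( interval * (retries - 1) ))
}

# Prints which event happens first for a process that exits `exit_after`
# seconds after the failure begins.
first_event() {
    exit_after=$1
    detect_after=$(earliest_unhealthy "$2" "$3")
    if [ "$exit_after" -lt "$detect_after" ]; then
        echo "container exits first"
    else
        echo "marked unhealthy first"
    fi
}

# ~3 s exit (from this thread) against Docker's default health check
# settings (assumed here, not taken from the image).
first_event 3 30 3   # prints "container exits first"
```

Even in this best case the container has exited long before it could be flagged, and an exited container no longer reports a health status for a monitoring tool to react to.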

Therefore, it would be ideal if the container stopped with a non-zero exit code to begin with; that would make both the autoheal method and Docker's on-failure restart policy work.

As it stands, the only way to make this container restart is to use Docker's restart policy unless-stopped. All other alternatives currently do not work, even though they are meant to.
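The difference between the restart policies comes down to how Docker treats the exit code. The function below is a simplified model of the documented policies, not Docker's implementation: `would_restart` is a hypothetical name, and details such as the `on-failure` retry limit and daemon restarts are ignored. It also covers `always`, which is not mentioned in this thread, for completeness.

```shell
#!/bin/sh
# Simplified model of Docker's restart policies (illustration only).
# $1 = policy, $2 = container exit code, $3 = "manual" if the container
#      was stopped by the user (e.g. docker stop), anything else otherwise.
# Prints "restart" or "stay-stopped".
would_restart() {
    policy=$1
    exit_code=$2
    stopped_by=${3:-crash}

    # A manually stopped container is not restarted right away by any policy.
    if [ "$stopped_by" = "manual" ]; then
        echo "stay-stopped"
        return
    fi

    case "$policy" in
        always|unless-stopped)
            # These policies do not look at the exit code at all.
            echo "restart" ;;
        on-failure)
            # Only a non-zero exit code counts as a failure.
            if [ "$exit_code" -ne 0 ]; then
                echo "restart"
            else
                echo "stay-stopped"
            fi ;;
        *)
            # "no" (the default) never restarts automatically.
            echo "stay-stopped" ;;
    esac
}

would_restart on-failure 0       # prints "stay-stopped" (the case in this issue)
would_restart on-failure 1       # prints "restart"
would_restart unless-stopped 0   # prints "restart"
```

In this model, a zero exit code is exactly what keeps `on-failure` from acting, which matches the behavior described above.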

theAkito, Sep 16 '22 11:09

Well, For me it works fine based on connectivity check and when the vpn server disconnects, Container gets marked as unhealthy and is restarted.

pkishino, Sep 16 '22 11:09

For me it works fine based on connectivity check and when the vpn server disconnects, Container gets marked as unhealthy and is restarted.

It's not the case though, because then autoheal would've worked to begin with in my case and I wouldn't have created this issue in the first place.

Maybe your restart policy is set to unless-stopped.

There is no way the healthcheck can fix this when the container dies with a zero exit code so quickly.

theAkito, Sep 16 '22 11:09