What happens when Traefik instances are restarted on DC/OS?

Open hbceylan opened this issue 6 years ago • 3 comments

What happens when we restart the Traefik instances on DC/OS? Do our microservices become unreachable? Yes! How can I handle this?

[Two screenshots: 2018-06-06 at 21:52:09 and 2018-06-06 at 21:52:50]

hbceylan avatar Jun 06 '18 19:06 hbceylan

Hi! In order to get community help with this would you mind posting on either the users mailing list [email protected] or Slack at chat.dcos.io? I don't know too much about Traefik but you might find someone there who does 🙂

judithpatudith avatar Jun 06 '18 20:06 judithpatudith

@deric ^

ryadav88 avatar Jun 06 '18 22:06 ryadav88

@hbceylan Which Traefik package version do you use?

In the latest version there's a healthcheck configured on $PORT0:

  "healthChecks": [
    {
      "gracePeriodSeconds": 20,
      "intervalSeconds": 5,
      "maxConsecutiveFailures": 2,
      "portIndex": 0,
      "timeoutSeconds": 2,
      "delaySeconds": 15,
      "protocol": "MESOS_HTTP",
      "path": "/ping"
    }
  ],

In your case it appears that port 80 ("portIndex": 0) is used for public connections and does not respond to /ping (the healthcheck request). Port 8080 is probably the "admin" interface entrypoint, which is configured to respond to healthchecks. Judging from the screenshot, you should probably use:

      "portIndex": 1,

or reorder the ports so that the healthchecks pass (check the error log). Also, when you use:

  "upgradeStrategy": {
    "minimumHealthCapacity": 0.5
  },

it means that you'll need at least 2 public nodes, because you're allocating the fixed ports 80, 443 and 8080, which can't be allocated to multiple instances on the same node at the same time. When restarting the task, Marathon will kill one instance, stage its replacement and wait until the healthcheck passes, then restart the remaining instance(s).
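
To make the arithmetic behind minimumHealthCapacity concrete, here is a rough sketch in Python. It is not Marathon code: the function names are made up for illustration, and it assumes the number of instances that must stay healthy is the instance count times minimumHealthCapacity, rounded up. Check the Marathon documentation for the exact rule in your version.

```python
import math


def min_healthy_instances(instances: int, minimum_health_capacity: float) -> int:
    """Instances that must stay healthy while the app is restarted.

    Assumption (not taken from the thread): the product is rounded up.
    """
    return math.ceil(instances * minimum_health_capacity)


def max_killable_at_once(instances: int, minimum_health_capacity: float) -> int:
    """Instances that may be killed before any replacement is healthy."""
    return instances - min_healthy_instances(instances, minimum_health_capacity)


if __name__ == "__main__":
    # Two instances with "minimumHealthCapacity": 0.5, as in the snippet above.
    print(min_healthy_instances(2, 0.5))  # 1
    print(max_killable_at_once(2, 0.5))   # 1
    # A single instance: nothing may be killed before a replacement is healthy.
    print(max_killable_at_once(1, 0.5))   # 0
```

Under that assumption, two instances on two public nodes can be restarted one at a time, since killing one instance frees the fixed ports on its node for the replacement. With a single instance nothing can be killed first, so the replacement would need another node where the fixed ports are still free.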

deric avatar Jun 07 '18 08:06 deric