CP ends up ignoring that its jobs have been killed

Open daledude opened this issue 6 years ago • 0 comments

  • what is happening and what you expect to see: Consul had an issue accepting service checks for about half an hour. ContainerPilot eventually stopped PUT-ing health check updates to Consul for all jobs, although CP does continue to PUT health status updates for itself.

Also, CP seems to get into a state where it doesn't notice that its spawned jobs are gone. The /status endpoint still shows jobs as healthy after I manually killed them.

Also, the rsyslog-check that is in every config ends up outputting the following even though running the check manually is successful:

check.rsyslog timeout after 5s: '[514]'

The "check-port" health check script is merely this:

#!/bin/bash
/bin/netstat -tunl | /bin/grep ":$1 " > /dev/null 2>&1
ret=$?
exit $ret
  • the output of containerpilot -version: Version: 3.8.0 GitHash: 408dbc9

  • the ContainerPilot configuration you're using: The config doesn't matter; this happens in all my containers. Here is one anyway:

{
    consul: "{{.CONTAINER_HOST}}:8500",
    logging:
    {
        level: "INFO",
        format: "default",
        output: "stdout"
    },
    jobs: [
        {
            name: "rsyslog",
            exec: [ "rsyslogd-wrapper" ],
            restarts: "unlimited",
            health:
            {
                exec: "check-port 514", // Just simple: netstat -tunl | grep PORT
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ if .DNSMASQ_SIDECAR }}
        {
            name: 'dnsmasq-{{.SERVICE_NAME_FULL}}',
            exec: [ "/usr/sbin/dnsmasq", "-k" ],
            restarts: "unlimited",
            port: "53",
            health:
            {
                exec: "check-port 53",
                interval: 2,
                ttl: 10,
                timeout: "5s",
            },
        },
        {{ end }}
        {
            name: "{{.SERVICE_NAME_FULL}}",
            when: {
              source: "watch.namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
              once: "healthy"
            },
            exec: [ 
                   "gosu", "admin",
                   "{{.BINDIR}}/{{.SERVICE_NAME}}", "-c", "{{.BASEDIR}}/cfg/{{.SERVICE_NAME}}.cfg", "-r", "short-recovery"
                  ],
            restarts: "unlimited",
            port: "{{.SERVICE_PORT}}", // Causes service to be registered with Consul.
            health:
            {
                exec: "check-port {{.SERVICE_PORT}}",
                interval: 1,
                ttl: 10,
                timeout: "5s",
            },
            tags: [
                "{{.SERVICE_NAME}}",
                "{{.CONTAINER_HOST}}",
                "{{.SERVICE_ENVIRONMENT}}",
                "{{.SERVICE_PLATFORM}}"
            ],
            interfaces: [
                "10.0.0.0/8"
            ],
            consul:
            {
                enableTagOverride: true,
                deregisterCriticalServiceAfter: "6h"
            }
        },
        {
            // This job will watch for an event from Containerpilot that is fired
            //   when the "source" job in this config exits with a retcode > 0.
            // It then sends an event through Consul to notify this has occurred.
            // A script run on the monitoring server will read the event
            //   from Consul.
            name: "{{.SERVICE_NAME_FULL}}-exit-failed-watcher",
            when: {
                source: "{{.SERVICE_NAME_FULL}}", // Must match the job name of the exec to watch.
                each: "exitFailed"
            },
            exec: [
                "send-consul-event", "service-exit-failed", "container_host={{.CONTAINER_HOST}}|service={{.SERVICE_NAME_FULL}}|hostname={{.HOSTNAME}}"
            ]
        }
    ],
    watches: [
      {
        name: "namingservice-{{.SERVICE_PLATFORM}}-{{.SERVICE_ENVIRONMENT}}",
        interval: 3
      }
    ]
}
  • the output of any logs you can share; if you can it would be very helpful to turn on debug logging by adding logging: { level: "DEBUG"} to your ContainerPilot configuration: I have logging set to debug, but there is nothing in the logs related to the issue. It seems logging output stopped?
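
To tell whether the check-port script itself ever approaches the 5s limit when run by hand, it can be wrapped in a small timing helper. This is only a minimal sketch: `run_timed` is a hypothetical helper name (not part of ContainerPilot), and it assumes GNU coreutils `timeout` is available in the container.

```bash
#!/bin/bash
# Hypothetical helper (not part of ContainerPilot): run a command under a
# time limit and report its exit code and rough wall-clock duration.
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
run_timed() {
    local limit="$1"
    shift
    local start end rc
    start=$(date +%s)
    timeout "$limit" "$@"
    rc=$?
    end=$(date +%s)
    echo "rc=${rc} elapsed=$((end - start))s"
    return "$rc"
}

# Example invocation (commented out so sourcing the file has no side effects):
# run_timed 5 check-port 514
```

An rc of 124 would mean the script itself exceeded the limit. An rc of 0 or 1 with a small elapsed time would suggest the "timeout after 5s" message comes from somewhere other than the script's own runtime.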

daledude, Mar 08 '19 21:03