swarmprom

Dockerd-exporters are always down

Open binakot opened this issue 6 years ago • 17 comments

Good day. And thanks for the great project. I really admire this one.

I run your stack on a cluster with 1 manager and 2 workers. Everything looks good, but in the Prometheus dashboard I see the following:

dockerd-exporter-down

As you wrote here, I updated /etc/docker/daemon.json and restarted the Docker service:

{
  "experimental": true,
  "metrics-addr": "0.0.0.0:9323"
}

I checked my DOCKER_GWBRIDGE_IP:

$ ip -o addr show docker_gwbridge

3: docker_gwbridge    inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge\       valid_lft forever preferred_lft forever
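
To avoid copying the gateway address by hand, the output above could be parsed in a script. This is only a sketch: the helper name `gwbridge_ip` is made up for illustration, and it assumes the one-line `ip -o` format shown above.

```shell
# Hypothetical helper: read `ip -o addr show` output on stdin and print the
# first IPv4 address without its prefix length.
gwbridge_ip() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "inet") {          # the address follows the "inet" keyword
        split($(i + 1), parts, "/")
        print parts[1]
        exit
      }
    }
  }'
}

# Example (run on a swarm node):
#   DOCKER_GWBRIDGE_IP=$(ip -o addr show docker_gwbridge | gwbridge_ip)
```

The function only reads stdin, so it can be tried against pasted output before running it on a node.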

If I curl the endpoint at the following addresses, everything works:

$ curl http://172.18.0.1:9323/metrics
$ curl http://0.0.0.0:9323/metrics
$ curl http://localhost:9323/metrics

But in Prometheus the dockerd-exporter targets are always down.

$ docker service logs mon_dockerd-exporter

mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | Activating privacy features... done.
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | http://:9323
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | 03/Apr/2018:07:36:34 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | 03/Apr/2018:07:36:49 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | 03/Apr/2018:07:37:04 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | 03/Apr/2018:07:37:19 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | 03/Apr/2018:07:37:34 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1    | 03/Apr/2018:07:37:49 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | Activating privacy features... done.
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | http://:9323
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | 03/Apr/2018:07:36:37 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | 03/Apr/2018:07:36:52 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | 03/Apr/2018:07:37:07 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | 03/Apr/2018:07:37:22 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | 03/Apr/2018:07:37:37 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2    | 03/Apr/2018:07:37:52 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | Activating privacy features... done.
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | http://:9323
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | 03/Apr/2018:07:36:36 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | 03/Apr/2018:07:36:51 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | 03/Apr/2018:07:37:06 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | 03/Apr/2018:07:37:21 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | 03/Apr/2018:07:37:36 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3    | 03/Apr/2018:07:37:51 +0000 [ERROR 502 /metrics] context canceled

binakot avatar Apr 03 '18 06:04 binakot

if you have firewalld enabled you need to open the port:

firewall-cmd --permanent --add-port=9323/tcp

belfo avatar Jan 21 '19 15:01 belfo

Nope, I haven't. Also, other exporters work in the same network. Btw, I fixed it. I just changed the dockerd-exporter service definition in https://github.com/stefanprodan/swarmprom/blob/master/docker-compose.yml#L24 to

  dockerd-exporter:
    image: stefanprodan/dockerd-exporter
    networks:
      - net    
    deploy:
      mode: global
      resources:
        limits:
          memory: 128M
        reservations:
          memory: 64M

If you can check it on your side, I will close the issue ;)

binakot avatar Jan 22 '19 11:01 binakot

On my side it works with the caddy image, but the firewall was blocking access (in the end, Docker already exposes these metrics). Caddy is only used for automated task detection and simply forwards to the Docker daemon. The dockerd-exporter image does not seem very different; it just runs socat.

belfo avatar Jan 22 '19 13:01 belfo

I see. Okay, issue can be closed.

binakot avatar Jan 22 '19 13:01 binakot

Same issue here.

Using stefanprodan/dockerd-exporter does not solve the issue in my case. I also find it hard to believe it has something to do with a firewall since the metrics endpoint is dead from the dockerd-exporter container itself. This call would not pass any firewall, would it?

selection_026

Log from the dockerd-exporter container:

2019/02/06 20:18:53 socat[43] N exit(1)
2019/02/06 20:18:53 socat[1] N childdied(): handling signal 17
2019/02/06 20:18:58 socat[1] N accepting connection from AF=2 10.0.4.185:45430 on AF=2 10.0.4.210:9323
2019/02/06 20:18:58 socat[1] N forked off child process 52
2019/02/06 20:18:58 socat[1] N listening on AF=2 0.0.0.0:9323
2019/02/06 20:18:58 socat[52] N opening connection to AF=2 172.18.0.1:9323
2019/02/06 20:19:07 socat[44] E connect(5, AF=2 172.18.0.1:9323, 16): Operation timed out
2019/02/06 20:19:07 socat[44] N exit(1)
2019/02/06 20:19:07 socat[1] N childdied(): handling signal 17
2019/02/06 20:19:13 socat[1] N accepting connection from AF=2 10.0.4.185:45448 on AF=2 10.0.4.210:9323
2019/02/06 20:19:13 socat[1] N forked off child process 53
2019/02/06 20:19:13 socat[1] N listening on AF=2 0.0.0.0:9323
2019/02/06 20:19:13 socat[53] N opening connection to AF=2 172.18.0.1:9323
2019/02/06 20:19:24 socat[45] E connect(5, AF=2 172.18.0.1:9323, 16): Operation timed out
2019/02/06 20:19:24 socat[45] N exit(1)

pascal08 avatar Feb 06 '19 20:02 pascal08

@pascal08 Yeah, this is some kind of magic. It worked last time on my dev host, but doesn't work in prod :)

I think the problem is in the networking, because I can get metrics on localhost from the host system, but I get an error when I try to get them from the Prometheus container.

binakot avatar Feb 07 '19 14:02 binakot

Ubuntu 18.04.1 LTS - docker server version: 18.09.1

  • filtering docker through ufw

I opened port 9323 in the firewall for the docker_gwbridge network with this command:

# ufw allow from 172.18.0.0/16 to any port 9323

Problem with dockerd-exporter fixed!

ioagel avatar Feb 09 '19 10:02 ioagel

@ioagel At first I thought it had nothing to do with the firewall, but when I opened up the port from the Docker bridge network IP that error [ERROR 502 /metrics] context canceled changed into dial tcp 172.18.0.1:9323: getsockopt: connection refused. I got it to work when I also changed the metrics IP address to one reachable from my Prometheus container. Glad it works now. :)
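
The two failure modes described in this thread (a timeout while the firewall drops traffic, then "connection refused" once the port is open but the daemon is not listening on that address) can be told apart with curl's exit status. The sketch below is an interpretation, not something curl or swarmprom provides: the helper name `explain_curl_status` is hypothetical, and the hints assume the firewall drops packets rather than rejecting them.

```shell
# Hypothetical helper: turn a curl exit status into a hint.
# 0, 7 and 28 are documented curl exit codes; the hints are an interpretation
# of this thread, not something curl reports itself.
explain_curl_status() {
  case "$1" in
    0)  echo "ok: metrics endpoint reachable" ;;
    7)  echo "refused: check that metrics-addr is not bound to 127.0.0.1" ;;
    28) echo "timeout: check firewall rules for port 9323" ;;
    *)  echo "other: curl exit status $1" ;;
  esac
}

# Example (run inside a container attached to the monitoring network; the
# address is the docker_gwbridge IP from this thread):
#   curl -s -o /dev/null --connect-timeout 5 http://172.18.0.1:9323/metrics
#   explain_curl_status $?
```

Running the check from inside a container matters here, because the same request from the host itself succeeded for the original poster.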

pascal08 avatar Feb 09 '19 16:02 pascal08

Interesting... I will try something like this next week too.

binakot avatar Feb 09 '19 19:02 binakot

@ioagel, how did you change the metrics IP address to one reachable from your Prometheus container? I don't have problems with firewall, because it's inactive. I'm using Linux Ubuntu. I'm getting the same Error 502.

rljoia avatar Nov 07 '19 10:11 rljoia

I simply changed:

environment:
  - DOCKER_GWBRIDGE_IP=172.18.0.1

into:

environment:
  - DOCKER_GWBRIDGE_IP='172.18.0.1'

It listened on http://:9323, so I guess the environment variable was wrong. Sorry for my English.

tamvcspk avatar Nov 27 '19 09:11 tamvcspk

@rljoia I was getting the same errors as @ioagel but I'll explain the config:

I started with the error [ERROR 502 /metrics] context canceled.

  1. verify the address is 172.18.0.1
  2. make sure your firewall allows connections to the port 9323 from the virtual network (ufw allow from 172.18.0.0/16 to any port 9323 works for me)

This resulted in the error dial tcp 172.18.0.1:9323: getsockopt: connection refused.

My fix for this was to make sure that /etc/docker/daemon.json contains:

{
  "metrics-addr" : "0.0.0.0:9323",
  "experimental" : true
}

because I had set the address to be 127.0.0.1 per the instructions on Docker's website, but that was incorrect.

Also, because these services run in global mode, make sure you apply this daemon.json change on every node of your cluster so that metrics can be read from each one.
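
A quick way to spot the loopback mistake on each node might look like the sketch below. Both helper names are hypothetical, and the sed match assumes the key and its value sit on one line, as in the file above; it is not a real JSON parser.

```shell
# Hypothetical helpers for inspecting a daemon.json file.

# Print the "metrics-addr" value from the JSON file given as $1.
# A plain sed match, assuming the key and value sit on one line.
metrics_addr() {
  sed -n 's/.*"metrics-addr"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed if the address given as $1 is loopback-only.
is_loopback_addr() {
  case "$1" in
    127.*|localhost:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (path from this thread):
#   addr=$(metrics_addr /etc/docker/daemon.json)
#   is_loopback_addr "$addr" && echo "metrics-addr $addr is loopback-only"
```

A loopback-only address is reachable from the host but not from a container going through docker_gwbridge, which matches the "connection refused" error described above.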

MJGTwo avatar Jan 03 '20 22:01 MJGTwo

This solution worked for me as well, thanks a lot!

rdehouss avatar Feb 10 '20 12:02 rdehouss

Get the Docker IP address:

ip addr show docker0

Enter the Docker IP in the prometheus.yml configuration:

static_configs:
  - targets: ['172.17.0.1:9323']

mishop avatar May 18 '20 01:05 mishop

Try it with 3 dedicated servers in a swarm rather than a single localhost machine. Does it still work?

binakot avatar May 18 '20 11:05 binakot

Same problem... Docker Swarm & Traefik.

Screenshot 2021-04-09 at 16 54 58

Is this a configuration problem?

Secursus avatar Apr 09 '21 15:04 Secursus

After all I just removed dockerd-exporter from my setup 😆

binakot avatar Apr 09 '21 18:04 binakot