swarmprom
Dockerd-exporters are always down
Good day, and thanks for the great project; I really admire it.
I run your stack on a cluster with 1 manager and 2 workers. Everything looks good, but in the Prometheus dashboard I see the following:
As you describe here, I updated /etc/docker/daemon.json and restarted the docker service:
{
"experimental": true,
"metrics-addr": "0.0.0.0:9323"
}
I checked my DOCKER_GWBRIDGE_IP:
$ ip -o addr show docker_gwbridge
3: docker_gwbridge inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge\ valid_lft forever preferred_lft forever
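If a script needs that gateway address (for example to fill in the DOCKER_GWBRIDGE_IP variable used by the stack), it can be parsed out of the `ip -o addr show` output. A small sketch, using the sample line above as input:

```shell
# Parse the gateway IP out of an `ip -o addr show docker_gwbridge` line.
# The sample line is hard-coded here for illustration; in practice pipe
# the real command output instead.
line='3: docker_gwbridge inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge'
gw_ip=$(printf '%s\n' "$line" | awk '{print $4}' | cut -d/ -f1)
echo "$gw_ip"  # → 172.18.0.1
```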
If I curl the endpoint at any of the following addresses, everything works:
$ curl http://172.18.0.1:9323/metrics
$ curl http://0.0.0.0:9323/metrics
$ curl http://localhost:9323/metrics
But in Prometheus the dockerd-exporter targets are always down.
$ docker service logs mon_dockerd-exporter
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | Activating privacy features... done.
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | http://:9323
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:36:34 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:36:49 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:04 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:19 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:34 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:49 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | Activating privacy features... done.
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | http://:9323
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:36:37 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:36:52 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:07 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:22 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:37 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:52 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | Activating privacy features... done.
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | http://:9323
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:36:36 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:36:51 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:06 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:21 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:36 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:51 +0000 [ERROR 502 /metrics] context canceled
if you have firewalld enabled you need to open the port:
firewall-cmd --permanent --add-port=9323/tcp
Nope, I haven't. Also, the other exporters work on the same network. By the way, I fixed it: I just changed the dockerd-exporter service definition in https://github.com/stefanprodan/swarmprom/blob/master/docker-compose.yml#L24 to
dockerd-exporter:
image: stefanprodan/dockerd-exporter
networks:
- net
deploy:
mode: global
resources:
limits:
memory: 128M
reservations:
memory: 64M
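For comparison, the stock definition uses the caddy image and a DOCKER_GWBRIDGE_IP environment variable (both mentioned elsewhere in this thread); reconstructed roughly, the part that changed looks like this (the exact original is in the linked compose file):

```yaml
# reconstructed sketch of the stock service, not the verbatim original
dockerd-exporter:
  image: stefanprodan/caddy
  networks:
    - net
  environment:
    - DOCKER_GWBRIDGE_IP=172.18.0.1
  deploy:
    mode: global
```

So the fix above effectively swaps the caddy-based proxy for the socat-based stefanprodan/dockerd-exporter image and drops the environment variable.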
If you can check it on your side, I will close the issue ;)
On my side it works with the caddy image, but the firewall was blocking access (in the end, Docker already exposes these metrics). Caddy is just used to get automated task detection and simply forwards to the Docker daemon. The dockerd-exporter doesn't seem so different; it just does a socat.
I see. Okay, issue can be closed.
Same issue here.
Using stefanprodan/dockerd-exporter does not solve the issue in my case. I also find it hard to believe it has anything to do with a firewall, since the metrics endpoint is dead from the dockerd-exporter container itself. That call would not pass any firewall, would it?
Log from the dockerd-exporter container:
2019/02/06 20:18:53 socat[43] N exit(1)
2019/02/06 20:18:53 socat[1] N childdied(): handling signal 17
2019/02/06 20:18:58 socat[1] N accepting connection from AF=2 10.0.4.185:45430 on AF=2 10.0.4.210:9323
2019/02/06 20:18:58 socat[1] N forked off child process 52
2019/02/06 20:18:58 socat[1] N listening on AF=2 0.0.0.0:9323
2019/02/06 20:18:58 socat[52] N opening connection to AF=2 172.18.0.1:9323
2019/02/06 20:19:07 socat[44] E connect(5, AF=2 172.18.0.1:9323, 16): Operation timed out
2019/02/06 20:19:07 socat[44] N exit(1)
2019/02/06 20:19:07 socat[1] N childdied(): handling signal 17
2019/02/06 20:19:13 socat[1] N accepting connection from AF=2 10.0.4.185:45448 on AF=2 10.0.4.210:9323
2019/02/06 20:19:13 socat[1] N forked off child process 53
2019/02/06 20:19:13 socat[1] N listening on AF=2 0.0.0.0:9323
2019/02/06 20:19:13 socat[53] N opening connection to AF=2 172.18.0.1:9323
2019/02/06 20:19:24 socat[45] E connect(5, AF=2 172.18.0.1:9323, 16): Operation timed out
2019/02/06 20:19:24 socat[45] N exit(1)
@pascal08 Yeah, this is some kind of magic. It worked last time on my dev host, but doesn't work in prod :)
I think the problem is in the networks, because I can get the metrics on localhost from the host system, but I get an error when trying to fetch them from the Prometheus container.
Ubuntu 18.04.1 LTS - docker server version: 18.09.1
- filtering docker through ufw
I opened port 9323 in the firewall for the docker_gwbridge network with the command:
# ufw allow from 172.18.0.0/16 to any port 9323
Problem with dockerd-exporter fixed!
@ioagel At first I thought it had nothing to do with the firewall, but when I opened up the port from the Docker bridge network IP, the error [ERROR 502 /metrics] context canceled changed into dial tcp 172.18.0.1:9323: getsockopt: connection refused. I got it to work when I also changed the metrics IP address to one reachable from my Prometheus container. Glad it works now. :)
Interesting... Will try smth like this next week too.
@ioagel, how did you change the metrics IP address to one reachable from your Prometheus container? I don't have firewall problems, because it's inactive. I'm using Ubuntu Linux. I'm getting the same Error 502.
I simply changed:
environment:
  - DOCKER_GWBRIDGE_IP=172.18.0.1
into
environment:
  - DOCKER_GWBRIDGE_IP='172.18.0.1'
It was listening on http://:9323, so I guess the environment variable was wrong.
Sorry for my English.
@rljoia I was getting the same errors as @ioagel, but I'll explain the config:
It started with the error [ERROR 502 /metrics] context canceled.
- verify the address is 172.18.0.1
- make sure your firewall allows connections to port 9323 from the virtual network (ufw allow from 172.18.0.0/16 to any port 9323 works for me)
This resulted in the error dial tcp 172.18.0.1:9323: getsockopt: connection refused.
My fix for this was to verify that the JSON file located at /etc/docker/daemon.json is:
{
"metrics-addr" : "0.0.0.0:9323",
"experimental" : true
}
because I had set the address to 127.0.0.1 per the instructions on Docker's website, but that was incorrect.
Also, because these services run in global mode, make sure you apply this .json file on every node of your cluster so it properly reads out the data.
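Since a malformed daemon.json will keep dockerd from starting at all, it is worth validating the file before restarting the daemon on each node. A minimal sketch, assuming python3 is available (the /tmp path is illustrative; the real file is /etc/docker/daemon.json):

```shell
# Write the metrics config and sanity-check the JSON syntax before
# restarting dockerd; json.tool exits non-zero on a parse error.
cat > /tmp/daemon.json <<'EOF'
{
  "metrics-addr" : "0.0.0.0:9323",
  "experimental" : true
}
EOF
python3 -m json.tool /tmp/daemon.json > /dev/null && echo "valid JSON"
```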
This solution worked for me as well, thanks a lot!
Get the Docker IP address:
ip addr show docker0
Enter the Docker IP in the prometheus.yml configuration:
static_configs:
  - targets: ['172.17.0.1:9323']
Try doing that when you have 3 dedicated servers in a swarm rather than a single localhost machine. Does it still work?
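For a multi-node swarm, a single static target indeed only scrapes one node. An alternative is to let Prometheus discover every exporter task through swarm DNS, which is the approach the swarmprom stack takes. A sketch of such a scrape job (the service name follows the stack above; the other fields are assumptions):

```yaml
# sketch: discover all dockerd-exporter tasks via swarm DNS
scrape_configs:
  - job_name: 'dockerd-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.dockerd-exporter'
        type: 'A'
        port: 9323
```

The `tasks.<service>` name resolves to the IPs of every running task of the service, so new replicas on any node are picked up automatically.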
Same problem... Docker Swarm & Traefik.

A configuration problem?
After all I just removed dockerd-exporter from my setup 😆