
ip_protocol_fallback when IPv6 target returns icmp6 unreachable

Open: candlerb opened this issue 2 years ago • 1 comment

Host operating system: output of uname -a

Linux prometheus 5.4.0-80-generic #90~18.04.1-Ubuntu SMP Tue Jul 13 19:40:02 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

blackbox_exporter version: output of blackbox_exporter --version

blackbox_exporter, version 0.19.0 (branch: HEAD, revision: 5d575b88eb12c65720862e8ad2c5890ba33d1ed0)
  build user:       root@2b0258d5a55a
  build date:       20210510-12:56:44
  go version:       go1.16.4
  platform:         linux/amd64

What is the blackbox.yml module config.

modules:
  certificate:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config: {}

What is the prometheus.yml scrape config.

n/a

What logging output did you get from adding &debug=true to the probe URL?

# time curl -g 'localhost:9115/probe?module=certificate&target=prometheus.example.com:443&debug=true'
Logs for the probe:
ts=2021-08-14T10:57:38.558987285Z caller=main.go:320 module=certificate target=prometheus.example.com:443 level=info msg="Beginning probe" probe=tcp timeout_seconds=5
ts=2021-08-14T10:57:38.559144124Z caller=tcp.go:40 module=certificate target=prometheus.example.com:443 level=info msg="Resolving target address" ip_protocol=ip6
ts=2021-08-14T10:57:38.559387303Z caller=tcp.go:40 module=certificate target=prometheus.example.com:443 level=info msg="Resolved target address" ip=2606:4700:1:1::9876
ts=2021-08-14T10:57:38.559436728Z caller=tcp.go:121 module=certificate target=prometheus.example.com:443 level=info msg="Dialing TCP with TLS"
ts=2021-08-14T10:57:38.566401836Z caller=main.go:130 module=certificate target=prometheus.example.com:443 level=error msg="Error dialing TCP" err="dial tcp6 [2606:4700:1:1::9876]:443: connect: no route to host"
ts=2021-08-14T10:57:38.566488423Z caller=main.go:320 module=certificate target=prometheus.example.com:443 level=error msg="Probe failed" duration_seconds=0.007430518



Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.000288299
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.007430518
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 1.706353704e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 6
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0



Module configuration:
prober: tcp
timeout: 5s
http:
    ip_protocol_fallback: true
    follow_redirects: true
tcp:
    ip_protocol_fallback: true
    tls: true
icmp:
    ip_protocol_fallback: true
dns:
    ip_protocol_fallback: true

real	0m0.038s
user	0m0.012s
sys	0m0.012s

What did you do that produced an error?

Create a target name that resolves to both an IPv4 and an IPv6 address, where connections to the IPv6 address fail with "unreachable".

For testing purposes I used this in /etc/hosts:

# cat /etc/hosts
127.0.0.1 localhost

172.67.201.240  prometheus.example.com
2606:4700:1:1::9876  prometheus.example.com

# ping6 prometheus.example.com
PING prometheus.example.com(prometheus.example.com (2606:4700:1:1::9876)) 56 data bytes
From linx-lon1.as13335.net (2001:7f8:4::3417:1) icmp_seq=1 Destination unreachable: Address unreachable
From linx-lon1.as13335.net (2001:7f8:4::3417:1) icmp_seq=2 Destination unreachable: Address unreachable
From linx-lon1.as13335.net (2001:7f8:4::3417:1) icmp_seq=3 Destination unreachable: Address unreachable

What did you expect to see?

Since ip_protocol_fallback: true is set, I expected the failed connection on IPv6 to be followed by a connection attempt on IPv4.
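
A minimal sketch of the connection-level fallback described here, assuming the candidate addresses are already resolved and tried in order of preference. The names firstReachable, tryFunc, tcpTry and errUnreachable are hypothetical and are not part of blackbox_exporter. The IPv6 failure is simulated so the example runs without network access.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// errUnreachable stands in for "connect: no route to host" in the simulation.
var errUnreachable = errors.New("no route to host")

// tryFunc attempts one connection to a single IP address.
type tryFunc func(ip string) error

// firstReachable (hypothetical helper) tries each address in order and
// returns the first one that connects, or the last error if none do.
func firstReachable(ips []string, try tryFunc) (string, error) {
	var lastErr error = errors.New("no addresses to try")
	for _, ip := range ips {
		if err := try(ip); err != nil {
			lastErr = err
			continue // connection failed: move on to the next address
		}
		return ip, nil
	}
	return "", lastErr
}

// tcpTry builds a tryFunc that performs a real TCP connect to the given port.
func tcpTry(port string, timeout time.Duration) tryFunc {
	return func(ip string) error {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, port), timeout)
		if err != nil {
			return err
		}
		return conn.Close()
	}
}

func main() {
	// Addresses from the /etc/hosts test setup, IPv6 first.
	ips := []string{"2606:4700:1:1::9876", "172.67.201.240"}

	// Simulated dialer: the IPv6 address is unreachable, IPv4 succeeds.
	simulated := func(ip string) error {
		if ip == "2606:4700:1:1::9876" {
			return errUnreachable
		}
		return nil
	}

	ip, err := firstReachable(ips, simulated)
	fmt.Println(ip, err)
}
```

To probe for real, pass tcpTry("443", 5*time.Second) in place of the simulated function.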

What did you see instead?

No attempt is made to connect on IPv4.

tcpdump shows:

# tcpdump -i eth0 -nn host 172.67.201.240 or host 2606:4700:1:1::9876 or icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:00:35.563003 IP6 XXXX:XXXX:XXXX:XXXX::33.52582 > 2606:4700:1:1::9876.443: Flags [S], seq 441588982, win 64800, options [mss 1440,sackOK,TS val 92703423 ecr 0,nop,wscale 7], length 0
11:00:35.568393 IP6 2001:7f8:4::3417:1 > XXXX:XXXX:XXXX:XXXX::33: ICMP6, destination unreachable, unreachable address 2606:4700:1:1::9876, length 88
^C

Additional info

Similar results are obtained using ip -6 route add blackhole 2001:7f8:4::3417. In this case you get an EINVAL generated locally, instead of an icmp6 unreachable:

ts=2021-08-14T11:34:18.245170246Z caller=main.go:130 module=certificate target=prometheus.example.com:443 level=error msg="Error dialing TCP" err="dial tcp6 [2001:7f8:4::3417]:443: connect: invalid argument"

But again, blackbox_exporter does not fall back to IPv4.

candlerb, Aug 14 '21 11:08

As posted by @roidelapluie in https://github.com/prometheus/blackbox_exporter/issues/819#issuecomment-904590649:

ip_protocol_fallback is only for DNS resolution, and this is the expected behaviour.

lapo-luchini, Sep 08 '21 13:09
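
To illustrate the distinction made in the comment above, here is a minimal sketch of fallback applied at resolution time only. The other address family is chosen only when the preferred family has no address at all, so a name with both an A and an AAAA record never reaches the fallback path, whatever happens at connect time. chooseAddr is a hypothetical helper written for this example and is not blackbox_exporter's actual resolver code.

```go
package main

import (
	"fmt"
	"net"
)

// chooseAddr (hypothetical helper) returns the first address of the preferred
// family. Only if the preferred family has no address, and fallback is
// enabled, does it return an address of the other family.
// It returns "" when nothing suitable is found.
func chooseAddr(addrs []string, preferIPv6, fallback bool) string {
	other := ""
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			continue // skip strings that are not IP addresses
		}
		isV6 := ip.To4() == nil
		if isV6 == preferIPv6 {
			return a // preferred family is available: no fallback happens
		}
		if other == "" {
			other = a
		}
	}
	if fallback {
		return other
	}
	return ""
}

func main() {
	// Both families present, as in the /etc/hosts test setup:
	// the IPv6 address is chosen and IPv4 is never considered again.
	both := []string{"172.67.201.240", "2606:4700:1:1::9876"}
	fmt.Println(chooseAddr(both, true, true))

	// Only an IPv4 address exists: this is the case fallback covers.
	v4only := []string{"172.67.201.240"}
	fmt.Println(chooseAddr(v4only, true, true))
}
```

Under this model, the probe in the report picks 2606:4700:1:1::9876 at resolution time (the debug log shows ip_protocol=ip6), and the later "no route to host" error fails the probe without IPv4 being tried.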