blackbox_exporter
blackbox_exporter copied to clipboard
net/http: TLS handshake timeout is only 10s and not use probe timeout value
Host operating system: output of uname -a
Linux localhost 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
blackbox_exporter version: output of blackbox_exporter --version
blackbox_exporter, version 0.18.0 (branch: HEAD, revision: 60c86e6ce5a1111f7958b06ae7a08222bb6ec839)
build user: root@53d72328d93f
build date: 20201012-09:46:31
go version: go1.15.2
What is the blackbox.yml module config.
modules:
http_2xx:
prober: http
http:
valid_status_codes: [301, 302, 200, 401, 403, 415]
no_follow_redirects: true
preferred_ip_protocol: ip4
tls_config:
insecure_skip_verify: true
http_post_2xx:
prober: http
http:
method: POST
tcp_connect:
prober: tcp
timeout: 15s
pop3s_banner:
prober: tcp
tcp:
query_response:
- expect: "^+OK"
tls: true
tls_config:
insecure_skip_verify: false
ssh_banner:
prober: tcp
tcp:
query_response:
- expect: "^SSH-2.0-"
irc_banner:
prober: tcp
tcp:
query_response:
- send: "NICK prober"
- send: "USER prober prober prober :prober"
- expect: "PING :([^ ]+)"
send: "PONG ${1}"
- expect: "^:[^ ]+ 001"
icmp:
prober: icmp
timeout: 20s
icmp:
preferred_ip_protocol: ip4
What is the prometheus.yml scrape config.
global:
scrape_interval: 1m # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 1m # Evaluate rules every 15 seconds. The default is every 1 minute.
scrape_timeout: 15s
scrape_configs:
- job_name: 'service_check'
metrics_path: /probe
scrape_interval: 1m
scrape_timeout: 30s
params:
module: [http_2xx]
static_configs:
- targets:
- https://mycompany.com
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: localhost:9115
...
What logging output did you get from adding &debug=true to the probe URL?
probe_dns_lookup_time_seconds 0.000888445
probe_duration_seconds 10.036707606
probe_failed_due_to_regex 0
probe_http_content_length 0
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0.000888445
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
probe_http_redirects 0
probe_http_ssl 0
probe_http_status_code 0
probe_http_uncompressed_body_length 0
probe_http_version 0
probe_ip_addr_hash 3.931567538e+09
probe_ip_protocol 4
probe_success 0
What did you do that produced an error?
curl http://localhost:9115/probe\?module\=http_2xx\&target\=https%3A%2F%2Fmycompany.com\&debug%3Dtrue
What did you expect to see?
probe_success 1 or 120s timeout
What did you see instead?
net/http: TLS handshake timeout is only 10s, instead of 120s global probe timeout
Logs for the probe:
ts=2021-02-23T06:19:22.251254881Z caller=main.go:304 module=http_2xx target="https://mycompany.com/?debug=true" level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2021-02-23T06:19:22.251369966Z caller=http.go:342 module=http_2xx target="https://mycompany.com/?debug=true" level=info msg="Resolving target address" ip_protocol=ip4
ts=2021-02-23T06:19:22.252372922Z caller=http.go:342 module=http_2xx target="https://mycompany.com/?debug=true" level=info msg="Resolved target address" ip=aaa.bbb.ccc.ddd
ts=2021-02-23T06:19:22.252452689Z caller=client.go:252 module=http_2xx target="https://mycompany.com/?debug=true" level=info msg="Making HTTP request" url="https://aaa.bbb.ccc.ddd/?debug=true" host=mycompany.com
ts=2021-02-23T06:19:32.288414104Z caller=main.go:119 module=http_2xx target="https://mycompany.com/?debug=true" level=error msg="Error for HTTP request" err="Get \"https://aaa.bbb.ccc.ddd/?debug=true\": net/http: TLS handshake timeout"
ts=2021-02-23T06:19:32.28846505Z caller=main.go:119 module=http_2xx target="https://mycompany.com?debug=true" level=info msg="Response timings for roundtrip" roundtrip=0 start=2021-02-23T06:19:22.25252433Z dnsDone=2021-02-23T06:19:22.25252433Z connectDone=2021-02-23T06:19:22.288073452Z gotConn=0001-01-01T00:00:00Z responseStart=0001-01-01T00:00:00Z end=0001-01-01T00:00:00Z
ts=2021-02-23T06:19:32.288508646Z caller=main.go:304 module=http_2xx target="https://mycompany.com/?debug=true" level=error msg="Probe failed" duration_seconds=10.03720212
Sorry, I looked at this the other day and forgot to follow up...
where's the 120s coming from?
Sorry, I looked at this the other day and forgot to follow up...
where's the 120s coming from?
Ah, the builtin default. Sorry, I missed that.
@mem yes, 120s is a default, so, do you have any ideas?
Timeouts can never be longer than the timeout provided by Prometheus. And timeouts from Prometheus are never longer than the scrape interval.
For questions/help/support please use our community channels. There are more people available to potentially respond to your request and the whole community can benefit from the answers provided.
@SuperQ I think the question is still valid as a bug report.
Starting here:
ts=2021-02-23T06:19:22.251254881Z caller=main.go:304 module=http_2xx target="https://mycompany.com/?debug=true" level=info msg="Beginning probe" probe=http timeout_seconds=119.5
that 119.5 is the timeout for the probe. It's 119.5 because it's 120 (the default when there's no timeout provided by Prometheus) minus 0.5 (the default offset).
Then:
ts=2021-02-23T06:19:32.288414104Z caller=main.go:119 module=http_2xx target="https://mycompany.com/?debug=true" level=error msg="Error for HTTP request" err="Get \"https://aaa.bbb.ccc.ddd/?debug=true\": net/http: TLS handshake timeout"
notice the 10 second difference between timestamps. This is timing out in the HTTP client after 10 seconds. That's the TLS handshake timeout in the roundtripper, which is hardcoded and as far as I can see non-configurable (in prometheus/common).
That's the question and that's the potential bug: why is the TLS handshake timing out after 10 seconds if the scrape timeout is 30s, the scrape interval is 60s and the built-in default timeout is 120s.
Any ideas?
@nurzhan86 did you use cert-manger in your k8s cluster? I had the same problem. but i find i can monitor the url normarlly if its certificate is buyed from Cloud vendor,and if its cert generated by cer-manager,i get the probelm(net/tls timeout),maybe i think it caused by dns
@nurzhan86 i had solved the problem, i check the keyAlgorithm of certificate, rsa is ok,but ecdsa will show tls timeout problem. when i generate the certficate with keyAlgorithm(rsa) again, and execute
curl "http://blackbox-exporter.monitoring:9115/probe?module=http_2xx&target=https://test1.test2.test3.com&debug=true"
it return probe_success 1
i cant change ssl certificate any other suggestions to resolve this in blackbox exporter?
I am having the same problem.
We have blackbox exporter monitoring an external API.
The service monitor has:
scrapeTimeout: 30s
and the probe is configured like this:
http_2xx_3xx_skip_tls_redir:
prober: http
http:
valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
follow_redirects: false
preferred_ip_protocol: "ip4"
method: GET
valid_status_codes: [200, 301, 302, 304, 307, 308]
tls_config:
insecure_skip_verify: true
and the logs look like this:
ts=2024-02-07T04:33:02.529411913Z caller=main.go:181 module=http_2xx_3xx_skip_tls_redir target=https://www.something.com level=info msg="Beginning probe" probe=http timeout_seconds=29.5
ts=2024-02-07T04:33:02.529557435Z caller=http.go:328 module=http_2xx_3xx_skip_tls_redir target=https://www.something.com level=info msg="Resolving target address" target=www.something.com ip_protocol=ip4
ts=2024-02-07T04:33:02.561461559Z caller=http.go:328 module=http_2xx_3xx_skip_tls_redir target=https://www.something.com level=info msg="Resolved target address" target=www.something.com ip=1.2.3.4
ts=2024-02-07T04:33:02.56156147Z caller=client.go:252 module=http_2xx_3xx_skip_tls_redir target=https://www.something.com level=info msg="Making HTTP request" url=https://1.2.3.4 host=www.something.com
ts=2024-02-07T04:33:12.587716412Z caller=handler.go:120 module=http_2xx_3xx_skip_tls_redir target=https://www.something.com level=error msg="Error for HTTP request" err="Get \"https://1.2.3.4": net/http: TLS handshake timeout"
ts=2024-02-07T04:33:12.587774483Z caller=handler.go:120 module=http_2xx_3xx_skip_tls_redir target=https://www.something.com level=info msg="Response timings for roundtrip" roundtrip=0 start=2024-02-07T04:33:02.561656981Z dnsDone=2024-02-07T04:33:02.561656981Z connectDone=2024-02-07T04:33:02.587066652Z gotConn=0001-01-01T00:00:00Z responseStart=0001-01-01T00:00:00Z tlsStart=2024-02-07T04:33:02.587143803Z tlsDone=2024-02-07T04:33:12.587695312Z end=0001-01-01T00:00:00Z
So we can see the timeout at the top is 29.5 s but the timings in the results show tls timed out after 10 s
We are running quay.io/prometheus/blackbox-exporter:v0.24.0
This is an intermittent issue, the majority of checks are fine with the TLS handshake taking well under a second