fortigate_exporter

Scrape timeouts for 15 minutes after Fortigate failover

Open p-v-a opened this issue 1 year ago • 3 comments

I have experienced scrape timeouts that last around 15 minutes after a Fortigate node failover. The exporter logs show the following errors for the whole duration:

2023/05/12 00:02:27 Error: API connectivity test failed, Get "https://forti.net:8443/api/v2/monitor/system/status": context canceled
2023/05/12 00:02:27 Probe of "https://forti.net:8443" failed, took 29.901 seconds

It's probably related to how Fortigate handles session pickup; however, I found that disabling HTTP/2 for the exporter solves this issue. As a workaround, one can set the environment variable GODEBUG=http2client=0, but it would be good to have support in the exporter for this scenario.
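A minimal sketch of what in-exporter support could look like, assuming the exporter builds its own `http.Transport`. The function name `newTransport` and the `disableHTTP2` flag are hypothetical and not existing exporter code; the only real API used is `Transport.TLSNextProto` from net/http.

```go
package main

import (
	"crypto/tls"
	"net/http"
)

// newTransport returns a transport for talking to the Fortigate API.
// Hypothetical helper: the name and the disableHTTP2 flag are illustrative.
func newTransport(disableHTTP2 bool) *http.Transport {
	tr := &http.Transport{}
	if disableHTTP2 {
		// net/http documents that a non-nil, empty TLSNextProto map
		// disables HTTP/2 for this client transport.
		tr.TLSNextProto = map[string]func(string, *tls.Conn) http.RoundTripper{}
	}
	return tr
}
```

The net/http documentation describes this as the programmatic way to disable HTTP/2; unlike GODEBUG=http2client=0, which applies to the whole process, it only affects clients that use this transport.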

p-v-a, May 23 '23 00:05

Just to add more details about this issue. It seems to be related to how Fortigate handles HTTPS session pickup. By the look of it, the failure mode is the following:

  • The secondary unit picks up the TCP session, but not the HTTPS one (our boxes have different TLS certs, so the secondary box doesn't have the certificate of the primary and vice versa). I haven't really experimented much with certificates, so this might not be the root cause; nevertheless, it feels like something TLS related.
  • This causes Fortigate to ignore all incoming packets from the exporter.
  • On the other hand, the exporter uses a persistent HTTP/2 connection, so it tries to reuse the existing connection if one is available.
  • Because Fortigate never replies with a TCP RST and just ignores the packets, the exporter keeps getting timeouts until the HTTP/2 timeout expires.
  • The exporter then initiates a new HTTP/2 session, which is now established using the correct TLS cert, and everything begins to work as expected.

So by switching off HTTP/2 via that GODEBUG env variable we force the exporter to establish a new HTTP session for every scrape, thus working around this issue.
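One caveat worth keeping in mind: Go's HTTP/1.1 client also reuses idle keep-alive connections by default, so turning off HTTP/2 alone may not guarantee a fresh connection on every scrape. A minimal sketch of a transport that explicitly asks for no connection reuse, assuming that is the desired behaviour (the function name `newPerScrapeTransport` is hypothetical and not part of the exporter):

```go
package main

import (
	"net/http"
)

// newPerScrapeTransport is a hypothetical helper, not exporter code.
// With DisableKeepAlives set, the transport does not keep connections
// around for reuse by later requests.
func newPerScrapeTransport() *http.Transport {
	return &http.Transport{
		DisableKeepAlives: true,
	}
}
```

Used as the `Transport` of an `http.Client`, this trades an extra TCP and TLS handshake per scrape for never sending a request over a connection that went stale during failover.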

A possible solution would be to add an option in the config file to disable HTTP/2 when scraping an HA endpoint, so the user can control it. This is especially useful combined with #208: you could still use HTTP/2 for scraping metrics from individual nodes and only disable it for HA endpoints.

p-v-a, Jul 24 '23 02:07

Did you set it in the systemd unit? Mine looks like this: Environment="GODEBUG=http2client=0". I've tried this and I still have a timeout of about 6-8 minutes. This is running 7.0.12.

lazyb0nes, Oct 30 '23 20:10

Sorry for the late answer. In my case I'm running it inside Kubernetes, so no systemd, just a pod with certain environment variables set:

          env:
            # Workaround issue https://github.com/bluecmd/fortigate_exporter/issues/220
            - name: "GODEBUG"
              value: "http2client=0"

p-v-a, Dec 13 '23 02:12