
Timeout when querying secondary unit in HA mode

Open p-v-a opened this issue 1 year ago • 2 comments

I observed this while troubleshooting fortigate_exporter timeouts that happen sporadically across our FortiGate fleet. Normally a scrape of one device takes 2-3 seconds. However, for the secondary unit in an HA cluster, the same scrape takes close to 25 seconds, sporadically spilling over the 30-second default timeout.

It looks like the root cause is an API call to /api/v2/monitor/system/fortimanager/status?vdom=*. On a secondary unit that call takes 10-20 seconds, and occasionally even longer.

I'm not sure whether this is something you want to deal with or whether it's more of a FortiGate issue, but I'm creating this issue to document the behaviour.

p-v-a avatar Mar 24 '23 03:03 p-v-a

+1, came here to say the same thing. Scraping devices that are not primary is causing issues with false reports of devices being down.

2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/fortimanager/status?vdom=*": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/ha-statistics": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/interface/select?vdom=*&include_vlan=true&include_aggregate=true": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/link-monitor?vdom=*": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/resource/usage?interval=1-min&scope=global": context canceled
2023/07/04 15:48:34 Warning: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/sensor-info?vdom=root": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/status": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/resource/usage?interval=1-min&vdom=*": context canceled
2023/07/04 15:48:34 Error: Get "https://fortigatehostname.my.tld/api/v2/monitor/system/ha-checksums?scope=global": context canceled
2023/07/04 15:48:34 Probe of "https://fortigatehostname.my.tld" failed, took 30.000 seconds

aaronnad avatar Jul 04 '23 15:07 aaronnad

Yes, I ended up separating the scrape jobs: one that scrapes each FortiGate host and includes only the metrics that make sense for an individual box, like this:

      probes:
        include:
          - System/SensorInfo
          - System/Status
          - System/Time/Clock
          - System/Resource/Usage
          - License/Status
          - WebUI/State

and then a second one that scrapes the cluster VIP and excludes the metrics above:

      probes:
        exclude:
          - System/SensorInfo
          - System/Status
          - System/Time/Clock
          - System/Resource/Usage
          - License/Status
          - WebUI/State

p-v-a avatar Jul 11 '23 00:07 p-v-a