Action Exception ALERT_MANAGER_DISCOVERY_FAILED while processing get_silences
Describe the bug
I have a number of K8s clusters where I install Robusta.
I have a single Grafana v9.3.6 instance installed in only one cluster. This Grafana is exposed to the external network under the name https://<external_url>/grafana
Robusta is unable to connect to Grafana using this grafana_url
and fails with the following error:
2023-03-15 00:32:09.407 ERROR Failed to find grafana url
2023-03-15 00:32:09.407 ERROR Action Exception ALERT_MANAGER_DISCOVERY_FAILED while processing get_silences cluster_name: 1in***uw1, account_id: 12a***bb5, alertmanager_flavor: gra***ana, grafana_api_key: eyJ***Q==, grafana_dashboard_uid: , grafana_url: htt***ana, prometheus_url: htt***eus, signing_key: 9c0***fd0
At the same time, I'm able to reach this grafana_url
from the robusta-runner pod using a curl request:
curl -v https://<external_url>/grafana/api/alerts -H 'authorization: Bearer <grafana_token>'
...
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
...
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0xaaaad1d57030)
> GET /grafana/api/alerts HTTP/2
...
> user-agent: curl/7.74.0
> accept: */*
> authorization: Bearer <grafana_token>
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
< HTTP/2 200
< cache-control: no-cache
< content-type: application/json
< expires: -1
< pragma: no-cache
< x-content-type-options: nosniff
< x-frame-options: deny
< x-xss-protection: 1; mode=block
< date: Wed, 15 Mar 2023 01:13:07 GMT
< content-length: 886
< x-envoy-upstream-service-time: 26
< server: istio-envoy
< vary: Accept-Encoding
<
* Connection #0 to host <external_url> left intact
[{"id":4,"dashboardId":71,"dashboardUid":"AyWQt9jWk","dashboardSlug":"mongodb-test","panelId":16,"name":"MongoDB Oplog lag alert","state":"unknown","newStateDate":"2022-08-24T12:37:04Z","evalDate":"0001-01-01T00:00:00Z","evalData":{},"executionError":"","url":"/d/AyWQt9jWk/mongodb-test"},{"id":5,"dashboardId":71,"dashboardUid":"AyWQt9jWk","dashboardSlug":"mongodb-test","panelId":17,"name":"MongoDB's Disk IO Utilization alert","state":"unknown","newStateDate":"2022-08-24T12:37:04Z","evalDate":"0001-01-01T00:00:00Z","evalData":{},"executionError":"","url":"/d/AyWQt9jWk/mongodb-test"},{"id":3,"dashboardId":46,"dashboardUid":"vg4QL267z","dashboardSlug":"cex-collector","panelId":20,"name":"PMM CEX COLLECTOR STATUS alert","state":"unknown","newStateDate":"2022-07-12T09:02:01Z","evalDate":"0001-01-01T00:00:00Z","evalData":{},"executionError":"","url":"/d/vg4QL267z/cex-collector"}]
To Reproduce
Steps to reproduce the behavior: I'm using the following configuration:
globalConfig:
  grafana_url: "https://<external_url>/grafana"
  grafana_api_key: <grafana_token>
  alertmanager_flavor: grafana
Expected behavior
Robusta is able to reach Grafana and manage alerts.
Hey, @gofrolist!
Thanks for your issue. I think Robusta has two separate parameters: grafana_url and alertmanager_url.
To fix your issue, add an additional row to the config with alertmanager_url: "https://<external_url>/grafana"
But in any case I think this is confusing, and we should use grafana_url for the alertmanager when it is present.
Please let me know if it helps.
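For reference, a sketch of what the adjusted values could look like, assuming alertmanager_url sits under globalConfig next to the existing settings, as the suggestion implies (placeholders kept from the report):
globalConfig:
  grafana_url: "https://<external_url>/grafana"         # Grafana base URL behind the external ingress
  grafana_api_key: <grafana_token>
  alertmanager_flavor: grafana
  alertmanager_url: "https://<external_url>/grafana"    # suggested addition: point alertmanager discovery at the same Grafana endpoint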
Thanks. Setting alertmanager_url
fixed the issue.