
Runner can't connect to external prometheus and alertmanager

Open smoug25 opened this issue 1 year ago • 15 comments

Describe the bug I have a multi-cluster setup with a separate monitoring cluster. For querying metrics I use Thanos Query, and it works fine in-cluster: the Robusta runner can connect to Thanos Query through its service and to Alertmanager through the Alertmanager service. I expose hosts for Thanos Query and Alertmanager with JWT auth through Ambassador Edge Stack. I am able to query Thanos and Alertmanager from my machine successfully, but the Robusta runner gets errors: a 401 from Thanos Query and a 400 from Alertmanager.

To Reproduce Steps to reproduce the behavior:

  1. Set up two clusters, cluster A and cluster B
  2. Expose Prometheus and Alertmanager on cluster A with JWT authorization
  3. Install Robusta on both clusters and, in Robusta on cluster B, set the URLs of Prometheus and Alertmanager in cluster A
  4. See the error in Robusta on cluster B

Expected behavior No errors in the Robusta logs on the external cluster, and app metrics available in the Robusta UI

Robusta runner logs

2023-09-23 06:02:39.386 ERROR Failed to connect to prometheus. Couldn't connect to Prometheus found under https://thanos-query.areon.io
Caused by HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev)
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/prometrix/connect/custom_connect.py", line 101, in check_prometheus_connection
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/src/robusta/core/sinks/robusta/prometheus_health_checker.py", line 61, in prometheus_connection_checks
    prometheus_connection.check_prometheus_connection(params={})
  File "/usr/local/lib/python3.9/site-packages/prometrix/connect/custom_connect.py", line 103, in check_prometheus_connection
    raise PrometheusNotFound(
prometrix.exceptions.PrometheusNotFound: Couldn't connect to Prometheus found under https://my-prometheus.url
Caused by HTTPError: 401 Client Error: Unauthorized for url: https://my-prometheus.url/api/v1/query?cluster=dev)

Caused by HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences)
Traceback (most recent call last):
  File "/app/src/robusta/utils/silence_utils.py", line 113, in get_alertmanager_silences_connection
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/src/robusta/core/sinks/robusta/prometheus_health_checker.py", line 97, in alertmanager_connection_checks
    get_alertmanager_silences_connection(params=base_silence_params)
  File "/app/src/robusta/utils/silence_utils.py", line 116, in get_alertmanager_silences_connection
    raise AlertsManagerNotFound(
robusta.core.exceptions.AlertsManagerNotFound: Could not connect to the alert manager [https://alertmanager.areon.io]
Caused by HTTPError: 400 Client Error: Bad Request for url: https://my-alertmanager.url/api/v2/silences)

smoug25 avatar Sep 23 '23 08:09 smoug25

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

  • 💬 Slack Community: Join Robusta team and other contributors on Slack here.
  • 📖 Docs: Find our documentation here.
  • 🎥 YouTube Channel: Watch our videos here.

github-actions[bot] avatar Sep 23 '23 08:09 github-actions[bot]

Hi @smoug25, I don't think we currently support JWT authorization for Prometheus, but we do support adding custom Prometheus authorization headers in Robusta.

https://docs.robusta.dev/master/configuration/alertmanager-integration/outofcluster-prometheus.html#authentication-headers

Does something like this help?

Avi-Robusta avatar Sep 24 '23 08:09 Avi-Robusta

Hi @Avi-Robusta, thanks for the reply. Unfortunately that doesn't help: I am already passing the JWT as a bearer token. This is my Robusta Helm values file:

robusta:
  clusterName: dev
  enablePrometheusStack: false
  disableCloudRouting: false
  globalConfig:
    alertmanager_url: "https://(my-alertmanager.url)"
    grafana_url: ""
    prometheus_url: "https://(my-prometheus.url)"
    chat_gpt_token: "{{ env.CHAT_GPT_TOKEN }}"

    prometheus_additional_labels:
      cluster: dev
    
    signing_key: "{{ env.ROBUSTA_GLOBAL_SIGNING_KEY }}"
    account_id: "{{ env.ROBUSTA_GLOBAL_ACCOUNT_ID }}"

    prometheus_auth: "Bearer {{ env.JWT_TOKEN }}"
    alertmanager_auth: "Bearer {{ env.JWT_TOKEN }}"
    prometheus_url_query_string: "cluster=dev"
  sinksConfig:
    - discord_sink:
        name: areon_discord_sink
        url: "{{ env.DISCORD_WEBHOOK }}"
    - robusta_sink:
        name: robusta_ui_sink
        token: "{{ env.ROBUSTA_TOKEN }}"
  enablePlatformPlaybooks: true
  runner:
    additional_env_vars:
    - name: GRAFANA_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: grafana_key
    - name: DISCORD_WEBHOOK
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: discord_webhook
    - name: ROBUSTA_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_token
    - name: ROBUSTA_GLOBAL_SIGNING_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_global_signing_key
    - name: ROBUSTA_GLOBAL_ACCOUNT_ID
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta_global_account_id
    - name: CHAT_GPT_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: chat_gpt_token
    - name: JWT_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: jwt_token
    - name: PROMETHEUS_SSL_ENABLED
      value: "true"
    sendAdditionalTelemetry: false
  rsa:
    private:  -- secret --
    public: -- secret --
  playbookRepos:
    chatgpt_robusta_actions:
      url: "https://github.com/robusta-dev/kubernetes-chatgpt-bot.git"

  customPlaybooks:
  - triggers:
    - on_prometheus_alert: {}
    actions:
    - chat_gpt_enricher: {}
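
Since both `prometheus_auth` and `alertmanager_auth` above are built from the `JWT_TOKEN` environment variable, one thing worth ruling out is that the value the runner sees differs from the one used on the command line. The sketch below is only an illustration, written in Python because that is what the runner's traceback shows. `token_problems` is a hypothetical helper, not part of Robusta, and the checks are assumptions about common mistakes rather than a confirmed cause of this issue.

```python
def token_problems(token):
    """Return a list of human-readable problems found in a raw token string.

    Hypothetical helper for illustration; an empty list only means that
    none of the checks below fired, not that the token is valid.
    """
    if not token:
        return ["token is empty or unset"]

    problems = []
    stripped = token.strip()

    # Secrets created from files often carry a trailing newline.
    if token != stripped:
        problems.append("token has leading or trailing whitespace")

    # The values file already prepends "Bearer ", so the secret should not.
    if stripped.lower().startswith("bearer "):
        problems.append("token already starts with 'Bearer ', so the header would contain it twice")

    # A compact JWT has three dot-separated parts: header.payload.signature
    if stripped.count(".") != 2:
        problems.append("token does not look like a JWT (expected three dot-separated parts)")

    return problems
```

Inside the runner pod this could be called as `token_problems(os.environ.get("JWT_TOKEN"))` after `import os`.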

smoug25 avatar Sep 24 '23 09:09 smoug25

Hi @smoug25, can you try running this with your URL and token to see how Thanos responds?

curl --location 'https://MY-PROMETHEUS.URL/api/v1/query?query=up' \
--header 'Authorization: Bearer JWT_TOKEN'

Some users have had issues with Thanos because they needed to either specify a port or use http instead of https in the URL, so if the curl doesn't work, try either or both of those.

Avi-Robusta avatar Sep 24 '23 09:09 Avi-Robusta

@Avi-Robusta after sending your request to my Thanos I got a valid response with metrics. Let me clarify one point: Thanos itself runs without auth, but it sits behind a proxy that authenticates with a JWT expected in the header 'Authorization: Bearer JWT_TOKEN'. My Alertmanager sits behind the same proxy, but there I get a 400 response.
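
Because the proxy is the component that validates the JWT, it can help to look at the claims the token actually carries, for example whether it has expired. This is a minimal sketch, assuming the token is a standard three-part compact JWT. `jwt_claims` and `is_expired` are hypothetical helper names, and the payload is decoded without verifying the signature, so this is for inspection only.

```python
import base64
import json
import time


def jwt_claims(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    # JWT segments are base64url-encoded with the "=" padding removed,
    # so restore the padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(claims, now=None):
    """Return True if the 'exp' claim (seconds since the epoch) is in the past."""
    now = time.time() if now is None else now
    return "exp" in claims and claims["exp"] <= now
```

If the token used by the runner decodes to different claims than the one used in the curl test, that would point at the secret rather than at the runner.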

smoug25 avatar Sep 24 '23 10:09 smoug25

Hello, @Avi-Robusta. Do you have any updates on this issue?

smoug25 avatar Oct 02 '23 09:10 smoug25

Hi @smoug25, I wasn't able to replicate the issue. Would you like to jump on a call so I can debug this with you? You can pick a time from my Calendly.

Avi-Robusta avatar Oct 02 '23 10:10 Avi-Robusta

Hi @Avi-Robusta, do you have any ideas about what we could do to understand the issue better?

smoug25 avatar Oct 17 '23 17:10 smoug25

Hi @smoug25. Avi is currently not available. It will be easier to discuss this in the #support channel of the Slack community.

Sheeproid avatar Oct 31 '23 15:10 Sheeproid

@smoug25 can you confirm if this is still happening or if it was fixed?

aantn avatar Feb 22 '24 14:02 aantn

@aantn I've updated to 0.10.29 and the problem still occurs.

smoug25 avatar Feb 22 '24 14:02 smoug25

Weird. If you run the curl command from the robusta-runner pod, does it work? I am trying to figure out what is different about the way the runner connects.

aantn avatar Feb 22 '24 17:02 aantn

If I make a curl request from the robusta-runner pod, it works fine and I receive a status code of 200 (OK).
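
The curl test used `/api/v1/query?query=up`, while the runner log above shows failures on `/api/v1/query?cluster=dev` and `/api/v2/silences`. One way to narrow down the difference is to replay those logged URLs from the runner pod with the same Bearer token. The sketch below assumes plain GET requests; the log only shows the URL, so whether the runner uses a different method or sends extra headers or a body is not visible there. `logged_urls` and `probe` are hypothetical helper names, and only the Python standard library is used.

```python
import urllib.error
import urllib.request


def logged_urls(prometheus_url, alertmanager_url, query_string="cluster=dev"):
    """Build the two URLs that appear in the runner's error log."""
    return {
        "prometheus": f"{prometheus_url.rstrip('/')}/api/v1/query?{query_string}",
        "alertmanager": f"{alertmanager_url.rstrip('/')}/api/v2/silences",
    }


def probe(url, token, opener=urllib.request.urlopen):
    """Send a GET with a Bearer token and return the HTTP status code.

    `opener` is injectable so the function can be exercised without a network.
    """
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their code instead.
        return err.code
```

Calling `probe(url, os.environ["JWT_TOKEN"])` for each entry of `logged_urls("https://my-prometheus.url", "https://my-alertmanager.url")` inside the pod would show whether these exact URLs also return 200, or whether they reproduce the 401 and 400 from the log.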

smoug25 avatar Feb 23 '24 07:02 smoug25