
Multiple remote write endpoints

raxod502-plaid opened this issue 3 years ago • 3 comments

We have a replicated (HA) Prometheus setup in our cluster, so when using remote write to push metrics into Prometheus, we have to push to all replicas in parallel—otherwise, only one replica will have the pushed metrics, and you have a random chance of seeing them or not depending on which Prometheus you happen to be routed to when querying.

For this reason, Prometheus remote write typically lets you specify a list of endpoints to push to (see the upstream docs, where remote_write is an array). At the moment, xk6-output-prometheus-remote appears to support only a single remote write endpoint (taken from K6_PROMETHEUS_REMOTE_URL). Could we have some way of submitting a list instead?
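For reference, this is what the upstream format looks like: in prometheus.yml, remote_write is a list, so a single Prometheus can push to several receivers in parallel. A minimal sketch (hostnames are hypothetical; the /api/v1/write path assumes the receivers have the remote-write receiver enabled):

```yaml
# prometheus.yml — remote_write accepts a list of endpoints,
# each entry pushed to independently.
remote_write:
  - url: http://prometheus-0.example.internal:9090/api/v1/write
  - url: http://prometheus-1.example.internal:9090/api/v1/write
  - url: http://prometheus-2.example.internal:9090/api/v1/write
```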

raxod502-plaid avatar Feb 17 '22 19:02 raxod502-plaid

Hi @raxod502-plaid, thanks for opening the issue! It's definitely an interesting case, but I wonder: have you considered using Cortex? I haven't tried it myself, but it might be able to solve this at the setup level.

This could use some additional thought, but my first instinct is that this use case is better solved at the setup level, if not with Cortex then with something else. The extension creates a remote client for one URL and regularly makes an HTTP request to it. Extending that to several clients and several requests would have a performance impact the extension probably can't afford, given the rate at which metrics are generated.

yorugac avatar Feb 18 '22 10:02 yorugac

Well, switching our entire monitoring architecture to a different system isn't really workable in our use case; we already use HA Prometheus in all our clusters, with Thanos for aggregation and long-term storage, and changing it would be a year-long project.

Having the extension make an HTTP request every second, as it does by default, already seems like a bit much to me. I've turned that down to once every 30 seconds, because our engineers already expect a latency of around 30 seconds for their metrics; that's what we've set most of our scrape intervals to. With that adjustment, making three HTTP requests instead of one should be fine, no?
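To put numbers on that claim, here is the back-of-the-envelope comparison (using the figures from this thread: 1 push/second by default vs. 3 replicas pushed every 30 seconds):

```python
# Request-rate comparison for the scenario described above.
default_rate = 1 / 1        # default: 1 push per second, single endpoint
replicas = 3                # HA Prometheus replicas to push to
interval_s = 30             # adjusted push interval in seconds

ha_rate = replicas / interval_s   # total pushes per second across replicas

print(default_rate)  # 1.0 requests/second
print(ha_rate)       # 0.1 requests/second
```

Even fanning out to three replicas, the adjusted setup generates a tenth of the default request rate.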

The workaround I'm investigating at the moment is whether I can run some standard network tool as a sidecar on the k6 container that receives the incoming HTTP requests from the extension and multiplexes them to our individual Prometheus endpoints.

raxod502-plaid avatar Feb 18 '22 16:02 raxod502-plaid

I solved this problem by adding an nginx sidecar to k6 with ngx_http_mirror_module enabled, like this:

nginx.conf
# Fan out every incoming remote-write request from k6 to all three
# Prometheus replicas via ngx_http_mirror_module.
events {}
http {
  # Remote-write payloads can be large; raise the default 1m body limit.
  client_max_body_size 8m;
  upstream prometheus-0 {
    server prometheus-prometheus-operator-prometheus-0.prometheus-operated.monitoring.svc.cluster.local:9090;
  }
  upstream prometheus-1 {
    server prometheus-prometheus-operator-prometheus-1.prometheus-operated.monitoring.svc.cluster.local:9090;
  }
  upstream prometheus-2 {
    server prometheus-prometheus-operator-prometheus-2.prometheus-operated.monitoring.svc.cluster.local:9090;
  }

  server {
    listen 80;
    server_name _;

    location / {
      # Mirror each request to the other two replicas...
      mirror /next1;
      mirror /next2;
      # ...while proxying the original request to the first replica.
      proxy_pass http://prometheus-0$request_uri;
    }
    location /next1 {
      internal;  # only reachable via the mirror directive above
      proxy_pass http://prometheus-1$request_uri;
    }
    location /next2 {
      internal;
      proxy_pass http://prometheus-2$request_uri;
    }
  }
}
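With the sidecar in place, the wiring on the k6 side is just a matter of pointing the existing single-endpoint URL at nginx instead of at a Prometheus replica. A sketch, assuming the sidecar listens on localhost:80 per the config above and the replicas expose the standard remote-write receiver path:

```shell
# Point the extension at the nginx sidecar; it will fan the pushes out
# to all three Prometheus replicas.
export K6_PROMETHEUS_REMOTE_URL="http://localhost:80/api/v1/write"
echo "$K6_PROMETHEUS_REMOTE_URL"

# Then run the test as usual, e.g. (output name may vary by extension version):
#   k6 run -o output-prometheus-remote script.js
```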

It would nonetheless be quite preferable to have the functionality available natively in k6.

raxod502-plaid avatar Mar 03 '22 23:03 raxod502-plaid

Currently, this is not on our short-term roadmap. Delegating this functionality to k6 doesn't seem optimal, as it would introduce additional complexity around error handling. There are already a number of well-built collectors out there for which this kind of fan-out is a core part of their job.

We might reconsider it in the future if demand increases.

codebien avatar Jan 27 '25 14:01 codebien