Add ability for Ruler to remote write to multiple (clustered) Prometheus servers

Open cten opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe. We run multiple Prometheus servers for HA, and we need to be able to remote-write to all of them so that if one server crashes we don't lose the metrics from Loki.

Describe the solution you'd like Ability to point at more than one remote Prometheus server, possibly allow Prometheus to scrape Loki, and if possible allow discovery of Prometheus servers using K8s service discovery.

Describe alternatives you've considered I have tried to mirror the traffic from Loki to Prometheus, but this is unreliable and hard to set up with current tools.

Additional context #4241

cten avatar Sep 22 '21 21:09 cten

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

stale[bot] avatar Mar 03 '22 03:03 stale[bot]

This is still very important. Alternatively, can Loki serve these metrics so that Prometheus can scrape them?

cten avatar May 10 '22 13:05 cten

Can we reopen this, please?

chrism417 avatar Jul 18 '22 19:07 chrism417

Reopened, and I agree it's useful.

There really is no reason for the remote write configs to be limited to a single one, other than backwards-compatibility. The code should support multiple configurations IIRC.

This should be a relatively easy one for someone to pick up, so I'll leave it (and I don't have time :blush:). The tricky bit will be handling the config in a backwards-compatible way.

dannykopping avatar Jul 18 '22 19:07 dannykopping

Hello @dannykopping, I'm taking a look at this issue and was wondering what the best approach to handling backward compatibility would be. One solution I see is to add a new config entry, clients, that would be a list of the existing Client config type, and then define a rule for when both client and clients are set in the remote_write config (something like: clients overrides client when both are present). WDYT?
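
A minimal sketch of that precedence rule, for illustration only. The type names, field names, and the EffectiveClients helper are hypothetical and are not taken from Loki's code base; the only assumption carried over from the proposal is that a non-empty clients list wins over the legacy client entry.

```go
package main

import "fmt"

// ClientConfig is a hypothetical stand-in for the existing remote-write
// client config type; only the target URL is modelled here.
type ClientConfig struct {
	URL string
}

// RemoteWriteConfig models the proposal: the legacy single "client" entry
// alongside a new "clients" list. Both names are illustrative.
type RemoteWriteConfig struct {
	Client  *ClientConfig  // legacy single-target entry, may be nil
	Clients []ClientConfig // proposed multi-target entry, may be empty
}

// EffectiveClients returns the targets to write to. A non-empty Clients
// list overrides Client; otherwise the legacy Client is used on its own,
// so existing single-target configs keep working unchanged.
func (c RemoteWriteConfig) EffectiveClients() []ClientConfig {
	if len(c.Clients) > 0 {
		return c.Clients
	}
	if c.Client != nil {
		return []ClientConfig{*c.Client}
	}
	return nil
}

func main() {
	// Placeholder URLs, not real endpoints.
	legacy := RemoteWriteConfig{
		Client: &ClientConfig{URL: "http://prom-0.example/api/v1/write"},
	}
	both := RemoteWriteConfig{
		Client: &ClientConfig{URL: "http://prom-0.example/api/v1/write"},
		Clients: []ClientConfig{
			{URL: "http://prom-1.example/api/v1/write"},
			{URL: "http://prom-2.example/api/v1/write"},
		},
	}
	for _, c := range legacy.EffectiveClients() {
		fmt.Println("legacy config writes to:", c.URL)
	}
	for _, c := range both.EffectiveClients() {
		fmt.Println("combined config writes to:", c.URL)
	}
}
```

Resolving the two entries in one place means the rest of the code only ever sees a list, whichever entry the user set.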

aminesnow avatar Jul 26 '22 16:07 aminesnow

That sounds exactly the way I was planning to implement it in the beginning before I ran out of time :+1: Ping me on public Slack if you need anything!

dannykopping avatar Jul 26 '22 16:07 dannykopping

@dannykopping Awesome! Thanks for the speedy reply, I'll start working on it right away ;)

aminesnow avatar Jul 26 '22 16:07 aminesnow

@dannykopping Out of curiosity, why was remote write chosen over being scraped? Is there a document?

snuggie12 avatar Sep 28 '23 22:09 snuggie12

We considered it, but the problem with scraping is that you can only return the latest sample. If your scrape interval was 15s but your rule produced samples every 5s, you'd be missing 2/3 of the data (or, looked at another way, you would have wasted the CPU cycles for 2 out of 3 rule evaluations every time).
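
The arithmetic behind that 2/3 figure can be sketched as follows. This is a simplified model, assuming perfectly regular evaluations and scrapes and an endpoint that exposes only the latest sample; droppedFraction is a hypothetical helper, not part of Loki or Prometheus.

```go
package main

import (
	"fmt"
	"time"
)

// droppedFraction estimates the share of rule-evaluation samples a scraper
// never sees when only the latest sample is exposed. Each scrape picks up
// one sample, while scrapeInterval/evalInterval samples were produced in
// that window, so the remainder is lost.
func droppedFraction(evalInterval, scrapeInterval time.Duration) float64 {
	// Nothing is lost when scrapes are at least as frequent as evaluations.
	if evalInterval <= 0 || scrapeInterval <= evalInterval {
		return 0
	}
	return 1 - float64(evalInterval)/float64(scrapeInterval)
}

func main() {
	// The intervals from the comment above: rule every 5s, scrape every 15s.
	f := droppedFraction(5*time.Second, 15*time.Second)
	fmt.Printf("fraction of samples never scraped: %.2f\n", f)
}
```

With remote write, every evaluation result is pushed, so under this model no samples are dropped regardless of the evaluation interval.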

dannykopping avatar Sep 29 '23 05:09 dannykopping

@dannykopping Gotcha, that makes sense. Thanks for the explanation. Just got it up and running and looking forward to having some stats!

snuggie12 avatar Sep 29 '23 16:09 snuggie12