loki
Add ability for Ruler to remote write to multiple (clustered) Prometheus servers
Is your feature request related to a problem? Please describe. We run multiple Prometheus servers for HA, so we need to be able to remote-write to all of them; if one server crashes, we don't want to lose the metrics from Loki.
Describe the solution you'd like Ability to point at more than one remote Prometheus, possibly allow Prometheus to scrape Loki, and if possible allow discovery of Prometheus servers using K8s Service Discovery.
Describe alternatives you've considered I have tried to mirror the traffic from Loki to Prometheus, but this is unreliable and hard to set up with current tools.
Additional context #4241
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a `stale` label sorted by thumbs up.
We may also:
- Mark issues as `revivable` if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a `keepalive` label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
Still very important, or can Loki serve these metrics so that Prometheus can scrape them?
Can we reopen, please?
Reopened, and I agree it's useful.
There really is no reason for the remote write configs to be limited to a single one, other than backwards-compatibility. The code should support multiple configurations IIRC.
This should be a relatively easy one for someone to pick up, so I'll leave it (and I don't have time :blush:). The tricky bit will be handling the config in a backwards-compatible way.
Hello @dannykopping
I'm taking a look at this issue, and I was wondering what would be the best approach to handle backward compatibility.
One solution I see is to add a new config entry `clients` that would basically be a list composed of the existing `Client` config type. Then have some sort of rule when both `client` and `clients` are defined in the `remote_write` config (something like `clients` config overrides `client` when both are present).
So WDYT?
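A minimal sketch of the override rule proposed above, written in Go. This is not Loki's actual code: the type names `ClientConfig` and `RemoteWriteConfig`, the method `EffectiveClients`, and the URLs are all hypothetical and only illustrate how a legacy single `client` entry and a new `clients` list might coexist.

```go
package main

import "fmt"

// ClientConfig is a hypothetical stand-in for the existing remote-write client config type.
type ClientConfig struct {
	URL string
}

// RemoteWriteConfig is a hypothetical shape for the remote_write section:
// Client is the legacy single entry, Clients is the proposed list.
type RemoteWriteConfig struct {
	Client  *ClientConfig
	Clients []ClientConfig
}

// EffectiveClients applies the proposed rule: a non-empty Clients list
// overrides Client; otherwise the legacy Client (if set) is used alone.
func (c RemoteWriteConfig) EffectiveClients() []ClientConfig {
	if len(c.Clients) > 0 {
		return c.Clients
	}
	if c.Client != nil {
		return []ClientConfig{*c.Client}
	}
	return nil
}

func main() {
	// Old-style config keeps working unchanged.
	legacy := RemoteWriteConfig{
		Client: &ClientConfig{URL: "http://prom-a.example/api/v1/write"},
	}

	// When both are present, the list wins and the single entry is ignored.
	both := RemoteWriteConfig{
		Client: &ClientConfig{URL: "http://ignored.example/api/v1/write"},
		Clients: []ClientConfig{
			{URL: "http://prom-a.example/api/v1/write"},
			{URL: "http://prom-b.example/api/v1/write"},
		},
	}

	for _, cl := range legacy.EffectiveClients() {
		fmt.Println("legacy ->", cl.URL)
	}
	for _, cl := range both.EffectiveClients() {
		fmt.Println("both   ->", cl.URL)
	}
}
```

Resolving the two fields in one place means the rest of the code only ever deals with a list, so existing single-client configs would not need to change.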
That sounds exactly like the way I was planning to implement it in the beginning, before I ran out of time :+1: Ping me on public Slack if you need anything!
@dannykopping Awesome! Thanks for the speedy reply, I'll start working on it right away ;)
@dannykopping Out of curiosity, why was remote write chosen over being scraped? Is there a document?
We considered it, but the problem with scraping is that you can only return the latest sample. If your scrape interval was 15s but your rule produced samples every 5s, you'd be missing 2/3 of the data (or looked at another way: you would have wasted the CPU cycles for 2 out of 3 rule evaluations every time).
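The arithmetic above can be checked with a small simulation in Go. This is only an illustrative model, not how Loki or Prometheus work internally: the function `countSamples` is hypothetical, time advances in whole seconds, and a scrape is assumed to see only the most recent sample.

```go
package main

import "fmt"

// countSamples models a rule that emits one sample every evalSec seconds
// and a scraper that reads only the latest sample every scrapeSec seconds.
// It returns how many samples were produced and how many distinct samples
// the scraper observed within windowSec seconds.
// Both intervals are assumed to be positive.
func countSamples(evalSec, scrapeSec, windowSec int) (produced, scraped int) {
	latest := -1      // timestamp of the most recent sample, -1 if none yet
	lastScraped := -1 // timestamp of the sample seen by the previous scrape
	for t := 0; t < windowSec; t++ {
		if t%evalSec == 0 {
			latest = t
			produced++
		}
		// A scrape only counts if it sees a sample it has not seen before.
		if t%scrapeSec == 0 && latest != lastScraped {
			lastScraped = latest
			scraped++
		}
	}
	return produced, scraped
}

func main() {
	produced, scraped := countSamples(5, 15, 60)
	fmt.Printf("produced=%d scraped=%d missed=%d\n", produced, scraped, produced-scraped)
	// prints: produced=12 scraped=4 missed=8
}
```

With a 5s evaluation interval and a 15s scrape interval, 8 of the 12 samples in a minute are never seen, which is the 2/3 loss described above. Remote write avoids this because every evaluated sample is pushed.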
@dannykopping Gotcha, that makes sense. Thanks for the explanation. Just got it up and running and looking forward to having some stats!