Steve Simpson comments

Results: 17 comments of Steve Simpson

Adjust rate-limit usage on partial ingestion failure

A simple and minor improvement would be to only roll back if **all** the samples fail to ingest. This would still help with the original issue of ingesters being unavailable, but...

Adjust rate-limit usage on partial ingestion failure

Good question, scratch that...

"the Alertmanager has no configuration and no fallback specified", even with X-Scope-OrgID

The problem here is exactly the issue you linked to. Unfortunately, it's not the best user experience if you configure Grafana before setting an Alertmanager configuration (which is normal if...

Auto-forget unhealthy compactors

> I got surprised by this. If a compactor is unhealthy in the ring, I was expecting its jobs to be resharded to other compactor replicas. Could you double check...

Retrying AlertManager UnaryPath Requests on next replica if one fail

Thanks for working on this. I can add some context: The problem with creating silences is that the operation is not idempotent in [upstream] alertmanager. Each API call creates a...

Retrying AlertManager UnaryPath Requests on next replica if one fail

My over-arching point/question was whether we should aim to use quorum operations for everything, so the distributor is logically as transparent as possible (modulo result merging).

> About the delete...

feat(alertmanager): support loki alerts GeneratorURL in template functions

Hi @fgouteroux - thanks for the PR. I'm taking a look now.

feat(alertmanager): support loki alerts GeneratorURL in template functions

If I understand right, what we want to accomplish is this: if the system (e.g. Loki) sending alerts already provides a Grafana Explore URL, then we want to pass that through...

Random ruler delivery errors to Alertmanager

Hi @yafanasiev - would you be able to provide a snippet of your logs from the alertmanager instances? That would be very helpful.

Random ruler delivery errors to Alertmanager

Interesting. I noticed we're using the headless service in the Helm chart. That shouldn't make any difference, but we should be using the standard service to load balance over alertmanager replicas....