[tempo-distributed] Allow higher count of unavailable ingester replicas in pdb

Open AlexDCraig opened this issue 2 years ago • 4 comments

Resolves https://github.com/grafana/helm-charts/issues/1653

Happy to make this more conservative; I'm just interested in increasing the count from 1.

AlexDCraig avatar Aug 02 '22 17:08 AlexDCraig

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Aug 02 '22 17:08 CLAassistant

With RF3 the system can tolerate 1 ingester being down and still accept writes and return reads. Depending on how you do your rollouts, you could take down more than one at once, but that could increase load on the remaining ingesters quite a bit.

joe-elliott avatar Aug 02 '22 18:08 joe-elliott

@joe-elliott I have a lot more than 3 ingester replicas in my ring. My understanding is therefore that I can afford to lose more than one and the ring can still service reads and writes.

AlexDCraig avatar Aug 02 '22 18:08 AlexDCraig

It's occurred to me now that my default setting here isn't accurate for the default values of RF and replicas :D

What I'm suggesting is that maxUnavailable should be replicas - (floor(replication factor / 2) + 1). This covers the default case by yielding maxUnavailable: 1, and it also covers my situation (and probably that of others like me): more than 3 ingester replicas, where losing more than one at a time is affordable.
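
To make the arithmetic concrete, here is a minimal sketch of the formula above in Python. The function and parameter names are hypothetical and not taken from the chart, and clamping the result at 0 is an added assumption for the case where there are fewer replicas than the quorum size. It only evaluates the formula as written; it does not model how the ring places data across ingesters.

```python
def max_unavailable(replicas: int, replication_factor: int) -> int:
    """Evaluate the proposed value: replicas - (floor(RF / 2) + 1).

    Names are illustrative only; the chart itself would express this
    in its own templates.
    """
    # Quorum size: floor(RF / 2) + 1, e.g. 2 when RF is 3.
    quorum = replication_factor // 2 + 1
    # Assumption: never return a negative count.
    return max(replicas - quorum, 0)


if __name__ == "__main__":
    # Default case from the discussion: 3 replicas, RF 3.
    print(max_unavailable(3, 3))
    # A larger ring, e.g. 6 replicas with RF 3.
    print(max_unavailable(6, 3))
```

With 3 replicas and RF 3 this evaluates to 1, matching the default case; with 6 replicas and RF 3 it evaluates to 4.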

AlexDCraig avatar Aug 02 '22 18:08 AlexDCraig