Load balancing for Store Gateways
Is your proposal related to a problem?
It currently makes little sense to run store gateways in HA mode, since Queriers do a fanout across all stores. In order to enable seamless updates and to spread load more evenly, we should consider implementing load-balancing capabilities in Queriers.
Describe the solution you'd like
Ideally, load-balancing should be done on the client-side to mitigate the stickiness of gRPC connections. We can identify replicated stores by their labels when selecting series and instead of fanning out across all stores, we can use round-robin or pick replicas at random.
One potential solution would be to introduce a new object that implements the store.Client interface and holds references to stores with the same labels. This object would be returned by the endpoint set in place of the individual clients, and would randomly proxy each request to one of the downstream stores.
https://github.com/thanos-io/thanos/blob/3de8ee7245967c6660beab1e0ebf5015b02137b3/pkg/query/endpointset.go#L364-L375
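A minimal sketch of what such a wrapper could look like. The `StoreClient` interface, `staticClient`, and `balancedClient` names here are hypothetical stand-ins for illustration; the real store.Client interface exposes the gRPC Store API rather than just an address:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// StoreClient is a simplified stand-in for Thanos' store.Client interface
// (hypothetical; the real interface exposes gRPC Series/LabelNames calls).
type StoreClient interface {
	Addr() string
}

// staticClient wraps a single store address (hypothetical).
type staticClient struct{ addr string }

func (c staticClient) Addr() string { return c.addr }

// balancedClient groups replicas that advertise identical external labels
// and proxies each call to one of them, round-robin.
type balancedClient struct {
	replicas []StoreClient
	next     uint64
}

// pick selects the next replica in round-robin order; atomic so it is
// safe to call from concurrent query paths.
func (b *balancedClient) pick() StoreClient {
	i := atomic.AddUint64(&b.next, 1) - 1
	return b.replicas[int(i)%len(b.replicas)]
}

func (b *balancedClient) Addr() string { return b.pick().Addr() }

func main() {
	lb := &balancedClient{replicas: []StoreClient{
		staticClient{"store-0:10901"},
		staticClient{"store-1:10901"},
	}}
	for i := 0; i < 4; i++ {
		fmt.Println(lb.Addr())
	}
}
```

Because the wrapper satisfies the same interface as a single store, the rest of the Querier would not need to know whether it is talking to one store or to a replica group.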
Describe alternatives you've considered
This was already discussed in https://github.com/thanos-io/thanos/issues/199 and was closed with a suggestion to use a load balancer or rely on kube-proxy in Kubernetes. The suggested approach will not work due to the sticky nature of gRPC connections.
We do have to consider that we don't want this for sidecars, since the data can differ per Prometheus instance. Currently, the most efficient way (imo) to spread load is to implement sharding. I would be curious what 'overhead' we would have with load balancing over multiple stores. Anyhow, +1 from me. Might even be a cool LFX project as well.
Yup, but we can treat that separately.
I think it's necessary work. The only question I have:
- Should we do it in endpointset?
- Or rather natively in gRPC, using DNS-based client-side load balancing as described in https://rafaeleyng.github.io/grpc-load-balancing-with-grpc-go
Also, we need this for the Querier.
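For the DNS-based option, a rough sketch of what client-side round-robin could look like with grpc-go. The `dns:///` target scheme, `WithDefaultServiceConfig`, and the `round_robin` policy are standard grpc-go features; the service address is a hypothetical headless Kubernetes service, and this is a fragment rather than a complete program:

```go
// Sketch: client-side round-robin in grpc-go. The dns:/// resolver
// returns all A records for the headless service, and the round_robin
// policy spreads individual RPCs across the resolved backends, which
// avoids the single-backend stickiness of a plain gRPC connection.
conn, err := grpc.Dial(
	"dns:///thanos-store.monitoring.svc:10901", // hypothetical headless service
	grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	grpc.WithTransportCredentials(insecure.NewCredentials()),
)
```

One caveat worth checking: the DNS resolver only re-resolves on connection loss or a resolver-triggered refresh, so rolling updates may take a moment to be reflected, whereas an endpointset-level approach would see membership changes immediately.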
Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.
Still relevant.