Receiver with Ketama Algorithm fails to start internal server when amount of endpoints needs to be larger than replication factor
Thanos, Prometheus and Golang version used: Thanos Version - 0.32.5
What happened:
- I use the thanos-receive-controller to update my hashring config based on the Thanos Receive StatefulSets, so the hashring config is generated dynamically.
- I am testing the ketama algorithm for production use. During testing I noticed that the ketama algorithm checks that the number of endpoints is not smaller than the replication factor, as below: https://github.com/thanos-io/thanos/blob/main/pkg/receive/hashring.go#L139C1-L145C3
func newKetamaHashring(endpoints []Endpoint, sectionsPerNode int, replicationFactor uint64) (*ketamaHashring, error) {
    numSections := len(endpoints) * sectionsPerNode
    if len(endpoints) < int(replicationFactor) {
        return nil, errors.New("ketama: amount of endpoints needs to be larger than replication factor")
    }
The real problem starts when the hashring config is generated dynamically. The receive controller initially creates a hashring config with an empty endpoint list, and this file is loaded by every receiver replica. When a receiver replica initializes it reads the file, the check above fails, and the receiver logs an error. Because the receiver never becomes ready, the receive controller never adds its endpoint and the hashring is never updated, so every subsequent restart of the receiver throws the same error and the situation turns into an interdependency deadlock.
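To make the chicken-and-egg situation concrete, here is a minimal standalone Go sketch of the quoted check (it does not use the real Thanos package; Endpoint and validateKetamaEndpoints are stand-ins) replayed against the empty endpoint list the controller generates first:

package main

import (
    "errors"
    "fmt"
)

// Endpoint is a stand-in for the endpoint entries parsed from the hashring
// file; only how many there are matters for the check.
type Endpoint struct {
    Address string
}

// validateKetamaEndpoints mirrors the quoted validation from
// pkg/receive/hashring.go: the hashring is rejected when it has fewer
// endpoints than the replication factor.
func validateKetamaEndpoints(endpoints []Endpoint, replicationFactor uint64) error {
    if len(endpoints) < int(replicationFactor) {
        return errors.New("ketama: amount of endpoints needs to be larger than replication factor")
    }
    return nil
}

func main() {
    // The freshly generated hashring config has no endpoints yet, while the
    // receivers run with --receive.replication-factor=3.
    var endpoints []Endpoint
    if err := validateKetamaEndpoints(endpoints, 3); err != nil {
        fmt.Println(err) // the startup error every replica keeps hitting
    }
}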
My receiver configuration:
--receive.hashrings-algorithm=ketama
--receive.replication-factor=3
--receive.hashrings-file=/var/thanos/receive/hashring/thanos-receive-hashring.json
What you expected to happen: I am not sure why the number of endpoints needs to be checked against the replication factor in the ketama algorithm. If this validation could be removed, the issue would be resolved.
How to reproduce it (as minimally and precisely as possible):
- Run observatorium/thanos-receive-controller to manage the hashring config (the file it initially generates looks roughly like the example after the flag list below)
- Load this config into the receiver StatefulSets with the flags below
--receive.hashrings-algorithm=ketama
--receive.replication-factor=3
--receive.hashrings-file=/var/thanos/receive/hashring/thanos-receive-hashring.json
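At startup the controller-generated hashring file typically contains a hashring entry with an empty endpoint list. A rough illustration (the hashring name and exact fields depend on your controller setup):

[
  {
    "hashring": "default",
    "tenants": [],
    "endpoints": []
  }
]

With --receive.replication-factor=3 and zero endpoints in the file, the ketama check fails and the pod never becomes ready.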
Full logs to relevant components:
{"caller":"main.go:161","err":"ketama: amount of endpoints needs to be larger than replication factor\n
github.com/thanos-io/thanos/pkg/receive.newKetamaHashring\n\t/app/pkg/receive/hashring.go:129\n
github.com/thanos-io/thanos/pkg/receive.newHashring\n\t/app/pkg/receive/hashring.go:310\ngithub.com/thanos-io/thanos/pkg/receive.NewMultiHashring\n\t
/app/pkg/receive/hashring.go:288\nmain.setupHashring.func5\n\t/app/cmd/thanos/receive.go:530\ngithub.com/oklog/run.(*Group).Run.func1\n\t
/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\n
unable to create new hashring from config\nmain.setupHashring.func5\n\t
/app/cmd/thanos/receive.go:532\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\n
runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\nreceive command failed\nmain.main\n\t/app/cmd/thanos/main.go:161\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\n
runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650","level":"error","ts":"2024-01-13T00:02:04.325882767Z"}
Anything else we need to know:
Just as a note: we need to check the endpoints because otherwise, at least in the current implementation, we would just busy-loop trying to find enough replicas.
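A simplified sketch of that failure mode (not the actual Thanos ring walk; pickN and the ring layout are illustrative): replication collects n distinct endpoints by walking the ring, so if fewer than n distinct endpoints exist, the walk can never gather enough and would spin forever without the up-front check.

package main

import "fmt"

// pickN walks the ring of virtual sections and collects n distinct endpoints,
// roughly what replication has to do for every incoming series. If the ring
// holds fewer than n distinct endpoints, the loop never finishes, which is
// the busy loop the up-front validation prevents.
func pickN(ring []string, start, n int) []string {
    seen := map[string]bool{}
    var picked []string
    for i := start; len(picked) < n; i++ {
        ep := ring[i%len(ring)]
        if !seen[ep] {
            seen[ep] = true
            picked = append(picked, ep)
        }
    }
    return picked
}

func main() {
    // Two distinct endpoints spread across ring sections: fine for n=2 ...
    ring := []string{"receive-0:10901", "receive-1:10901", "receive-0:10901", "receive-1:10901"}
    fmt.Println(pickN(ring, 1, 2))
    // ... but pickN(ring, 1, 3) would never return, because a third distinct
    // endpoint does not exist anywhere on the ring.
}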
Consider the scenario under k8s: the first receiver replica is coming up, and during initialization it checks the hashring against the replication factor, say 1. At this stage the pod is not yet in a ready state, so its endpoint has not been added by the receive controller and the hashring config has no endpoints. The receiver errors out and goes into a continuous crash loop.
How would we resolve this conflict? I don't understand how checking the number of endpoints against the replication factor helps, or in which scenario it helps. Could you help me understand?
Any thoughts on the above?
I made the same observation when I first set up Thanos Receive.
Most of my issues were resolved when I switched to the Router + Ingestor architecture that's defined here.
The downside is that you roughly double the number of Thanos Receive instances (and the costs associated with that).
The upsides are many:
- You decouple the hashring/replication and the multi-tenant writes
- The "ingestor" can scale up independently, and the Thanos Receive Controller can update the hashring file based on those and the "router" will route to them when they are ready.
- Stuff like node pool upgrade, rollout updates don't break the quorum
I suggest you take a look at kube-thanos for an example.
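As a rough sketch of how the flags are typically split in that setup (service names, paths, and env vars here are placeholders; kube-thanos has the exact manifests): a routing-only receiver gets the hashring file but no local endpoint, while an ingesting-only receiver gets a local endpoint and storage but no hashring file.

# "Router" (stateless): loads the hashring and replicates, but stores nothing.
thanos receive \
  --receive.hashrings-algorithm=ketama \
  --receive.hashrings-file=/var/thanos/receive/hashring/thanos-receive-hashring.json \
  --receive.replication-factor=3 \
  --remote-write.address=0.0.0.0:19291

# "Ingestor" (stateful): listed in the hashring, stores data, does no routing.
thanos receive \
  --receive.local-endpoint=$(POD_NAME).thanos-receive-ingestor:10901 \
  --tsdb.path=/var/thanos/receive \
  --objstore.config-file=/etc/thanos/objstore.yml \
  --label=receive_replica="$(POD_NAME)"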
If you use the Bitnami Helm Chart, you can enable the Thanos Receive Router via receiveDistributor.enabled.
An example of values using the Helm chart (the Thanos Receive Controller needs to be installed separately):
receive:
  enabled: true
  mode: dual-mode
  existingConfigmap: thanos-receive-controller-tenants-generated
  replicaLabel: receive_replica
  updateStrategy:
    type: RollingUpdate
  minReadySeconds: 60
  podSecurityContext:
    fsGroupChangePolicy: OnRootMismatch
  service:
    type: LoadBalancer
    annotations:
      external-dns.alpha.kubernetes.io/hostname: thanos-receive.example.com
    additionalHeadless: true
  statefulsetLabels:
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
  pdb:
    create: true
    minAvailable: ""
    maxUnavailable: 1
receiveDistributor:
  enabled: true
  replicationFactor: 2
  extraEnvVars:
    - name: GODEBUG
      value: gcshrinkstackoff=1
  extraFlags:
    - --receive.hashrings-algorithm=ketama
    - --receive.hashrings-file-refresh-interval=1m
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  pdb:
    create: true
    minAvailable: ""
    maxUnavailable: 1
query:
  replicaLabel:
    - receive_replica
Interesting.
But how does it work with replicationFactor set to 2 and receive/distributor replicas set to 1?
I'm trying to configure my setup with autoscaling enabled and it's been quite difficult... No success so far.