
Receiver with Ketama Algorithm fails to start internal server when amount of endpoints needs to be larger than replication factor

Open dmilind opened this issue 1 year ago • 4 comments

Thanos, Prometheus and Golang version used: Thanos Version - 0.32.5

What happened:

  1. I use the Thanos Receive Controller to update my hashring config based on the StatefulSets of Thanos Receive, so the hashring config is created dynamically.
  2. I am testing the Ketama algorithm for production use. During testing I observed that the Ketama hashring constructor checks that the number of endpoints is not smaller than the replication factor, as below: https://github.com/thanos-io/thanos/blob/main/pkg/receive/hashring.go#L139C1-L145C3
func newKetamaHashring(endpoints []Endpoint, sectionsPerNode int, replicationFactor uint64) (*ketamaHashring, error) {
	numSections := len(endpoints) * sectionsPerNode

	if len(endpoints) < int(replicationFactor) {
		return nil, errors.New("ketama: amount of endpoints needs to be larger than replication factor")
	}
	// ...

The real problem starts when the hashring config is generated dynamically. The receive controller initially creates a hashring config with an empty endpoint list, and this file is loaded by every receiver replica. When a receiver replica initializes, it reads this file, the check above fails, and the receiver logs an error. Because the receiver never becomes ready, the receive controller never adds its endpoint and the hashring is never updated, so every subsequent restart of the receiver hits the same error and the situation becomes an interdependency deadlock.
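
For illustration, the very first hashring file written by the controller looks roughly like this (the exact shape may differ, but the endpoint list is empty because no receiver pod is ready yet), which is exactly what trips the check above at startup:

[
  {
    "hashring": "default",
    "endpoints": []
  }
]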

My receiver configuration:

--receive.hashrings-algorithm=ketama
--receive.replication-factor=3
--receive.hashrings-file=/var/thanos/receive/hashring/thanos-receive-hashring.json

What you expected to happen: I am not sure why the number of endpoints needs to be checked in the Ketama algorithm. If this validation could be removed, the issue would be resolved.

How to reproduce it (as minimally and precisely as possible):

  1. Run observatorium/thanos-receive-controller to manage the hashring config
  2. Load this config into the receiver StatefulSets with the flags below:
--receive.hashrings-algorithm=ketama
--receive.replication-factor=3
--receive.hashrings-file=/var/thanos/receive/hashring/thanos-receive-hashring.json

Full logs to relevant components:

 {"caller":"main.go:161","err":"ketama: amount of endpoints needs to be larger than replication factor\n
github.com/thanos-io/thanos/pkg/receive.newKetamaHashring\n\t/app/pkg/receive/hashring.go:129\n
github.com/thanos-io/thanos/pkg/receive.newHashring\n\t/app/pkg/receive/hashring.go:310\ngithub.com/thanos-io/thanos/pkg/receive.NewMultiHashring\n\t
/app/pkg/receive/hashring.go:288\nmain.setupHashring.func5\n\t/app/cmd/thanos/receive.go:530\ngithub.com/oklog/run.(*Group).Run.func1\n\t
/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\n
unable to create new hashring from config\nmain.setupHashring.func5\n\t
/app/cmd/thanos/receive.go:532\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/[email protected]/group.go:38\n
runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650\nreceive command failed\nmain.main\n\t/app/cmd/thanos/main.go:161\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\n
runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650","level":"error","ts":"2024-01-13T00:02:04.325882767Z"}

Anything else we need to know:

dmilind avatar Jan 13 '24 00:01 dmilind

Just as a note: we need to check the endpoint count because otherwise, at least in the current implementation, we just busy-loop trying to find enough replicas.
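
To illustrate, here is a minimal, hypothetical Go sketch (not the actual Thanos code) of a clockwise replica-selection walk over the ring. With fewer distinct endpoints than the requested replication factor the loop never collects enough replicas and spins forever, which is what the up-front endpoint-count check prevents.

package main

import "fmt"

// pickReplicas is a hypothetical illustration, not the Thanos implementation.
// It walks the ring clockwise from a starting section and collects rf distinct
// endpoints. If the ring contains fewer distinct endpoints than rf, the loop
// condition can never be satisfied and the function never returns; with an
// empty ring it panics on the modulo. A constructor check rejects such
// hashrings before they can be used.
func pickReplicas(ring []string, start, rf int) []string {
	seen := map[string]bool{}
	var replicas []string
	for i := start; len(replicas) < rf; i++ {
		ep := ring[i%len(ring)]
		if !seen[ep] {
			seen[ep] = true
			replicas = append(replicas, ep)
		}
	}
	return replicas
}

func main() {
	// Two distinct endpoints and a replication factor of 2: terminates fine.
	fmt.Println(pickReplicas([]string{"r-0", "r-1", "r-0", "r-1"}, 1, 2))
	// A replication factor of 3 against the same ring would never return.
}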

MichaHoffmann avatar Jan 13 '24 21:01 MichaHoffmann

Consider the scenario on Kubernetes: the first receiver replica is coming up, and during initialization the receiver checks the endpoint list against the replication factor (even with a replication factor as low as 1). At this stage the pod is not in a ready state, so its endpoint has not yet been added by the receive controller and the hashring config has no endpoints. The receiver errors out and goes into a continuous CrashLoopBackOff.

How would we resolve this conflict? I don't understand how checking the endpoint count against the replication factor helps, or in which scenario it helps. Could you help me understand?

dmilind avatar Jan 15 '24 20:01 dmilind

Any thoughts on the above?

dmilind avatar Jan 17 '24 21:01 dmilind

I made the same observation when I first set up Thanos Receive.

Most of my issues were resolved when I switched to the Router + Ingestor architecture that's described here.

The downside is that you roughly double the number of Thanos Receive instances (and the costs associated with that).

The upsides are many:

  • You decouple the hashring/replication from the multi-tenant writes
  • The "ingestors" can scale up independently; the Thanos Receive Controller updates the hashring file based on them, and the "router" routes to them once they are ready
  • Things like node pool upgrades and rolling updates don't break the quorum

I suggest you take a look at kube-thanos for an example.

If you use the Bitnami Helm Chart, you can enable the Thanos Receive Router via receiveDistributor.enabled.

An example of values using the Helm chart (the Thanos Receive Controller needs to be installed separately):

receive:
  enabled: true
  mode: dual-mode
  existingConfigmap: thanos-receive-controller-tenants-generated
  replicaLabel: receive_replica
  updateStrategy:
    type: RollingUpdate
  minReadySeconds: 60
  podSecurityContext:
    fsGroupChangePolicy: OnRootMismatch
  service:
    type: LoadBalancer
    annotations:
      external-dns.alpha.kubernetes.io/hostname: thanos-receive.example.com
    additionalHeadless: true
  statefulsetLabels:
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: default
  pdb:
    create: true
    minAvailable: ""
    maxUnavailable: 1
receiveDistributor:
  enabled: true
  replicationFactor: 2
  extraEnvVars:
    - name: GODEBUG
      value: gcshrinkstackoff=1
  extraFlags:
    - --receive.hashrings-algorithm=ketama
    - --receive.hashrings-file-refresh-interval=1m
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  pdb:
    create: true
    minAvailable: ""
    maxUnavailable: 1
query:
  replicaLabel:
    - receive_replica

Matroxt avatar Feb 08 '24 02:02 Matroxt

(Quoting Matroxt's comment above.)

Interesting.

But how does it work with replicationFactor set to 2 and receive/distributor replicas set to 1?

I'm trying to configure my setup with autoscaling enabled and it's been quite difficult... No success so far.

kaiohenricunha avatar Feb 29 '24 00:02 kaiohenricunha