
vmagent: update sharding implementation

Open f41gh7 opened this issue 2 years ago • 5 comments

Currently, the operator creates a deployment/statefulset per shard. This introduces additional overhead for managing multiple entities. It's legacy behavior.

At the moment, vmagent can read shard information from the statefulset pod name (pod-0, pod-1, pod-2), and we have to use this approach for agent sharding.

How it would work:

spec.shardCount - configures the number of shards: statefulset replicas = spec.shardCount, promscrape.cluster.membersCount=spec.shardCount.

spec.replicaCount - configures the number of replicas for a given shard: promscrape.cluster.replicationFactor=spec.replicaCount.

each pod gets the param promscrape.cluster.memberNum=pod-name-{0}, i.e. the ordinal taken from its pod name (see the sketch below).
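A minimal sketch of what that mapping could look like on the CR side; the field names follow the existing VMAgent spec, while the derived flags in the comments are assumptions based on the mapping above:

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: sharded
spec:
  shardCount: 3    # -> statefulset replicas = 3, -promscrape.cluster.membersCount=3
  replicaCount: 2  # -> -promscrape.cluster.replicationFactor=2
  remoteWrite:
    # example destination; any remote-write endpoint works here
    - url: http://vminsert:8480/insert/0/prometheus/api/v1/write
```

Pods sharded-0, sharded-1 and sharded-2 would then each derive -promscrape.cluster.memberNum from their ordinal.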

f41gh7 avatar Mar 07 '23 11:03 f41gh7

There are three ways to implement vmagent sharding under statefulset mode:

  1. the current way: for each shard, we have an independent deployment/statefulset with the corresponding -promscrape.cluster.membersCount and memberNum args. Thus vmagent.replicaCount -> statefulset replicas, vmagent.shardCount -> number of statefulsets, and promscrape.cluster.replicationFactor is not involved [it can be configured via vmagent.extraArgs].
  2. proposed in this issue: create only one statefulset per vmagent CR; vmagent.shardCount -> statefulset replicas, vmagent.replicaCount -> promscrape.cluster.replicationFactor.
  3. similar to 2, but leave vmagent.replicaCount -> number of workloads, with replicationFactor still configured via extraArgs. Thus vmagent.shardCount -> statefulset replicas, vmagent.replicaCount -> number of statefulsets [the reverse of p1]. p3 is proposed because with p2 we can no longer have multiple instances for each shard member with one vmagent CR; this could be a breaking change for current users and may be requested by users.

For example, say we have vmagent.replicaCount = 2 and vmagent.shardCount = 3. The three ways then look as follows.
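A rough textual reconstruction of that comparison, derived from the mappings in the list above:

```
p1: shardCount=3 statefulsets, replicaCount=2 replicas each   -> 6 pods
      sts-shard-0: pod-0, pod-1   (memberNum=0, membersCount=3)
      sts-shard-1: pod-0, pod-1   (memberNum=1, membersCount=3)
      sts-shard-2: pod-0, pod-1   (memberNum=2, membersCount=3)

p2: one statefulset, shardCount=3 replicas                    -> 3 pods
      sts: pod-0, pod-1, pod-2    (membersCount=3, replicationFactor=2)

p3: replicaCount=2 statefulsets, shardCount=3 replicas each   -> 6 pods
      sts-replica-0: pod-0, pod-1, pod-2   (membersCount=3)
      sts-replica-1: pod-0, pod-1, pod-2   (membersCount=3)
```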

With p2 & p3, there are two more things that need to be considered:

  1. podAffinity: with p1, we gain the ability to schedule pods belonging to the same shard on different nodes (see the sketch after this list). But with p2 & p3, each statefulset is a complete shard cluster, so it will be hard to achieve that.
  2. updates without data loss: if users want to upgrade vmagent without missing data, they can set spec.replicas>1. With p1, data won't be lost, because there is more than one instance for each shard and a rolling update is possible. With p3, it will be uncontrollable.
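To illustrate the podAffinity point: under p1 each shard's replicas live in a dedicated workload, so a per-workload anti-affinity rule can keep them on different nodes. A minimal sketch of such a pod template fragment, where the shard-num label is hypothetical (the real label name depends on the operator):

```yaml
# Pod template fragment for the shard-0 workload under p1.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: vmagent
            shard-num: "0"   # hypothetical per-shard label
        topologyKey: kubernetes.io/hostname
```

Under p2 & p3, replicas of the same shard are not identifiable by a workload-level selector, so such a rule has nothing to match on.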

And p2 & p3 will only be implemented under statefulset mode; deployment mode will keep using p1, because a random pod name can't be used as cluster.memberNum, which leads to inconsistency between the two modes. So I think the current way is actually more functional than the other two, and having the very same configuration for each replica under one workload may be more understandable. wdyt? @f41gh7 @Amper @hagen1778 plz cc @valyala

Haleygo avatar Sep 18 '23 04:09 Haleygo

@f41gh7 @tenmozes @k1rk can we have your opinion here?

hagen1778 avatar Sep 22 '23 13:09 hagen1778

Possible cases when affinity and advanced shard scheduling make sense:

  1. shard replicas must be in different zones. Supported by p1 and p3.
  2. each shard must be on a different host machine (no such requirement for replicas). Supported by p1, p2 and p3 (see the sketch after this list).
  3. any other use cases?
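Case 2 is achievable even with a single statefulset, since spreading by hostname doesn't need to know shard membership. A minimal pod template sketch, assuming the operator passes such fields through to the statefulset:

```yaml
# Pod template fragment: keep the sharded statefulset's pods on different hosts.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: vmagent
```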

A possible advantage of using p2: upgrades without data loss with replicas==1. But that works only with deployments, and I believe it's not a production-ready case.

So, I suggest changing the current implementation. For the case when replicas must be in different zones, the user must create multiple VMAgent installations, as sketched below.
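For example, two otherwise identical CRs pinned to different zones; a sketch, where the zone values and the use of nodeSelector are assumptions:

```yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: shards-zone-a
spec:
  shardCount: 3
  nodeSelector:
    topology.kubernetes.io/zone: zone-a
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: shards-zone-b
spec:
  shardCount: 3
  nodeSelector:
    topology.kubernetes.io/zone: zone-b
```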

Additional note: if a shard configuration is defined, only a StatefulSet must be used. Deployments will no longer be supported.

f41gh7 avatar Sep 26 '23 10:09 f41gh7

Let's implement p2:

proposed in this issue: create only one statefulset per vmagent CR; vmagent.shardCount -> statefulset replicas, vmagent.replicaCount -> promscrape.cluster.replicationFactor.

It's a breaking change and we're fine with it. We have to mention it in our changelog.

Main motivation: use vmagent's native mechanism for sharding (promscrape.cluster.replicationFactor isn't in use currently).

With sharding enabled, ONLY StatefulSet is supported (statefulMode is enabled implicitly).
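A minimal sketch of how each pod could receive its member number under p2, relying on vmagent's ability (mentioned above) to read the shard index from a pod name that ends with a number; the downward-API wiring is an assumption about how the operator would render it:

```yaml
# Container fragment of the sharded statefulset (shardCount=3, replicaCount=2).
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name               # e.g. vmagent-sharded-1
args:
  - -promscrape.cluster.membersCount=3         # from spec.shardCount
  - -promscrape.cluster.replicationFactor=2    # from spec.replicaCount
  - -promscrape.cluster.memberNum=$(POD_NAME)  # ordinal parsed from the name
```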

FYI, @hagen1778 @Haleygo

f41gh7 avatar Dec 01 '23 13:12 f41gh7

Dropping p1 means you can no longer ensure that the replicas are not in the same zone; with p2 that's not doable. Also, with p2 the link in memberURLTemplate will get broken when one replica is down. Do you plan to produce proper Services for that?
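For context, "proper Services" here presumably means something like a headless Service with publishNotReadyAddresses, which keeps a stable per-pod DNS name for memberURLTemplate even while a replica is restarting; a sketch, not a committed design:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vmagent-sharded
spec:
  clusterIP: None                 # headless: one DNS record per pod
  publishNotReadyAddresses: true  # keep records while a pod is down
  selector:
    app.kubernetes.io/name: vmagent
  ports:
    - name: http
      port: 8429                  # vmagent default HTTP port
```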

frankconrad avatar Feb 04 '24 13:02 frankconrad