
[VC] Suggestion to support multiple instances of the vc syncer

ThunderYe opened this issue 2 years ago • 10 comments

Hi all, this is Lei from Aliyun Cloud.

I have a special scenario where the tenant master needs to run inside the tenant's VPC (Virtual Private Cloud) for some specific reasons. Because the Super Master and the Tenant Master are in different VPCs, we have to create a dedicated VC-Syncer for these tenants.

These syncers then need to connect to both their own Tenant Master and the Super Master (in fact, these instances have dual network cards, but we don't need to care about those details for now; we can implement this in our own vc-manager). I would like to ask for some suggestions from you: is this feature valuable?
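For concreteness, here is a minimal sketch (not the actual VirtualCluster syncer code) of what a standalone per-tenant syncer entry point could look like, assuming hypothetical `--tenant-kubeconfig` and `--super-kubeconfig` flags for the two network legs:

```go
// Hypothetical sketch: a standalone per-tenant syncer that talks to both the
// tenant master (inside the tenant VPC) and the super master. Flag names and
// wiring are illustrative, not the actual VirtualCluster syncer API.
package main

import (
	"flag"
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	tenantKubeconfig := flag.String("tenant-kubeconfig", "", "kubeconfig for the tenant master (tenant VPC endpoint)")
	superKubeconfig := flag.String("super-kubeconfig", "", "kubeconfig for the super master")
	flag.Parse()

	tenantCfg, err := clientcmd.BuildConfigFromFlags("", *tenantKubeconfig)
	if err != nil {
		log.Fatalf("load tenant kubeconfig: %v", err)
	}
	superCfg, err := clientcmd.BuildConfigFromFlags("", *superKubeconfig)
	if err != nil {
		log.Fatalf("load super kubeconfig: %v", err)
	}

	tenantClient := kubernetes.NewForConfigOrDie(tenantCfg)
	superClient := kubernetes.NewForConfigOrDie(superCfg)

	// A real syncer would start informers on both clusters and reconcile
	// tenant objects into the super cluster; that part is omitted here.
	_ = tenantClient
	_ = superClient
}
```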

Any advice is welcome! @Fei-Guo @zhuangqh and all kind friends!

ThunderYe avatar Jun 13 '22 12:06 ThunderYe

This is doable as long as you implement the per-tenant syncer (in addition to the centralized one in upstream). The complexities lie in the per-tenant syncer lifecycle management and the modifications to vc-managers. Also, I don't think upstream can support this feature, since it is a major architecture change. Another thought: maybe tenant master plus VK (Virtual Kubelet) is an easier alternative, if the difference in user experience can be tolerated.

Fei-Guo avatar Jun 13 '22 14:06 Fei-Guo

Thanks a lot, Fei.

Maybe we can begin to design a pipeline, or a so-called multi-instance syncer working mode, in upstream? Some customers may be concerned about the performance of a single stand-alone syncer instance; multiple instances could support multiple VPCs or simply enable horizontal scaling.

ThunderYe avatar Jun 14 '22 07:06 ThunderYe

There is a problem with the per-tenant syncer design given the existing syncer implementation. The stand-alone syncer may have to watch the entire super cluster's resources, since it can create an arbitrary number of namespaces in the super cluster. This becomes a problem when the super cluster restarts and a lot of standalone syncers need to reload their informer caches. The standalone syncer seems more suitable for solutions like vCluster, where each tenant only creates one namespace in the super cluster.

Fei-Guo avatar Jun 14 '22 08:06 Fei-Guo

I've thought about this question. One possible solution: modify the cluster discovery logic of the syncer.

  1. maintain a cluster group spec (e.g., labels to distinguish the different groups)
  2. each standalone syncer acquires the global lock of its cluster group and uses the group's labels to start its super cluster informers
  3. each standalone syncer only serves the clusters from its cluster group

This solution reduces the watch pressure of many standalone syncers via labeled watches and a distributed lock; a rough code sketch follows.
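The sketch below illustrates steps 2 and 3, assuming a hypothetical group label key `tenancy.x-k8s.io/syncer-group`, a Lease per group as the "global lock", and plain client-go informers filtered by that label. It is an illustration of the idea, not the upstream syncer code.

```go
// Illustrative sketch only: a standalone syncer that (1) takes a per-group
// Lease as the "global lock of the cluster group" and (2) starts super-cluster
// informers filtered by a hypothetical group label, so it only watches objects
// belonging to its own cluster group. Label key, namespace, and names are
// assumptions, not the upstream syncer's actual API.
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

const groupLabel = "tenancy.x-k8s.io/syncer-group" // hypothetical label key

func runGroupSyncer(ctx context.Context, cfg *rest.Config, group string) {
	client := kubernetes.NewForConfigOrDie(cfg)
	hostname, _ := os.Hostname()

	// One Lease per cluster group: only the lock holder serves the group.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Namespace: "vc-manager", Name: "syncer-group-" + group},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Label-filtered informers: this syncer's cache only holds
				// super-cluster objects stamped with its group label.
				factory := informers.NewSharedInformerFactoryWithOptions(
					client, 0,
					informers.WithTweakListOptions(func(o *metav1.ListOptions) {
						o.LabelSelector = groupLabel + "=" + group
					}),
				)
				factory.Core().V1().Pods().Informer() // register whatever resources the syncer needs
				factory.Start(ctx.Done())
				factory.WaitForCacheSync(ctx.Done())
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				// Another syncer replica can take over this group.
			},
		},
	})
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	runGroupSyncer(context.Background(), cfg, os.Getenv("SYNCER_GROUP"))
}
```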

zhuangqh avatar Jun 20 '22 09:06 zhuangqh

Thanks a lot for your advice, @zhuangqh and @Fei-Guo

In fact, as described above, maybe the VC-syncer could work like the Nginx master-worker mode, where a leading master assigns different jobs to workers. Should we move the discussion to https://github.com/kubernetes-sigs/cluster-api-provider-nested/? I can open a new issue there.

ThunderYe avatar Jun 20 '22 10:06 ThunderYe

@ThunderYe This idea is a general way to solve a scaling problem. However, a k8s operator usually works in a "pull mode", not a "push mode": it automatically discovers its jobs from the k8s API. Labeling the objects to alter what a syncer watches is effectively the master-worker pattern you described.

zhuangqh avatar Jun 20 '22 10:06 zhuangqh

Will labeling help reduce the pressure on the apiserver? I assume you need to give every synced object a label for a standalone syncer to pick up. But a list-by-label operation still needs to read everything from etcd and then filter by label, so the apiserver's cost of handling a list call may not be reduced much by labeling (it does reduce the informer cache size for the syncer).
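For the mechanics being discussed, a label-selected list from the syncer's side looks roughly like the snippet below (the label key and kubeconfig path are placeholders). The selector shrinks what comes back and what the syncer caches, but the apiserver still reads the full set from etcd (or its watch cache) and filters by label afterwards, which is exactly the cost described above.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/super-kubeconfig") // placeholder path
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The selector trims the response and the syncer's informer cache, but the
	// apiserver still reads the full pod set and filters by label afterwards.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		LabelSelector: "tenancy.x-k8s.io/syncer-group=vip", // hypothetical label
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("matched %d pods\n", len(pods.Items))
}
```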

Fei-Guo avatar Jun 20 '22 17:06 Fei-Guo

@Fei-Guo Hmm, I am not so familiar with how the apiserver is implemented. If it works the way you describe, labeling won't help much to reduce the apiserver pressure...

zhuangqh avatar Jun 21 '22 06:06 zhuangqh

Still a live thread, ha! In fact, we want to let some VIP tenant masters hold a unique label, which means they will not be watched by the "default" syncer; instead, a syncer carrying the identical label will process the VIP tenant master. What do you think about it? @Fei-Guo @zhuangqh
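One way to express that split, purely as an illustration (the label key and value below are made up): the default syncer uses a selector requiring the label to be absent, while the dedicated syncer selects on the exact value.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

// Hypothetical label key marking a VirtualCluster as handled by a dedicated syncer.
const vipLabel = "tenancy.x-k8s.io/dedicated-syncer"

func main() {
	// Default syncer: only VCs WITHOUT the VIP label. Selectors are static and
	// valid, so the parse errors are ignored in this sketch.
	defaultSel, _ := labels.Parse("!" + vipLabel)
	// Dedicated syncer for the "gold" tenants: only VCs WITH the matching value.
	vipSel, _ := labels.Parse(vipLabel + "=gold")

	normalVC := labels.Set{}
	goldVC := labels.Set{vipLabel: "gold"}

	fmt.Println(defaultSel.Matches(normalVC), defaultSel.Matches(goldVC)) // true false
	fmt.Println(vipSel.Matches(normalVC), vipSel.Matches(goldVC))         // false true
}
```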

ThunderYe avatar Aug 15 '22 12:08 ThunderYe

@ThunderYe You can certainly do that. I would imagine this is something like the SchedulerName field in PodSpec: one could add a field to the VC CRD called "SyncerName", so a particular VC can be handled exclusively by a specific syncer. This is better than the label solution.
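As an illustration of this idea only: SyncerName is a hypothetical field, and the types below are simplified stand-ins, not the real VirtualCluster CRD.

```go
package main

import "fmt"

// Simplified stand-ins for illustration; the real VirtualCluster CRD lives in
// the multi-tenancy repo and does not currently have a SyncerName field.
type VirtualClusterSpec struct {
	// SyncerName selects which syncer instance handles this VirtualCluster,
	// analogous to PodSpec.SchedulerName. Empty means the default syncer.
	SyncerName string `json:"syncerName,omitempty"`
}

type VirtualCluster struct {
	Name string
	Spec VirtualClusterSpec
}

// shouldHandle is the filter each syncer instance would apply in its reconciler.
func shouldHandle(vc VirtualCluster, mySyncerName string) bool {
	if vc.Spec.SyncerName == "" {
		return mySyncerName == "default"
	}
	return vc.Spec.SyncerName == mySyncerName
}

func main() {
	vip := VirtualCluster{Name: "vip-tenant", Spec: VirtualClusterSpec{SyncerName: "vpc-a-syncer"}}
	fmt.Println(shouldHandle(vip, "default"))      // false
	fmt.Println(shouldHandle(vip, "vpc-a-syncer")) // true
}
```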

Fei-Guo avatar Aug 15 '22 18:08 Fei-Guo

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 13 '22 18:11 k8s-triage-robot

We have implemented a prototype version of the labeling idea: syncers started with a special argument (a label selector) only process the tenant masters whose VC CR carries the identical label, and it works well.

Maybe we can contribute the feature to the VC community some time later.

ThunderYe avatar Dec 13 '22 10:12 ThunderYe

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 12 '23 11:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Feb 11 '23 12:02 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 11 '23 12:02 k8s-ci-robot