
[V2] shared disk conflict with NodeAffinity and PodAffinity

Open andyzhangx opened this issue 2 years ago • 7 comments

What happened:

  1. The V2 driver's scheduler extender sets a different score on a node when the shared disk is already attached to that node, so it can conflict with preferred node affinity and preferred pod affinity at least: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/#schedule-a-pod-using-preferred-node-affinity

e.g. if the shared disk is already attached to one node, and that node is not preferred by the node affinity rule, will the pod be scheduled onto that node or not? What is the node score in that case? (See the simplified scoring sketch below.)

  2. What about the required node affinity case? e.g. if the shared disk is already attached to one node, and that node does not satisfy the required node affinity, will the pod be scheduled onto that node or not? What is the node score in that case? I suppose the pod should NOT be scheduled to that node.
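
To make the scoring question concrete, here is a minimal sketch (not the driver's code; all names and numbers are illustrative assumptions) of how kube-scheduler combines scores: score plugins such as preferred node affinity produce a per-node score, and each scheduler extender's score is added on top after being multiplied by the extender's weight. With a large enough extender contribution, the node that already holds the shared disk can outscore the node preferred by the affinity rule.

```go
package main

import "fmt"

// combinedScore models, in simplified form, how kube-scheduler sums the
// score-plugin result for a node with a scheduler extender's prioritizer
// result: the extender score is multiplied by the extender's weight and
// added to the plugin score.
func combinedScore(pluginScore, extenderScore, extenderWeight int64) int64 {
	return pluginScore + extenderScore*extenderWeight
}

func main() {
	// Hypothetical numbers: nodeA is preferred by node affinity but does not
	// hold the shared disk; nodeB already holds the shared disk but is not
	// preferred by the affinity rule.
	nodeA := combinedScore(100 /* preferred-affinity score */, 0 /* extender score */, 5)
	nodeB := combinedScore(0, 100 /* shared disk already attached */, 5)
	fmt.Println("nodeA:", nodeA, "nodeB:", nodeB) // nodeB wins with these illustrative weights
}
```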

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

  • CSI Driver version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

andyzhangx avatar Jul 18 '22 08:07 andyzhangx

@landreasyan , @sunpa93 Can one of you look into this?

edreed avatar Jul 25 '22 16:07 edreed

Will do :)

sunpa93 avatar Jul 25 '22 17:07 sunpa93

So this portion actually gets handled when the controller creates the replica attachment; the scheduler extender does not process affinities separately, I believe.

When the controller creates replica AzVolumeAttachment, it filters and scores nodes based on filter / score plugins.

Pod affinity, pod anti-affinity, and node selectors are part of the node filters, so nodes that don't satisfy these affinity and selector rules will be filtered out of the replica node candidates.

https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/0226b3d8ddd3347975a1b4eb32cc20218f459eb3/pkg/controller/common.go#L693
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/0226b3d8ddd3347975a1b4eb32cc20218f459eb3/pkg/controller/common.go#L846
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/0226b3d8ddd3347975a1b4eb32cc20218f459eb3/pkg/controller/common.go#L1125
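
As a rough illustration of the filtering described above (a self-contained sketch, not the code in the linked common.go; the function names are made up, and only spec.nodeSelector is checked here, whereas the driver also evaluates pod affinity, pod anti-affinity, and node affinity):

```go
package placement

import (
	v1 "k8s.io/api/core/v1"
)

// filterReplicaCandidates is an illustrative stand-in for the filter step:
// nodes whose labels do not satisfy the pod's nodeSelector are dropped and
// never become replica attachment candidates.
func filterReplicaCandidates(pod *v1.Pod, nodes []v1.Node) []v1.Node {
	var candidates []v1.Node
	for _, node := range nodes {
		if matchesNodeSelector(pod.Spec.NodeSelector, node.Labels) {
			candidates = append(candidates, node)
		}
	}
	return candidates
}

// matchesNodeSelector returns true when every key/value pair in the selector
// is present on the node's labels (the semantics of spec.nodeSelector).
func matchesNodeSelector(selector, labels map[string]string) bool {
	for key, value := range selector {
		if labels[key] != value {
			return false
		}
	}
	return true
}
```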

sunpa93 avatar Jul 25 '22 18:07 sunpa93

I think required node affinity should work. What about preferred node affinity, @sunpa93? e.g. if the shared disk is already attached to one node, and that node is not preferred by the node affinity rule, will the pod be scheduled onto that node or not? What is the node score in that case?

andyzhangx avatar Aug 04 '22 16:08 andyzhangx

As of now, the controller only holds filter plugins for required affinities and has no score plugins yet for preferred affinities in its replica attachment placement logic. We will be adding more score plugins for preferred node affinity in a future release. https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/0226b3d8ddd3347975a1b4eb32cc20218f459eb3/pkg/controller/common.go#L1178-L1182
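
For illustration only (this is not the driver's implementation, and the function names are hypothetical), such a preferred-node-affinity score plugin would compute something along these lines: sum the weights of the pod's PreferredSchedulingTerms that the node satisfies. The sketch handles only the In operator to stay short.

```go
package placement

import (
	v1 "k8s.io/api/core/v1"
)

// preferredAffinityScore sums the weights of the pod's preferred node
// affinity terms that the given node satisfies. Only NodeSelectorOpIn match
// expressions are handled in this simplified sketch; a real score plugin
// would cover all operators and matchFields as well.
func preferredAffinityScore(pod *v1.Pod, node *v1.Node) int32 {
	if pod.Spec.Affinity == nil || pod.Spec.Affinity.NodeAffinity == nil {
		return 0
	}
	var score int32
	for _, term := range pod.Spec.Affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution {
		if nodeMatchesTerm(term.Preference, node.Labels) {
			score += term.Weight
		}
	}
	return score
}

func nodeMatchesTerm(term v1.NodeSelectorTerm, labels map[string]string) bool {
	for _, expr := range term.MatchExpressions {
		if expr.Operator != v1.NodeSelectorOpIn {
			return false // unsupported operator in this simplified sketch
		}
		matched := false
		for _, value := range expr.Values {
			if labels[expr.Key] == value {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}
```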

The AzDiskSchedulerExtender's decisions depend on which nodes the requested replica attachments are attached to, so in your scenario the pod will be scheduled to that node. That said, the node score from the scheduler extender does not factor in any affinity rules; it relies entirely on the existing AzVolumeAttachment information in the cluster.
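
In other words (a rough, hypothetical sketch rather than the extender's actual code or API; the type and function names below are made up), the extender's node score is driven purely by which AzVolumeAttachments already exist:

```go
package placement

// attachmentsByNode maps node name -> set of volume names that already have
// an AzVolumeAttachment (primary or replica) on that node. How this map is
// built from the cluster's AzVolumeAttachment objects is omitted here.
type attachmentsByNode map[string]map[string]bool

// extenderScore is an illustrative stand-in for the extender's prioritization:
// a node that already has attachments for the pod's volumes scores higher,
// regardless of any preferred node or pod affinity rules.
func extenderScore(node string, requestedVolumes []string, attached attachmentsByNode) int64 {
	var score int64
	for _, volume := range requestedVolumes {
		if attached[node][volume] {
			score += 100 // arbitrary illustrative weight per already-attached volume
		}
	}
	return score
}
```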

sunpa93 avatar Aug 08 '22 16:08 sunpa93

@sunpa93 thanks for the explanation. I think we should add a note somewhere that preferred node affinity is not applied by the v2 driver for shared disk mode, since the v2 driver changes how node affinity scheduling is handled.

andyzhangx avatar Aug 10 '22 09:08 andyzhangx

@andyzhangx Sounds good 👍

sunpa93 avatar Aug 11 '22 02:08 sunpa93

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 29 '22 02:11 k8s-triage-robot