aws-ebs-csi-driver

Is it possible to use the standard k8s topology labels?

TBBle opened this issue 3 years ago • 35 comments

Is your feature request related to a problem?/Why is this needed

Looking into how this driver interacts with the Cluster Autoscaler (see https://github.com/kubernetes/autoscaler/issues/3230#issuecomment-774387646), I have been unable to identify a good way to support the scale-from-zero case without either changing Cluster Autoscaler to always add this driver's topology label, or having each node group's ASG manually specify the topology label topology.ebs.csi.aws.com/zone.

/feature

Describe the solution you'd like in detail

A parameter to specify the topology label used for dynamic PV provisioning, which in the EKS case can be set to topology.kubernetes.io/zone (or failure-domain.beta.kubernetes.io/zone on older clusters) and behave correctly.

Describe alternatives you've considered

  • Hard-coding the topology.ebs.csi.aws.com/zone label into Cluster Autoscaler's AWS support, the same way the standard zone label is hard-coded.
  • Manually including the tag k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone on each node group definition in eksctl to which the EBS CSI DaemonSet will be deployed, i.e. manually matching the local taint/toleration setup.

See https://github.com/kubernetes/autoscaler/issues/3230 for more details on these alternatives.

Additional context

I'm not sure why a new label was used originally, which is why I suggest making this an option rather than changing the default. I'm guessing that in non-EKS deployments on EC2, the standard zone label may not reliably be the AWS zone if the AWS cloud controller is not in use?

That said, if the reason is that the label must be owned by the driver for spec or behaviour reasons, then this wouldn't be feasible at all, and we'll probably be stuck manually ensuring that the node groups with the tag correctly match the tolerations of the DaemonSet.

TBBle avatar Feb 06 '21 03:02 TBBle

I really don't know why we didn't use the well-known label. @wongma7 @leakingtapan maybe?

I agree we should just use them instead. Initially we should support both labels though so we can provide a deprecation warning.

ayberk avatar Feb 16 '21 21:02 ayberk

It is because the well-known label wasn't well-known yet at the time this driver's topology key was chosen. There used to be a separate label, failure-domain.beta.kubernetes.io/zone, and it was renamed to topology.kubernetes.io/zone as part of the work from sig-cloudprovider. See here; and see here for when this driver's label was set.

Given that the driver is not GA yet, I think it makes sense to fix the label before GA if there are no other concerns. But also keep in mind that the CSI translation lib for CSI migration needs to be updated here, to avoid breaking CSI migration.

Hope this helps

leakingtapan avatar Feb 16 '21 22:02 leakingtapan

related to: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/675

leakingtapan avatar Feb 16 '21 22:02 leakingtapan

That's really helpful, thanks a lot @leakingtapan!

Since this requires k/k changes, I'll try to come up with a plan.

/assign

ayberk avatar Feb 17 '21 18:02 ayberk

We'll need to gracefully retire the current label (topology key), so we need to start by writing both.

  1. The driver starts writing both keys: topology.ebs.csi.aws.com/zone and topology.kubernetes.io/zone. For topology constraints it should use the new one if available, or fall back to the old one.
  2. The translation lib similarly has to read both when converting to in-tree. However, we have two options when converting from in-tree to CSI:
    1. Only write the new one. This would break backward compatibility, and older versions of the driver wouldn't be able to read the topology key.
    2. Write both keys so we don't break older versions. Since we're already beta, this is preferred.
  3. At this point all e2e tests should still pass.
  4. Update the e2e tests to use the new key. Unit tests should suffice for covering the old key.
  5. Eventually, remove the old key.

Skipping the beta labels should be safe since they're only used by the translation lib.

cc @wongma7 for sanity check

ayberk avatar Feb 26 '21 23:02 ayberk
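As a concrete illustration of step 2 (option 2) above, here is a minimal sketch, not the actual csi-translation-lib code, of how a translated PV could carry node-affinity terms for both the legacy driver key and the well-known key. The key names come from this thread; the helper name and structure are illustrative only.

```go
package translation

import (
	v1 "k8s.io/api/core/v1"
)

const (
	oldZoneKey = "topology.ebs.csi.aws.com/zone" // driver-specific key
	newZoneKey = "topology.kubernetes.io/zone"   // well-known key
)

// translatedNodeAffinity builds the node affinity for a translated PV so that
// both the legacy and the well-known zone keys match the volume's zone.
// NodeSelectorTerms are ORed by the scheduler, so a node carrying either
// label satisfies the affinity.
func translatedNodeAffinity(zone string) *v1.VolumeNodeAffinity {
	term := func(key string) v1.NodeSelectorTerm {
		return v1.NodeSelectorTerm{
			MatchExpressions: []v1.NodeSelectorRequirement{{
				Key:      key,
				Operator: v1.NodeSelectorOpIn,
				Values:   []string{zone},
			}},
		}
	}
	return &v1.VolumeNodeAffinity{
		Required: &v1.NodeSelector{
			NodeSelectorTerms: []v1.NodeSelectorTerm{term(oldZoneKey), term(newZoneKey)},
		},
	}
}
```

Because the terms are ORed, nodes that only carry the standard label and nodes that only carry the driver label can both host the volume, which is what keeps older driver versions working during the transition.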

Sounds good to me. I understand this concretely to mean:

NodeGetInfo must respond such that CSINode.topologyKeys has both oldKey and newKey.

CreateVolume, if provided a topology requirement in the request, must extract the zone from newKey if it exists, or from oldKey if not.

CreateVolume must respond such that PVs have nodeAffinity.nodeSelector oldKey==A || newKey==A (by returning an AccessibleTopology array containing both).

wongma7 avatar Feb 27 '21 00:02 wongma7
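For concreteness, here is a hedged sketch of those three behaviours using the CSI spec's Go types; the helper names (nodeTopology, pickZone, accessibleTopology) are illustrative and not the driver's actual functions.

```go
package driver

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

const (
	oldZoneKey = "topology.ebs.csi.aws.com/zone"
	newZoneKey = "topology.kubernetes.io/zone"
)

// nodeTopology is the AccessibleTopology a NodeGetInfo response could carry;
// advertising both keys means CSINode.topologyKeys lists oldKey and newKey.
func nodeTopology(zone string) *csi.Topology {
	return &csi.Topology{Segments: map[string]string{
		oldZoneKey: zone,
		newZoneKey: zone,
	}}
}

// pickZone extracts the zone from a CreateVolume topology requirement,
// preferring the well-known key and falling back to the driver-specific one.
func pickZone(req *csi.TopologyRequirement) string {
	for _, topologies := range [][]*csi.Topology{req.GetPreferred(), req.GetRequisite()} {
		for _, t := range topologies {
			if zone, ok := t.GetSegments()[newZoneKey]; ok {
				return zone
			}
			if zone, ok := t.GetSegments()[oldZoneKey]; ok {
				return zone
			}
		}
	}
	return ""
}

// accessibleTopology is what a CreateVolume response could return; two entries,
// one per key, yield a PV whose nodeAffinity is oldKey==zone || newKey==zone.
func accessibleTopology(zone string) []*csi.Topology {
	return []*csi.Topology{
		{Segments: map[string]string{oldZoneKey: zone}},
		{Segments: map[string]string{newZoneKey: zone}},
	}
}
```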

If I'm following correctly, landing #773 would be sufficient for Cluster Autoscaler to operate correctly with --balancing-ignore-label=topology.ebs.csi.aws.com/zone, because #773 is ensuring that it has the same value as topology.kubernetes.io/zone. So CA can rely on the well-known label for "will this Pod be schedulable on this node-template", even though csi-translation-lib is hinging off the deprecated-but-identical value when it actually interacts with the scheduler.

It looks like #773 missed the 0.10 release branch, so I guess it'll be part of 0.11?

TBBle avatar Mar 28 '21 06:03 TBBle

You're right, #773 is enough. Remaining work is irrelevant.

I can see it in the release PR. We haven't released it yet, but 0.10 will include it. See #811.

ayberk avatar Mar 29 '21 04:03 ayberk

We are running 0.10 and hit an issue where it complains about not having a topology key on the node when provisioning a new volume from a PVC. Does #773 remove the need for the key topology.ebs.csi.aws.com/zone, or is it still required for now?

denniswebb avatar Apr 03 '21 02:04 denniswebb

For now you should still use both. The driver itself prioritizes topology.kubernetes.io/zone, but if it's not present it falls back to topology.ebs.csi.aws.com/zone. We haven't updated csi-translation upstream yet, though, so we're still writing and reading both keys.

Do you mind opening an issue ideally with more details?

ayberk avatar Apr 06 '21 17:04 ayberk

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 05 '21 18:07 fejta-bot

/remove-lifecycle stale

Label best practices need discussion with Kubernetes upstream. It's not clear at this time whether we should be setting the standard label, or whether the standard label should be reserved for use by in-tree drivers (which simplifies migration, because migration translates from our custom label to the standard label).

wongma7 avatar Jul 06 '21 18:07 wongma7

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 04 '21 18:10 k8s-triage-robot

/remove-lifecycle stale

k1rk avatar Oct 05 '21 10:10 k1rk

To cross-link: it seems there's a problem with this approach. I'm not yet clear on the actual blocker there, unless there's a need or use case for topology zones that don't match the zones the active cloud controller uses to populate the Node's topology.kubernetes.io/zone label.

Anyway, I wanted to make sure that comment was linked from here, as I hadn't realised from the history of this ticket that the initial change to move towards this state had been reverted.

TBBle avatar Nov 19 '21 07:11 TBBle

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 17 '22 08:02 k8s-triage-robot

/remove-lifecycle stale

k1rk avatar Mar 02 '22 05:03 k1rk

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 31 '22 06:05 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Jun 28 '22 11:06 z0rc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 26 '22 11:09 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Sep 28 '22 14:09 z0rc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 27 '22 15:12 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Dec 28 '22 11:12 z0rc

/close

fixed in #1360 - available in the driver since version v1.12.0

ConnorJC3 avatar Mar 02 '23 19:03 ConnorJC3

@ConnorJC3: Closing this issue.

In response to this:

/close

fixed in #1360 - available in the driver since version v1.12.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 02 '23 19:03 k8s-ci-robot

@ConnorJC3 I think this should be re-opened. From what I can see, the fix was reverted because it was causing duplicate topology selector terms in the CSI migration case.

So it's still the case, even with the latest driver version, that provisioned PVs select only on the non-standardised topology.ebs.csi.aws.com/zone label, and don't work out of the box with cluster-autoscaler when scaling node groups up from zero. (You have to manually label the ASGs with a node template so that cluster-autoscaler knows the first node will have the topology.ebs.csi.aws.com/zone label once it starts up.)

One solution that would avoid harming compatibility with existing clusters would be to provide an option, disabled by default, to use the standardised topology.kubernetes.io/zone label. This seems to be what is discussed in https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/962, which was closed due to inactivity.

jbg avatar Mar 06 '23 06:03 jbg
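As a rough sketch of that opt-in idea (the flag name and wiring below are hypothetical, not an existing driver option), the advertised topology key could sit behind a disabled-by-default switch so existing clusters keep the current behaviour:

```go
package driver

import "flag"

const (
	legacyZoneKey    = "topology.ebs.csi.aws.com/zone"
	wellKnownZoneKey = "topology.kubernetes.io/zone"
)

// useStandardTopologyLabel is a hypothetical, disabled-by-default option; the
// flag name here is illustrative only.
var useStandardTopologyLabel = flag.Bool(
	"use-standard-topology-label",
	false,
	"advertise topology.kubernetes.io/zone instead of the driver-specific topology key",
)

// zoneTopologyKey returns the key the driver would use when reporting node
// topology and when building a volume's accessible topology.
func zoneTopologyKey() string {
	if *useStandardTopologyLabel {
		return wellKnownZoneKey
	}
	return legacyZoneKey
}
```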

Ah, that is correct, I will reopen this, sorry. I will look into fixing this completely in the near-ish future.

ConnorJC3 avatar Mar 06 '23 16:03 ConnorJC3

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jun 04 '23 16:06 k8s-triage-robot

/remove-lifecycle stale

z0rc avatar Jun 05 '23 21:06 z0rc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 21 '24 23:01 k8s-triage-robot