
Allow using IP-based naming with custom domain and CCM

Open ValeriiVozniuk opened this issue 2 years ago • 21 comments

/kind feature

With the switch to CCM in new kops versions, the only available option is "resource based naming", and "IP-based naming" is broken in kops 1.23+ if a custom domain is used. We've tried "resource based naming" with kops 1.23 on our test cluster, and everyone literally hates the new node names based on the AWS instance ID. I assume it could be useful in small clusters running within a single region, as you don't need to do additional queries to find the AWS instance ID, but in our multi-region setup it is really bad. Before, you could easily see where the instances were running just by looking at the IP part of the name in monitoring; now you first need to somehow find out which region that instance is running in, and then try to find it there. Minutes of wasted time instead of seconds of thinking.

Thus, could you please provide an option for CCM to name nodes based on IP, or fix the custom domain handling so that it works like it did before kops 1.23?

PS: I saw that there are some hostnameOverride options for kubelet/kubeProxy, but I was not able to make them work at the cluster level, either with or without CCM. While the setting is present in the spec, on the actual node it is always the amazon_node_id or the DNS name.

Reference link: https://cloud-provider-aws.sigs.k8s.io/prerequisites/

ValeriiVozniuk avatar Jul 27 '22 13:07 ValeriiVozniuk

IP-based naming without using CCM was resolved with https://github.com/kubernetes/kops/pull/14024

However, when you upgrade to use CCM on a supported k8s version, resource-based naming will indeed be forced. Your arguments seem to be based around a cluster running in multiple regions, but I don't see how that is possible. Are you saying you have such a setup?

hostnameOverride won't work. You can't set this to a static value.

Adding an option to use IP-based naming could be a feature worth adding though. Happy to review a PR.

olemarkus avatar Jul 27 '22 15:07 olemarkus

It's also worth mentioning that kubectl plugins or aliases can easily visualize this information that otherwise takes minutes to find.

https://kubernetes.io/docs/reference/kubectl/#custom-columns
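
For example, something along these lines (using the built-in wide output and the standard topology zone label; a minimal sketch, not a specific recommendation) puts the internal IP and zone right next to the instance-ID name:

    # -o wide already shows INTERNAL-IP next to the node name; -L adds the zone label as an extra column
    kubectl get nodes -o wide -L topology.kubernetes.io/zone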

rifelpet avatar Jul 27 '22 16:07 rifelpet

Hi @olemarkus,

Thank you. For some reason I didn't get a notification about #14024, and I see that release 1.23.3 was created, so I will test it once the binaries become available :).

About your question: sorry, I described our setup not quite correctly (and incompletely). Of course, we have a cluster-per-region setup, but data from all clusters is gathered into a centralized Grafana setup, where we have multiple dashboards/alerts/etc. for monitoring our application deployments across all clusters. While part of that information is shown in a per-cluster view, many dashboards/alerts are set up across all clusters for dev/ops convenience. There are also alerts to Slack when something happens. With IP-based names it is easy to immediately see where something is happening without looking for or adding extra things, even in a short Slack message. People who have been in the company for a long time can even recognize the instance group from the IP in the name :). With resource-based naming, all this useful information needs to be exposed in different ways, overcomplicating things for us.

As for the pull request, unfortunately my Go skills are still not enough to make a patch like this :)

@rifelpet Sorry, I should have written that this is not about the kubectl stuff :)

ValeriiVozniuk avatar Jul 29 '22 13:07 ValeriiVozniuk

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 27 '22 21:10 k8s-triage-robot

/remove-lifecycle stale

ValeriiVozniuk avatar Oct 28 '22 18:10 ValeriiVozniuk

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 08 '23 11:02 k8s-triage-robot

/remove-lifecycle stale

ValeriiVozniuk avatar Feb 08 '23 13:02 ValeriiVozniuk

@olemarkus From what I can see, the new naming scheme with CCM is enforced here: https://github.com/kubernetes/kops/blob/master/upup/pkg/fi/cloudup/template_functions.go#L715. Is there any reason it is like that, or would just removing that condition be enough to achieve the goal here?

ValeriiVozniuk avatar May 02 '23 09:05 ValeriiVozniuk

Any update on this? It is still an issue, though I'm not sure if it's precisely the same one: https://github.com/kubernetes/kops/issues/15891

The attribute that is used to retrieve the value for node name should ideally be user-configurable rather than calculated dynamically in an obscure and undocumented way.

shapirus avatar Sep 08 '23 11:09 shapirus

@shapirus At the moment there is no work being done by the maintainers to support this feature. We are happy to review contributions on this topic. It shouldn't be a big change, but it will require some effort.

hakman avatar Sep 08 '23 12:09 hakman

Is the forced instanceID naming required by CCM? What was the rationale behind this change? Asking because there may be technical reasons behind it which I don't know about.

p.s. it's not really a feature. It's (in my particular case described in https://github.com/kubernetes/kops/issues/15891) reverting to original behavior.

shapirus avatar Sep 08 '23 12:09 shapirus

Is the forced instanceID naming required by CCM?

No

What was the rationale behind this change? Asking because there may be technical reasons behind it which I don't know about.

This was required to support IPv6. It also fixed the numerous reports of clusters not working due to misconfigured DHCP options.

p.s. it's not really a feature. It's (in my particular case described in https://github.com/kubernetes/kops/issues/15891) reverting to original behavior.

At the moment, the old behaviour is gone. I understand that some users don't like the new behaviour, but changing it back to the way it was in kOps 1.22 requires an effort.

hakman avatar Sep 10 '23 05:09 hakman

Any update on this?

At the moment, the old behaviour is gone. I understand that some users don't like the new behaviour, but changing it back to the way it was in kOps 1.22 requires an effort.

Is there an ETA for changing it back to the way it was in kOps 1.22? Thank you

xiaoh163 avatar Oct 19 '23 07:10 xiaoh163

Is there an ETA for changing it back to the way it was in kOps 1.22?

@xiaoh163 There is no ETA, as there is no work being done to change it back.

hakman avatar Oct 19 '23 07:10 hakman

Got it, thank you.

xiaoh163 avatar Oct 19 '23 08:10 xiaoh163

Any update on this?

At the moment, the old behaviour is gone. I understand that some users don't like the new behaviour, but changing it back to the way it was in kOps 1.22 requires an effort.

Is there an ETA for changing it back to the way it was in kOps 1.22? Thank you

We are facing the same problem... Reverting to the IP-based node name is MUCH appreciated!

maxwheel avatar Nov 30 '23 02:11 maxwheel

We are facing the same problem... Reverting to the IP-based node name is MUCH appreciated!

We are facing the same problem... Reverting to the IP-based node name is MUCH appreciated!

maxwheel avatar Dec 04 '23 02:12 maxwheel

I suggest adding some stronger reasoning for why this should be reintroduced. I don't consider "metadata as part of the hostname" much of a valid use case. As mentioned above, a simple API call either to AWS or to k8s will reveal much richer metadata.
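
As an illustration (the instance ID below is made up), mapping a node name back to its IP and availability zone is a one-liner against either API:

    # AWS side: resolve an instance-ID node name to its private IP and AZ
    aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
      --query 'Reservations[].Instances[].[PrivateIpAddress,Placement.AvailabilityZone]' --output text

    # Kubernetes side: the internal IP (and more) straight from the node object
    kubectl get node i-0123456789abcdef0 -o wide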

olemarkus avatar Dec 04 '23 09:12 olemarkus

Having the IPs as hostnames at least provides a better user experience (in most cases; sometimes you actually do need the providerID value): easier visual recognition of the host in the output of kubectl <top|get> nodes, being able to ssh into the copy-pasted host name taken from that output without extra wrapper scripts, etc.

Besides, it was a change of behavior made for no immediately apparent reason, and avoiding such changes is generally considered good practice.
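
To make the wrapper-script point concrete, with instance-ID names an ssh typically ends up looking roughly like this (the node name below is made up), versus simply ssh'ing to the copy-pasted name when it is IP/DNS based:

    # resolve the instance-ID node name to its internal IP first, then ssh to it
    ssh "$(kubectl get node i-0123456789abcdef0 \
      -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')"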

shapirus avatar Dec 04 '23 10:12 shapirus

Hi @olemarkus, could you confirm whether there is any action or plan to support a switch for UseInstanceIDForNodeName or IP-based names?

Some challenges on our side: in our cluster we only use IPv4 (single-stack), and there is no requirement to enable dual-stack or IPv6. But the enforced nodeName change breaks our monitoring (Pod-Node 1:1 deployment and instance CPU monitoring), and the HPA/Prometheus adapter can't work after the K8s cluster is upgraded to 1.24.

BTW, we also deploy multiple k8s clusters in different AWS regions/data centers. DNS-name style node names are much more user-friendly for troubleshooting.

Currently, we don't have many resources to do the adaptation work for using the AWS instance ID as the nodeName.

Regards.

xqzhang2015 avatar Apr 08 '24 09:04 xqzhang2015

As far as I know, no one is working on this, no.

olemarkus avatar Apr 08 '24 11:04 olemarkus