cluster-api
Windows Support: NetBIOS and Active Directory LDAP SAMAccountName restrictions on Hostname
User Story
As an operator, I would like to manage Windows Server worker nodes with the Cluster API. Hostnames on Windows are limited to 15 characters, and the hostnames that the Cluster API sets (by default, in the cloud-init metadata) exceed this limit. The Cluster API should support a more flexible mechanism for setting hostnames so that shorter hostnames can be set for VMs.
Detailed Description
NetBIOS requires Windows computer names to be 15 characters or fewer (https://support.microsoft.com/en-us/help/909264/naming-conventions-in-active-directory-for-computers-domains-sites-and). Attempting to set a hostname with more than 15 characters on a Windows machine will result in only the first 15 being used.
When using the machine deployment api object, the machine api object names are derived from the machineset controller (https://github.com/kubernetes-sigs/cluster-api/blob/7884484b621f13f604e74f60053f4214a2f19702/controllers/machineset_controller.go#L434). This name is later used to set the vm name (for example in CAPV - https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/895539d004ea33299435a2c739791e9800d0c2ae/controllers/vspheremachine_controller.go#L320), and then also as the local-hostname in the cloud-init metadata (https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/390c49a23e2b535a27b330e4983c59eb0b42f476/pkg/services/govmomi/service.go#L203).
The machine api object names are prefixed by the name of the machine deployment api object. These names, for example, will be in the form:
workload-cluster-2-md-0-5f77f47487-2c4sq
workload-cluster-2-md-0-5f77f47487-25xhg
where workload-cluster-2-md-0 is the name of the machine deployment api object. The prefix is appended with 17 extra characters (-5f77f47487-2c4sq, -5f77f47487-25xhg), which will bring the total character count above 15. Notice that setting the deployment api object name to 3 or more characters will guarantee the same first 15 characters (the name, a hyphen, the 10-character hash, and the second hyphen already fill all 15), and thus hostname collisions for the nodes. Being able to set the deployment api object name to something more meaningful than what could be expressed in 3 characters would be useful.
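The arithmetic above can be sketched in a few lines. This is only an illustration: the hash length (10) and random suffix length (5) are read off the two example names, and the function name is hypothetical rather than anything in cluster-api.

```python
NAME_LIMIT = 15   # NetBIOS computer-name limit discussed above
HASH_LEN = 10     # assumed from the example, e.g. "5f77f47487"
RANDOM_LEN = 5    # assumed from the example, e.g. "2c4sq"


def surviving_random_chars(deployment_name: str) -> int:
    """Count the per-machine random characters left after truncation to 15."""
    # Shared part of every machine name: deployment name, '-', hash, '-'
    shared = len(deployment_name) + 1 + HASH_LEN + 1
    return max(0, min(RANDOM_LEN, NAME_LIMIT - shared))


for dep in ("ab", "abc", "workload-cluster-2-md-0"):
    print(dep, surviving_random_chars(dep))
```

With a two-character deployment name a single random character survives truncation; from three characters upward none do, so every machine in the deployment ends up with the same truncated name.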
My current workaround is to have cloudbase-init invoke an additional script before the join command that rebuilds the hostname and sets it for the vm. This is somewhat undesirable, as the hostname and node api object name are then no longer the same as the vm name. For consistency, it's desired (but not required) that the vm name (as shown by the cloud provider), the machine api object name, and the node api object name are the same as the hostname of the vm.
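The script used in the workaround is not shown here, so the following is only a guess at what such a renaming step could look like. The scheme (a short readable prefix plus a hash of the full machine name) and the function name `short_hostname` are assumptions for illustration, not the actual script.

```python
import hashlib


def short_hostname(machine_name: str, limit: int = 15, hash_len: int = 8) -> str:
    """Derive a hostname of at most `limit` characters from a long machine name.

    Hypothetical scheme: keep a readable prefix and append a hash of the
    full name so that different machines get different short names.
    """
    digest = hashlib.sha256(machine_name.encode("utf-8")).hexdigest()[:hash_len]
    # Leave room for the '-' separator and the hash; avoid a trailing or doubled '-'
    head = machine_name[: limit - hash_len - 1].rstrip("-")
    return f"{head}-{digest}" if head else digest


print(short_hostname("workload-cluster-2-md-0-5f77f47487-2c4sq"))
print(short_hostname("workload-cluster-2-md-0-5f77f47487-25xhg"))
```

The result is deterministic for a given machine name, but as noted above it no longer matches the vm name or the machine api object name.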
Anything else you would like to add:
I realize that windows worker nodes are not officially supported by the cluster api, but I'm mentioning it since it's something that's up for discussion for the cluster-api roadmap (https://github.com/kubernetes-sigs/cluster-api/pull/2148/files#diff-767f66541aad47089dd5ded720dede6bR32).
Another workaround could be to use the machine api object directly instead of the machine deployment api object, which would directly set the vm name based on the name of the machine api object. However, the benefits of using the machine deployment are lost.
/kind feature
The main reason for the hostname matching the Machine name is currently due to the initial implementation details of vSphere infrastructure provider. In the case of AWS and Linux hosts, there is a requirement when using the AWS cloud provider integration that the hostname must match the internal dns name of the host and we override the hostname setting via cloud-init config for each Machine we provision.
Outside of limitations mentioned above, there should be no requirements that the hostname of an individual instance match the Machine name in any way.
Agreed - that's certainly not a requirement.
The cloud-init metadata local-hostname is set to the Machine name (at least on CAPV) - what I would propose is flexibility with how local-hostname metadata gets set, so that it's not necessarily set by default to the Machine name.
I don't think this is a CAPI issue, I think this is just with CAPV. On AWS the hostnames are not specified in the cloud-init metadata
@akutz @yastij Would you mind taking a look at this?
Is this definitely an issue in a Kubernetes context? The linked page looks like it was written for Windows XP and 2003 when NetBIOS was still a thing. AD DNS names shouldn't be restricted in the same way, and they do say for FQDNs, it's 63 chars per component, 255 total.
Is the issue that a machine configured with NetBIOS will register a Kerberos principal with the truncated name? If so, is there a case to be made that NetBIOS should be disabled in Windows images?
AFAIK, NetBios is still required to domain join a windows machine. Looping in @ksubrmnn and @JocelynBerrendonner.
It might depend on how credentials are provided and how the domain is specified. If the FQDN is used and credentials are provided as [email protected], it should default to the DNS SRV records? I admit it's been a decade since I touched Windows, but my memory was that this was possible in at least Win2K8/Vista.
Thanks for reaching out! I don't know the answer to the Netbios/domain join question off the top of my head, but I'll find the experts and pull them in shortly.
@rhockenbury : As per my investigation, NetBIOS is not required to join a domain on a Windows machine (that's been the case since around Windows 2000). The page you mentioned only provides naming conventions for when NetBIOS is actually used. Also, as other folks mentioned, the machine name is only truncated in NetBIOS. When setting a long host name (let's say "MyComputerWithALongName") in a domain (let's say "contoso.com"), the machine is still reachable through its FQDN "MyComputerWithALongName.contoso.com". However, through NetBIOS, it will indeed only be reachable through the truncated NetBIOS name "MyComputerWithA".
Is using FQDN an option here?
Thanks for the additional insight. It feels like it would be best to disable NetBIOS, since using the machine api object name as the hostname would result in NetBIOS name collisions. I'll need to follow up internally to see if we could do this.
@rhockenbury : after further discussions with the experts, NETBIOS name resolution is mostly unused today. Though the first step in name resolution is usually going through NETBIOS, if the NETBIOS name is not found, Windows will fallback to resolving the machine name using DNS. For example, if you try to reach a machine through "MyComputerWithALongName", Windows will be able to find that name in DNS provided that the DNS Suffix search order is properly populated in the network interface TCP/IP settings (this last point is important). If you try to ping "MyComputerWithALongName" and if the Suffix is properly populated (to, let's say contoso.com), then Windows will behave similarly to Linux and try "MyComputerWithALongName.contoso.com".
The bottom line is, I previously suggested using the FQDN, but as per my discussion with the expert, there is actually no need for it. If the DNS suffix search order is properly populated in Windows nodes, the long host names Cluster-API generates should be directly usable. And whether NETBIOS is enabled or not shouldn't matter. If a long name doesn't work with NETBIOS enabled, it will likely not work with NETBIOS disabled either.
FWIW, you can check the DNS suffix list using the Get-DnsClientGlobalSetting cmdlet in powershell:
PS C:\hns> Get-DnsClientGlobalSetting
UseSuffixSearchList : True
SuffixSearchList    : {contoso.com}
UseDevolution       : True
DevolutionLevel     : 0
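As a rough model of the suffix search behaviour described above, the sketch below expands a single-label name with each entry of the suffix search list. It is a deliberate simplification: the real Windows resolver has more rules (devolution, per-adapter suffixes, NETBIOS lookups), and `candidate_fqdns` is a hypothetical helper, not a Windows or Cluster-API function.

```python
def candidate_fqdns(name: str, suffix_search_list: list) -> list:
    """Return the fully qualified names a resolver might try for `name`.

    Simplified model: names that already contain a dot are tried as-is,
    single-label names get each search suffix appended in order.
    """
    if "." in name:
        return [name.rstrip(".")]
    return [f"{name}.{suffix}" for suffix in suffix_search_list]


print(candidate_fqdns("MyComputerWithALongName", ["contoso.com"]))
```

With an empty suffix list the function returns no candidates for a single-label name, which corresponds to the case where the long name cannot be found through DNS.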
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale
/remove-lifecycle stale
I think we concluded that this isn't an issue? @jsturtevant has also stated as such in the Windows proposal.
/close for now, and we can revisit if it turns out to be a problem?
@randomvariable: Closing this issue.
https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1052
/reopen
This question was re-raised in SIG Windows around app support, though we were wondering whether, since pod names and DNS names are synonymous, pod names longer than the NETBIOS limit should also break applications that don't support longer names. If that's the case, it still doesn't make sense to make this a cluster api concern.
I think @JocelynBerrendonner was going to get a definitive answer.
@randomvariable: Reopened this issue.
/lifecycle frozen
Hi everyone,
There have been additional discussions about this, and additional learnings for me since my last message.
In a nutshell:
- The Windows TCP/IP stack doesn't require names to be 15 characters or fewer. The NETBIOS requirements were removed in Windows 2000. NETBIOS is mostly deprecated in the latest versions of Windows.
- Many apps and components in Windows support long names, and it is possible to make a Windows host fully work with long names. And typically, Unix apps that run on Windows don't care about the 15-character limit.
- That said, many Win32 applications were not updated to support host names longer than 15 characters (even when using DNS). Such applications may use the GetComputerName Win32 function, which bounds the name length to 15 characters (this is MAX_COMPUTERNAME_LENGTH, which also applies when using pure DNS names and no NETBIOS names).
- There is a strong desire to remove the 15-character limitation, but unfortunately there are scenarios that still don't have a great solution (mostly because of backwards compatibility). And, unfortunately, the 15-character name limitation will outlive NETBIOS.
- When Windows is configured with a long name (i.e. > 15 characters), applications that call into the GetComputerName functions (as opposed to calling into GetComputerNameEx functions) will receive a name truncated to 15 characters. The truncation will use the first 15 characters of the long name. So, the first 15 characters of the name need to be unique so that the truncated name is unique across the network.
- This truncation should work in many cases where the app doesn't care about resolving this truncated computer name using DNS, but if any app/process tries to resolve this truncated name using DNS, the resolution will fail unless there is a way to resolve this name (it can be NETBIOS, or it can be another mechanism).
- To work around this issue without using NETBIOS, people may think that adding a CNAME with the truncated name to DNS could solve the problem. Unfortunately, this approach doesn't work properly with reverse DNS lookup (where one IP address matches two names). This typically messes with Kerberos authentication. So, DNS servers don't have a good solution for this problem.
- So, exposing names as 15 characters long is always the safe option, but it is not a strict requirement per se.
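To make the uniqueness point concrete, here is a minimal sketch that groups a set of hostnames by their first 15 characters and reports the groups that would clash. The helper name `truncation_collisions` is hypothetical, and comparing names case-insensitively is an assumption made for this sketch.

```python
from collections import defaultdict

MAX_COMPUTERNAME_LENGTH = 15  # limit applied by GetComputerName, per the list above


def truncation_collisions(hostnames):
    """Map each truncated name to the full hostnames that share it (clashes only)."""
    groups = defaultdict(list)
    for name in hostnames:
        # Assumption: truncated names are compared case-insensitively
        groups[name[:MAX_COMPUTERNAME_LENGTH].lower()].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


hosts = [
    "workload-cluster-2-md-0-5f77f47487-2c4sq",
    "workload-cluster-2-md-0-5f77f47487-25xhg",
    "win-node-1",
]
print(truncation_collisions(hosts))
```

A check like this could be run against the machine names a deployment would produce before Windows nodes are created.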
A few questions remain, though:
- Aren't machine names in Cluster-API generated by the providers? Wouldn't the naming be up to the providers themselves?
- Pod and Deployment YAMLs can specify the "hostname" field to configure what the host name should be (https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields). Could the same concept help in Cluster-API?
I've also been checking in and found that Active Directory SAMAccountName is restricted to 20 characters. It's not necessarily a blocker since SAMAccountName doesn't need to match the computer name, but it places constraints on uniqueness.
You're right that the hostname is a function of the provider, not CAPI.
Just adding some additional context to this: it seems that if your hostname is over 15 characters, the $env:computername variable cuts off at 15 characters, which I guess is because it is backed by the GetComputerName API. This may cause problems for people using Powershell to configure cni or something similar. The hostname command, however, still returns the longer hostname.
Also, when using this with CAPV I have noticed that the identifiers at the end of the generated hostname are over 15 characters before you even add user-specified portion so that may need to be considered when running windows machine deployments.
Noted. thanks.
/area node-agent
@randomvariable: The label(s) area/node-agent cannot be applied, because the repository doesn't have them
@perithompson : Thanks for mentioning this! IIRC, using [System.Net.Dns]::GetHostName() in powershell also returns the full name.
/retitle Windows Support: NetBIOS and Active Directory LDAP SAMAccountName restrictions on Hostname
Update on this: Regardless of NETBIOS, we will need hostname restricted because of the SAMAccountName, so have retitled the issue appropriately.
In terms of next steps:
Whether or not the machine, and concretely, cloud-init, ignition or whatever takes the hostname from the VM name is up to the cloud provider. It is the case for vSphere, Azure (maybe?), but not for AWS. AWS only uses the instance ID.
For AWS, this means if the machine name is shortened, this has no impact on the hostname unless the hostname is explicitly set in the userdata via cloud-init. However, we also would not want to default this because the Kubernetes AWS Cloud Provider (CPI not CAPA) requires the node name to match the host name which in turn MUST match the instance ID.
Next steps are to document:
- For each IaaS, how does the hostname get populated?
- For each Kubernetes Cloud Provider, what constraints are there on the node name?
@randomvariable, it may be worth noting that SAMAccountName is a name used to support legacy versions of Windows (Windows NT4, Windows 95, Windows 98, ...: https://docs.microsoft.com/en-us/windows/win32/ad/naming-properties#samaccountname). I believe Windows 2000 and up don't require it.
The docs are referring to how SAMAccountName is consumed, as in it's typically consumed by legacy apps. However, it's still a mandatory field on the Computer LDAP schema, and from which the Computer name is derived - with no indication of being deprecated. SAMAccountName is also used during AD domain join, so it's the strongest of all of these requirements IMO.