cluster-api-provider-vsphere
Support for NTP Pools in vSphereMachine(Templates)
/kind feature
Describe the solution you'd like
Currently we can specify DNS nameservers under spec.network.devices[].nameservers, but we can't specify a pool of NTP servers, although NTP is also needed for smooth network operation. This becomes a problem if you use static IPs / an IPAM controller instead of DHCP (which would provide the correct NTP servers to use). Cloud-init already has support for setting NTP pools: https://cloudinit.readthedocs.io/en/latest/topics/modules.html#ntp
#cloud-config
ntp:
  pools: ['0.company.pool.ntp.org', '1.company.pool.ntp.org', 'ntp.myorg.org']
  servers: ['my.ntp.server.local', 'ntp.ubuntu.com', '192.168.23.2']
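For comparison, this is roughly what such a setting could look like at the device level, next to the existing nameservers list. Note that the ntpServers / ntpPools fields below are hypothetical and only sketch the requested API; they do not exist in CAPV today:
# Sketch only: nameservers exists in the current CAPV API, while
# ntpServers / ntpPools are hypothetical fields illustrating the request.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: example-machine-template
spec:
  template:
    spec:
      network:
        devices:
        - networkName: example-portgroup
          dhcp4: false
          ipAddrs:
          - 192.168.23.50/24
          nameservers:        # supported today
          - 192.168.23.10
          ntpServers:         # hypothetical, not implemented
          - my.ntp.server.local
          ntpPools:           # hypothetical, not implemented
          - 0.company.pool.ntp.org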
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The default of having the k8s-ci-robot close issues after X days has killed many useful improvement suggestions, it seems. I doubt this is helpful; you're losing a lot of good user feedback.
NTP should be considered just as fundamental a service as DNS. Currently the NTP service seems to rely on correct NTP data from the ESXi hosts (via open-vm-tools), but this is not documented here anywhere (I found hints in the Tanzu docs) and there's no obvious way to configure NTP when you review only what CAPV provides.
A less obvious way would be to use kubeadm for it. Since kubeadm only reflects a subset of what Cloud-Init can do, it's not perfect - I'd rather have a way to patch the Cloud-Init data or provide it myself instead of having CAPV generate it - but it would at least be good enough to configure NTP.
E.g.: specify NTP in kubeadm:
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: myfancykubeadmconf
  namespace: default
spec:
  template:
    spec:
      files:
      - contentFrom:
          secret:
            key: data
            name: containerd-registry-config
        owner: root:root
        path: /etc/containerd/config.toml
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ ds.meta_data.hostname }}'
      ntp:
        enabled: true
        servers:
        - timeserver1
        - timeserver2
      preKubeadmCommands:
      - hostname "{{ ds.meta_data.hostname }}"
      - echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
      - echo "127.0.0.1 localhost" >>/etc/hosts
      - echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
      - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      users:
      - name: capv
        sshAuthorizedKeys:
        - ssh-rsa XXX
        sudo: ALL=(ALL) NOPASSWD:ALL
Which should result in an NTP setting being generated for the Cloud-Init scripts, as documented here.
But although the Cloud-Init data is generated (it can be seen in the generated Secret resource, which is provided to the VM via vApp configs) and includes the NTP data, like this:
ntp:
  enabled: true
  servers:
  - timeserver1
  - timeserver2
NTP does not seem to be configured. I tested this on Photon OS 3. According to KB 76088 it should result in /etc/systemd/timesyncd.conf being set. I also noticed that the pre-built image comes with chrony, so I checked the config there as well: nothing.
Currently the Cloud-Init NTP directive seems to be completely ignored.
The default of having the k8s-ci-robot close issues after X days has killed many useful improvement suggestions, it seems. I doubt this is helpful; you're losing a lot of good user feedback.
Important tickets should be marked with '/lifecycle frozen'
/lifecycle frozen
/reopen
@omniproc: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@MaxRink unless you fixed the issue, could you please /reopen? I'm not allowed to do so since I'm not the original author.
@neolit123
since @MaxRink isn't responding do you mind re-opening? I don't think this issue is solved yet.
/reopen
I do not see maintainer comments here, so not sure if this is wanted or not by them.
@neolit123: Reopened this issue.
In response to this:
/reopen
I do not see maintainer comments here, so not sure if this is wanted or not by them.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I think I found the smoking gun for why NTP is not applied. It seems like it's an issue with the pre-built Photon image:
root [ /var/lib/cloud/data ]# cat result.json
{
  "v1": {
    "datasource": "DataSourceVMwareGuestInfo",
    "errors": [
      "('ntp', RuntimeError('No template found, not rendering chrony.conf.{distro}'))",
      "('scripts-user', RuntimeError('Runparts: 1 failures in 1 attempted commands'))"
    ]
  }
}
systemctl status cloud-config.service
● cloud-config.service - Apply the settings specified in cloud-config
   Loaded: loaded (/lib/systemd/system/cloud-config.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/cloud-config.service.d
           └─boot-order.conf
   Active: failed (Result: exit-code) since Thu 2021-12-30 09:58:22 UTC; 25min ago
 Main PID: 743 (code=exited, status=1/FAILURE)
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - util.py[DEBUG]: Read 12 bytes from /proc/uptime
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - util.py[DEBUG]: cloud-init mode 'modules' took 0.216 seconds (0.21)
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - handlers.py[DEBUG]: finish: modules-config: FAIL: running modules for config
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [2021-12-30 09:58:22] Cloud-init v. 19.4 running 'modules:config' at Thu, 30 Dec 2021 09:58:22 +0000. Up 20.31 seconds.
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [2021-12-30 09:58:22] 2021-12-30 09:58:22,446 - cloud.py[WARNING]: No template found in /etc/cloud/templates for template named chrony.conf.photon
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [2021-12-30 09:58:22] 2021-12-30 09:58:22,446 - util.py[WARNING]: Running module ntp (<module 'cloudinit.config.cc_ntp' from '/usr/lib/python3.7/site-packages/cloudinit/config/cc_ntp.py'>) failed
Dec 30 09:58:22 se1-acme-p100-x-v7zsf systemd[1]: cloud-config.service: Main process exited, code=exited, status=1/FAILURE
Dec 30 09:58:22 se1-acme-p100-x-v7zsf systemd[1]: cloud-config.service: Failed with result 'exit-code'.
Dec 30 09:58:22 se1-acme-p100-x-v7zsf systemd[1]: Failed to start Apply the settings specified in cloud-config.
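Until the image ships the missing cloud-init template, one possible workaround (just a sketch, not something tested in this thread, and assuming the node image runs chrony with a chronyd service and reads /etc/chrony.conf) would be to bypass cloud-init's ntp module and write the chrony config directly from the KubeadmConfigTemplate:
# Workaround sketch, untested: write /etc/chrony.conf directly instead of
# relying on cloud-init's ntp module (which needs the missing
# chrony.conf.photon template). Path and service name may differ per distro
# (e.g. /etc/chrony/chrony.conf on Ubuntu).
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: ntp-workaround-example
  namespace: default
spec:
  template:
    spec:
      files:
      - path: /etc/chrony.conf
        owner: root:root
        permissions: "0644"
        content: |
          server timeserver1 iburst
          server timeserver2 iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
      preKubeadmCommands:
      # files: content is written by cloud-init before runcmd, so the new
      # config is already in place when these commands run.
      - systemctl enable chronyd
      - systemctl restart chronyd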
Is the NTP server not getting propagated due to a change in the cloud-init version? Can you specify what cloud-init version is being used?
We're currently using the latest pre-built Photon 3 with Kubernetes 1.20.1. I'll check what cloud-init version it uses and let you know.
P.S.: I also noticed this strange behaviour (NTP not being started at boot time) in other Photon OS 3.x versions not related to CAPV, so maybe this is a Photon OS issue.
Maybe transfer to CAPI?
This should be an image-builder topic.
Just tested with the latest OVA for Ubuntu (v1.27.3) and it works there. Same for the photon-3 image, this also works. So I guess this was fixed in image-builder.
What I used:
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: capi-test-md-0
  namespace: default
spec:
  template:
    spec:
      ntp:
        enabled: true
        servers:
        - time-a-g.nist.gov
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ local_hostname }}'
      preKubeadmCommands:
      - hostnamectl set-hostname "{{ ds.meta_data.hostname }}"
      - echo "::1 ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts
      - echo "127.0.0.1 {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
      users:
      - name: capv
        sshAuthorizedKeys:
        - ssh-rsa CENSORED CENSORED
        sudo: ALL=(ALL) NOPASSWD:ALL
And that resulted in:
$ head -n 5 /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# servers
server time-a-g.nist.gov iburst
So I guess we could close this?
@MaxRink ^^
Yeah, it got fixed a while back, totally forgot about this issue 😅