
Support for NTP Pools in vSphereMachine(Templates)

Open MaxRink opened this issue 4 years ago • 19 comments

/kind feature

Describe the solution you'd like

Currently we can specify a DNS pool under spec.network.devices[].nameservers, but we can't specify a pool of NTP servers, even though NTP is also needed for smooth network operation. This becomes a problem if you use static IPs / an IPAM controller rather than DHCP (which would provide the correct NTP servers to use). Cloud-init already has support for setting NTP pools: https://cloudinit.readthedocs.io/en/latest/topics/modules.html#ntp

#cloud-config
ntp:
  pools: ['0.company.pool.ntp.org', '1.company.pool.ntp.org', 'ntp.myorg.org']
  servers: ['my.ntp.server.local', 'ntp.ubuntu.com', '192.168.23.2']
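
For context, a sketch of how this could surface in the CAPV API, mirroring the existing nameservers field; the ntpServers field below is hypothetical and does not exist in CAPV today:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: example-template          # hypothetical name
spec:
  template:
    spec:
      network:
        devices:
        - networkName: vm-network-1
          dhcp4: false
          ipAddrs:
          - 192.168.23.10/24
          nameservers:            # supported today
          - 192.168.23.2
          ntpServers:             # hypothetical field illustrating this request
          - 0.company.pool.ntp.org
          - 1.company.pool.ntp.org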

MaxRink avatar Feb 01 '21 13:02 MaxRink

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar May 02 '21 15:05 fejta-bot

/remove-lifecycle stale

MaxRink avatar May 06 '21 21:05 MaxRink

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

k8s-triage-robot avatar Aug 04 '21 21:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 03 '21 22:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Oct 03 '21 22:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close


k8s-ci-robot avatar Oct 03 '21 22:10 k8s-ci-robot

The default of having the k8s-ci-robot close issues after X days seems to have killed many useful improvement suggestions. I doubt this is helpful; you're losing a lot of good user feedback.

NTP should be considered just as fundamental a service as DNS. Currently the NTP setup seems to rely on correct NTP data from the ESXi hosts (via open-vm-tools), but this is not documented anywhere here (I found hints in the Tanzu docs), and there is no obvious way to configure NTP when you review only what CAPV provides.

A less obvious way would be to use kubeadm for it. Since kubeadm only exposes a subset of what cloud-init can do, it's not perfect - I'd rather have a way to patch the cloud-init data, or provide it myself instead of having CAPV generate it - but it would at least be good enough to configure NTP.

For example, specifying NTP via the KubeadmConfigTemplate:

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: myfancykubeadmconf
  namespace: default
spec:
  template:
    spec:
      files:
      - contentFrom:
          secret:
            key: data
            name: containerd-registry-config
        owner: root:root
        path: /etc/containerd/config.toml
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ ds.meta_data.hostname }}'
      ntp:
        enabled: true
        servers:
        - timeserver1
        - timeserver2
      preKubeadmCommands:
      - hostname "{{ ds.meta_data.hostname }}"
      - echo "::1         ipv6-localhost ipv6-loopback" >/etc/hosts
      - echo "127.0.0.1   localhost" >>/etc/hosts
      - echo "127.0.0.1   {{ ds.meta_data.hostname }}" >>/etc/hosts
      - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
      users:
      - name: capv
        sshAuthorizedKeys:
        - ssh-rsa XXX
        sudo: ALL=(ALL) NOPASSWD:ALL

This should result in an NTP section being generated in the cloud-init scripts, as documented here.

But although the cloud-init data is generated, including the NTP section (it can be seen in the generated Secret resource, which is provided via vApp configs), like this:

ntp:
  enabled: true
  servers:
    - timeserver1
    - timeserver2

NTP does not actually get configured. I tested this on Photon OS 3. According to KB 76088 it should result in /etc/systemd/timesyncd.conf being set. I also noticed that the pre-built image ships with chrony, so I checked that config as well: nothing. Currently the cloud-init NTP directive seems to be completely ignored.
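
A quick way to verify this on a node is to ask cloud-init directly whether the ntp module ran and what, if anything, it wrote (a sketch; the config paths assume the systemd-timesyncd and chrony layouts mentioned above):

cloud-init status --long                                      # overall cloud-init result
grep -i ntp /var/log/cloud-init.log                           # messages from the cc_ntp module
cat /var/lib/cloud/data/result.json                           # per-stage errors recorded by cloud-init
cat /etc/systemd/timesyncd.conf /etc/chrony.conf 2>/dev/null  # rendered NTP config, if any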

omniproc avatar Oct 28 '21 15:10 omniproc

The default of having the k8s-ci-robot close issues after X days seems to have killed many useful improvement suggestions. I doubt this is helpful; you're losing a lot of good user feedback.

Important tickets should be marked with '/lifecycle frozen'

neolit123 avatar Oct 28 '21 15:10 neolit123

/lifecycle frozen

omniproc avatar Oct 28 '21 15:10 omniproc

/reopen

omniproc avatar Oct 28 '21 15:10 omniproc

@omniproc: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen


k8s-ci-robot avatar Oct 28 '21 15:10 k8s-ci-robot

@MaxRink unless you've fixed the issue, could you please /reopen? I'm not allowed to do so since I'm not the original author.

omniproc avatar Oct 28 '21 16:10 omniproc

@neolit123

since @MaxRink isn't responding, do you mind re-opening? I don't think this issue is solved yet.

omniproc avatar Oct 29 '21 13:10 omniproc

/reopen

I do not see maintainer comments here, so I'm not sure whether they want this or not.

neolit123 avatar Oct 29 '21 13:10 neolit123

@neolit123: Reopened this issue.

In response to this:

/reopen

I do not see maintainer comments here, so I'm not sure whether they want this or not.


k8s-ci-robot avatar Oct 29 '21 13:10 k8s-ci-robot

I think I found the smoking gun for why NTP is not applied. It seems to be an issue with the pre-built Photon image:

root [ /var/lib/cloud/data ]# cat result.json
{
 "v1": {
  "datasource": "DataSourceVMwareGuestInfo",
  "errors": [
   "('ntp', RuntimeError('No template found, not rendering chrony.conf.{distro}'))",
   "('scripts-user', RuntimeError('Runparts: 1 failures in 1 attempted commands'))"
  ]
 }
}
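
The "No template found" error suggests cloud-init's cc_ntp module is missing its distro template. A quick way to confirm that on the node (cloud-init looks these up in its standard template directory):

ls /etc/cloud/templates/ | grep -i chrony   # cc_ntp expects chrony.conf.photon here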

omniproc avatar Oct 29 '21 16:10 omniproc

systemctl status cloud-config.service
● cloud-config.service - Apply the settings specified in cloud-config
   Loaded: loaded (/lib/systemd/system/cloud-config.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/cloud-config.service.d
           └─boot-order.conf
   Active: failed (Result: exit-code) since Thu 2021-12-30 09:58:22 UTC; 25min ago
 Main PID: 743 (code=exited, status=1/FAILURE)

Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - util.py[DEBUG]: Read 12 bytes from /proc/uptime
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - util.py[DEBUG]: cloud-init mode 'modules' took 0.216 seconds (0.21)
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [CLOUDINIT]2021-12-30 09:58:22,457 - handlers.py[DEBUG]: finish: modules-config: FAIL: running modules for config
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [2021-12-30 09:58:22] Cloud-init v. 19.4 running 'modules:config' at Thu, 30 Dec 2021 09:58:22 +0000. Up 20.31 seconds.
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [2021-12-30 09:58:22] 2021-12-30 09:58:22,446 - cloud.py[WARNING]: No template found in /etc/cloud/templates for template named chrony.conf.photon
Dec 30 09:58:22 se1-acme-p100-x-v7zsf cloud-init[743]: [2021-12-30 09:58:22] 2021-12-30 09:58:22,446 - util.py[WARNING]: Running module ntp (<module 'cloudinit.config.cc_ntp' from '/usr/lib/python3.7/site-packages/cloudinit/config/cc_ntp.py'>) failed
Dec 30 09:58:22 se1-acme-p100-x-v7zsf systemd[1]: cloud-config.service: Main process exited, code=exited, status=1/FAILURE
Dec 30 09:58:22 se1-acme-p100-x-v7zsf systemd[1]: cloud-config.service: Failed with result 'exit-code'.
Dec 30 09:58:22 se1-acme-p100-x-v7zsf systemd[1]: Failed to start Apply the settings specified in cloud-config.
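
Given the "No template found in /etc/cloud/templates" warning above, one possible interim workaround (a sketch, not an official fix) is to sidestep cloud-init's ntp module and write the chrony config directly through the files field that KubeadmConfigTemplate already supports; the template name, server names, and the chronyd service name are assumptions to adapt:

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: ntp-workaround              # hypothetical name
  namespace: default
spec:
  template:
    spec:
      files:
      - path: /etc/chrony.conf
        owner: root:root
        permissions: "0644"
        content: |
          # minimal chrony config written directly, bypassing cc_ntp
          server timeserver1 iburst
          server timeserver2 iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
      preKubeadmCommands:
      - systemctl restart chronyd   # service name may differ per distro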

omniproc avatar Dec 30 '21 10:12 omniproc

Is the NTP server not getting propagated due to a change in the cloud-init version? Can you specify what cloud-init version is being used?

srm09 avatar Jan 28 '22 22:01 srm09

We're currently using the latest pre-built Photon 3 image with Kubernetes 1.20.1. I'll check which cloud-init version it uses and let you know.

P.S.: I've also noticed this strange behaviour, where NTP is reported as not started at boot time, in other Photon OS 3.x versions unrelated to CAPV. So maybe this is a Photon OS issue.

omniproc avatar Jan 29 '22 11:01 omniproc

Maybe transfer to CAPI?

randomvariable avatar Aug 03 '23 17:08 randomvariable

This should be an image-builder topic.

Just tested with the latest OVA for Ubuntu (v1.27.3) and it works there. Same for the photon-3 image; this also works. So I guess this was fixed in image-builder.

What I did use:

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: capi-test-md-0
  namespace: default
spec:
  template:
    spec:
      ntp:
        enabled: true
        servers:
        - time-a-g.nist.gov
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ local_hostname }}'
      preKubeadmCommands:
      - hostnamectl set-hostname "{{ ds.meta_data.hostname }}"
      - echo "::1         ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6"
        >/etc/hosts
      - echo "127.0.0.1   {{ ds.meta_data.hostname }} {{ local_hostname }} localhost
        localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
      users:
      - name: capv
        sshAuthorizedKeys:
        - ssh-rsa CENSORED
          CENSORED
        sudo: ALL=(ALL) NOPASSWD:ALL

And that resulted in:

$ head -n 5 /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# servers
server time-a-g.nist.gov iburst
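
Beyond inspecting /etc/chrony.conf, chrony can also confirm at runtime that the server is actually in use (assuming chronyc is available on the node):

chronyc sources    # lists the time sources chrony is using
chronyc tracking   # shows current sync status and offset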

So I guess we could close this?

chrischdi avatar Aug 04 '23 08:08 chrischdi

@MaxRink ^^

sbueringer avatar Aug 21 '23 11:08 sbueringer

Yeah, it got fixed a while back, totally forgot about this issue 😅

MaxRink avatar Aug 21 '23 20:08 MaxRink