okd4-single-node-cluster
installing on ESXi
If I want to install the whole system on an ESXi server and build the initial snc-host VM on it, what should the VM's specs be (disk, for example)? If I understood correctly, the initial machine (snc-host) is deployed first, and then the OKD machines (bootstrap and master/worker) are deployed as VMs on top of it. Am I right?
You will need to enable nested virtualization on your ESXi hypervisor, which I assume can be done, but I haven't done it personally.
Give your snc-host 32GB of RAM, 4 or preferably 6 CPUs, and 500GB of disk. You can get away with less disk if needed.
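On the ESXi side, I believe it's the "Expose hardware assisted virtualization to the guest OS" CPU option (vhv.enable = "TRUE" in the .vmx), but verify that. Once the snc-host is up, a quick sanity check that nested virtualization made it through looks like this (standard commands, nothing specific to my scripts):

# On the snc-host: the CPU virtualization flags should be visible
grep -c -E 'vmx|svm' /proc/cpuinfo    # non-zero means VT-x/AMD-V is exposed
# And KVM should be usable
ls -l /dev/kvm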
I assigned a reachable address to the snc-host and edited the files so that both the bootstrap and master nodes would get IPs from the same range as well. For this reason I thought it shouldn't be necessary to create a network bridge. But the DeployOkdSnc.sh script stopped, giving me the following error:
+ mkdir -p /VirtualMachines/okd4-snc-master
+ virt-install --name okd4-snc-master --memory 16384 --vcpus 4 --disk size=200,path=/VirtualMachines/okd4-snc-master/rootvol,bus=sata --cdrom /tmp/snc-master.iso --network bridge=br0 --mac=52:54:00:b8:25:07 --graphics none --noautoconsole --os-variant centos7.0
Starting install...
Allocating 'rootvol' | 200 GB 00:00:00
ERROR Cannot get interface MTU on 'br0': No such device
Removing disk 'rootvol' | 0 B 00:00:00
Domain installation does not appear to have been successful.
If it was, you can restart your domain by running:
virsh --connect qemu:///system start okd4-snc-master
otherwise, please restart your installation.
+ rm -rf /root/okd4-snc/fcos-iso
[root@snc-host okd4-snc]#
Should I create a network bridge in this case, or edit the script to ignore the bridge config (which I don't know how to do, anyway)?
The scripts that I prepared are very opinionated toward a Libvirt install on a bare metal host.
You will either have to create the bridge device on your snc-host, or modify the script to build the VMs natively on ESXi.
Unfortunately, I don't have an ESX setup to test with, just my bare metal NUCs.
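If you do create the bridge on the snc-host, something like this with NetworkManager should do it (the physical NIC name eno1 is an assumption; substitute yours, and note this will briefly drop connectivity on that interface):

# Create br0 and enslave the physical NIC to it (eno1 is an assumed name)
nmcli connection add type bridge ifname br0 con-name br0
nmcli connection add type bridge-slave ifname eno1 master br0
nmcli connection modify br0 ipv4.method auto    # or configure a static address here
nmcli connection up br0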
Hello; I created the bridge and finished the process. I destroyed the bootstrap VM through the script and managed to log in to the cluster through the web console. But there were some issues:
While issuing the "oc apply -f", "oc adm policy", and "oc patch" commands, I got this:
error: Missing or incomplete configuration info. Please point to an existing, complete config file:
1. Via the command-line flag --kubeconfig
2. Via the KUBECONFIG environment variable
3. In your home directory as ~/.kube/config
To view or setup config directly use the 'config' command.
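(From what I can tell, the usual fix is pointing KUBECONFIG at the kubeconfig that openshift-install generated; the install directory name below is a placeholder since I'm not sure what the scripts call it:)

# Point oc at the kubeconfig written during the install
export KUBECONFIG=${OKD4_SNC_PATH}/<install-dir>/auth/kubeconfig
oc whoami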
The oc get pods command ran successfully:
[root@snc-host bin]# oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver-operator openshift-apiserver-operator-775b96cbc7-jdzx8 1/1 Running 4 49m
openshift-apiserver apiserver-5f6c8f7b57-wqnn2 2/2 Running 0 39m
openshift-authentication-operator authentication-operator-7db54f67b4-rzzwr 1/1 Running 4 49m
openshift-authentication oauth-openshift-f896f6bd7-4jqhk 1/1 Running 1 19m
openshift-authentication oauth-openshift-f896f6bd7-g6zxp 1/1 Running 1 18m
openshift-cloud-credential-operator cloud-credential-operator-7f849b64-nfbt4 2/2 Running 0 49m
openshift-cluster-machine-approver machine-approver-8dc558468-fjkr4 2/2 Running 0 49m
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-666599dc7d-2c2mz 1/1 Running 0 49m
openshift-cluster-node-tuning-operator tuned-9vx26 1/1 Running 0 44m
openshift-cluster-samples-operator cluster-samples-operator-7897fc6cfc-v5scl 2/2 Running 0 32m
openshift-cluster-storage-operator cluster-storage-operator-7c8b7446db-hmgtd 1/1 Running 3 49m
openshift-cluster-storage-operator csi-snapshot-controller-5467fccd7d-xj2p8 1/1 Running 4 45m
openshift-cluster-storage-operator csi-snapshot-controller-operator-6775454678-p9t54 1/1 Running 3 49m
openshift-cluster-version cluster-version-operator-675d5596b5-ws5mn 1/1 Running 4 49m
openshift-config-operator openshift-config-operator-5898656dd5-m8lqb 0/1 CrashLoopBackOff 10 49m
openshift-console-operator console-operator-55dc54764d-4mn67 1/1 Running 8 32m
openshift-console console-65b8998b59-8lmd8 0/1 CreateContainerError 0 19m
openshift-console console-65b8998b59-lfrlw 1/1 Running 1 19m
openshift-console downloads-854f79cbf8-hkxmj 1/1 Running 0 32m
openshift-console downloads-854f79cbf8-ntg8l 0/1 Completed 1 32m
openshift-controller-manager-operator openshift-controller-manager-operator-69fb47658c-t8fkv 1/1 Running 4 49m
openshift-controller-manager controller-manager-ch885 1/1 Running 4 42m
openshift-dns-operator dns-operator-77bb7f84c-mft2q 2/2 Running 0 49m
openshift-dns dns-default-rkfnl 3/3 Running 0 41m
openshift-etcd-operator etcd-operator-6b7b4d74-2kdcp 1/1 Running 4 49m
openshift-etcd etcd-okd4-snc-master.snc.test 3/3 Running 0 31m
openshift-etcd etcd-quorum-guard-6d84d68c47-2fctd 1/1 Running 0 39m
openshift-etcd etcd-quorum-guard-6d84d68c47-4znmk 0/1 Pending 0 39m
openshift-etcd etcd-quorum-guard-6d84d68c47-7qths 0/1 Pending 0 39m
openshift-etcd installer-2-okd4-snc-master.snc.test 0/1 Completed 0 43m
openshift-etcd installer-3-okd4-snc-master.snc.test 0/1 Completed 0 31m
openshift-etcd revision-pruner-2-okd4-snc-master.snc.test 0/1 Completed 0 40m
openshift-etcd revision-pruner-3-okd4-snc-master.snc.test 0/1 Completed 0 25m
openshift-image-registry cluster-image-registry-operator-bc6ff4b59-8t8wh 0/1 Preempting 0 49m
openshift-image-registry cluster-image-registry-operator-bc6ff4b59-zfhgb 1/1 Running 0 23m
openshift-image-registry node-ca-kmnfs 1/1 Running 0 31m
openshift-ingress-operator ingress-operator-6468c7f548-zzcfj 2/2 Running 0 49m
openshift-ingress router-default-b4b55dffd-sjss9 1/1 Running 1 41m
openshift-insights insights-operator-7d45f44d7b-tx6t2 1/1 Running 1 49m
openshift-kube-apiserver-operator kube-apiserver-operator-575f69c8dd-zq95h 1/1 Running 4 49m
openshift-kube-apiserver installer-2-okd4-snc-master.snc.test 0/1 Completed 0 42m
openshift-kube-apiserver installer-3-okd4-snc-master.snc.test 0/1 Completed 0 31m
openshift-kube-apiserver installer-4-okd4-snc-master.snc.test 0/1 Completed 0 24m
openshift-kube-apiserver kube-apiserver-okd4-snc-master.snc.test 5/5 Running 0 23m
openshift-kube-apiserver revision-pruner-2-okd4-snc-master.snc.test 0/1 Completed 0 39m
openshift-kube-apiserver revision-pruner-3-okd4-snc-master.snc.test 0/1 Completed 0 24m
openshift-kube-apiserver revision-pruner-4-okd4-snc-master.snc.test 0/1 Completed 0 18m
openshift-kube-controller-manager-operator kube-controller-manager-operator-59b9b989b6-gblgg 1/1 Running 4 49m
openshift-kube-controller-manager installer-4-okd4-snc-master.snc.test 0/1 Completed 0 42m
openshift-kube-controller-manager installer-5-okd4-snc-master.snc.test 0/1 Completed 0 41m
openshift-kube-controller-manager installer-6-okd4-snc-master.snc.test 0/1 Completed 0 40m
openshift-kube-controller-manager installer-7-okd4-snc-master.snc.test 0/1 Completed 0 25m
openshift-kube-controller-manager installer-8-okd4-snc-master.snc.test 0/1 Completed 0 24m
openshift-kube-controller-manager kube-controller-manager-okd4-snc-master.snc.test 4/4 Running 2 22m
openshift-kube-controller-manager revision-pruner-4-okd4-snc-master.snc.test 0/1 Completed 0 41m
openshift-kube-controller-manager revision-pruner-5-okd4-snc-master.snc.test 0/1 Completed 0 40m
openshift-kube-controller-manager revision-pruner-6-okd4-snc-master.snc.test 0/1 Completed 0 36m
openshift-kube-controller-manager revision-pruner-7-okd4-snc-master.snc.test 0/1 Completed 0 24m
openshift-kube-controller-manager revision-pruner-8-okd4-snc-master.snc.test 0/1 Completed 0 19m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-8ffc65794-l76v6 1/1 Running 4 49m
openshift-kube-scheduler installer-2-okd4-snc-master.snc.test 0/1 Completed 0 44m
openshift-kube-scheduler installer-3-okd4-snc-master.snc.test 0/1 Completed 0 43m
openshift-kube-scheduler installer-4-okd4-snc-master.snc.test 0/1 Completed 0 43m
openshift-kube-scheduler installer-5-okd4-snc-master.snc.test 0/1 Completed 0 41m
openshift-kube-scheduler installer-6-okd4-snc-master.snc.test 0/1 Completed 0 40m
openshift-kube-scheduler installer-7-okd4-snc-master.snc.test 0/1 Completed 0 25m
openshift-kube-scheduler installer-8-okd4-snc-master.snc.test 0/1 Completed 0 18m
openshift-kube-scheduler installer-9-okd4-snc-master.snc.test 0/1 Completed 0 17m
openshift-kube-scheduler openshift-kube-scheduler-okd4-snc-master.snc.test 3/3 Running 0 15m
openshift-kube-scheduler revision-pruner-2-okd4-snc-master.snc.test 0/1 Completed 0 43m
openshift-kube-scheduler revision-pruner-3-okd4-snc-master.snc.test 0/1 Completed 0 43m
openshift-kube-scheduler revision-pruner-4-okd4-snc-master.snc.test 0/1 Completed 0 41m
openshift-kube-scheduler revision-pruner-5-okd4-snc-master.snc.test 0/1 Completed 0 40m
openshift-kube-scheduler revision-pruner-6-okd4-snc-master.snc.test 0/1 Completed 0 36m
openshift-kube-scheduler revision-pruner-7-okd4-snc-master.snc.test 0/1 Completed 0 18m
openshift-kube-scheduler revision-pruner-8-okd4-snc-master.snc.test 0/1 Completed 0 17m
openshift-kube-scheduler revision-pruner-9-okd4-snc-master.snc.test 0/1 Completed 0 14m
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-6b897b95f6-wwc7v 1/1 Running 4 49m
openshift-kube-storage-version-migrator migrator-7b994fb6bd-b5vb8 1/1 Running 0 44m
openshift-machine-api cluster-autoscaler-operator-78ccfd7fd9-kd92h 2/2 Running 3 49m
openshift-machine-api machine-api-operator-5b9c9dc55d-vpnl9 2/2 Running 3 49m
openshift-machine-config-operator machine-config-controller-7c99b744d9-5xm9p 1/1 Running 3 41m
openshift-machine-config-operator machine-config-daemon-s47c7 2/2 Running 0 45m
openshift-machine-config-operator machine-config-operator-659dfb74f6-brmsd 1/1 Running 3 49m
openshift-machine-config-operator machine-config-server-lztn6 1/1 Running 0 39m
openshift-marketplace community-operators-dlpx6 1/1 Running 1 41m
openshift-marketplace community-operators-vqxwt 0/1 ContainerCreating 0 3m32s
openshift-marketplace marketplace-operator-5c994cd68b-79rz8 1/1 Running 0 49m
openshift-monitoring alertmanager-main-0 5/5 Running 0 31m
openshift-monitoring alertmanager-main-1 5/5 Running 0 31m
openshift-monitoring alertmanager-main-2 5/5 Running 0 31m
openshift-monitoring cluster-monitoring-operator-5b5f7bbf6c-nq85m 2/2 Running 3 49m
openshift-monitoring grafana-59796b8d65-5k87f 2/2 Running 0 31m
openshift-monitoring kube-state-metrics-7c489bf449-55gjb 3/3 Running 0 44m
openshift-monitoring node-exporter-p7fsw 2/2 Running 0 44m
openshift-monitoring openshift-state-metrics-58cf4fc578-sb7r9 3/3 Running 0 44m
openshift-monitoring prometheus-adapter-99f7d69d8-nh9kb 1/1 Running 0 39m
openshift-monitoring prometheus-adapter-99f7d69d8-r576c 1/1 Running 0 39m
openshift-monitoring prometheus-k8s-0 6/6 Running 1 31m
openshift-monitoring prometheus-k8s-1 6/6 Running 1 31m
openshift-monitoring prometheus-operator-7464cb6789-d8tjp 2/2 Running 0 32m
openshift-monitoring thanos-querier-5665b7465b-9rpqs 5/5 Running 1 31m
openshift-monitoring thanos-querier-5665b7465b-lw4hn 5/5 Running 1 31m
openshift-multus multus-admission-controller-tsftl 2/2 Running 0 46m
openshift-multus multus-clz7t 1/1 Running 0 48m
openshift-multus network-metrics-daemon-khtvz 2/2 Running 0 48m
openshift-network-operator network-operator-df9d76cdb-vcq2f 1/1 Running 0 49m
openshift-oauth-apiserver apiserver-7fc5cf4fb6-2tqct 0/1 Completed 6 39m
openshift-operator-lifecycle-manager catalog-operator-5777d9f486-vt8wd 1/1 Running 0 49m
openshift-operator-lifecycle-manager olm-operator-64d68c47df-g8msw 1/1 Running 1 49m
openshift-operator-lifecycle-manager packageserver-58ddfdcc64-hs8mm 0/1 CrashLoopBackOff 10 41m
openshift-operator-lifecycle-manager packageserver-58ddfdcc64-xm8t6 0/1 CrashLoopBackOff 10 41m
openshift-sdn ovs-tw48s 1/1 Running 0 48m
openshift-sdn sdn-controller-wfjqh 1/1 Running 4 48m
openshift-sdn sdn-s7flk 2/2 Running 0 48m
openshift-service-ca-operator service-ca-operator-866b447b9d-6n24n 1/1 Running 4 49m
openshift-service-ca service-ca-78ff8d59b7-5pf7k 1/1 Running 3 44m
openshift-service-catalog-removed openshift-service-catalog-apiserver-remover-756tq 0/1 ContainerCreating 0 7m54s
openshift-service-catalog-removed openshift-service-catalog-controller-manager-remover-n4jt8 0/1 ContainerCreating 0 7m54s
The htpasswd secret was also created:
[root@snc-host bin]# cat /root/okd4-snc/okd-creds/htpasswd
admin:$2y$05$52wLOSQrqPRmxRNawFCUdebHa6JcH8C/i9IoGrMLElZYrxHseBQR2
devuser:$apr1$D1D1ian9$ujbdocdsNDxOUoYyCDvFs/
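(For reference, entries like these can be regenerated with htpasswd from httpd-tools; the passwords below are placeholders:)

# -B gives the bcrypt hash seen for admin; plain -b defaults to the apr1 hash seen for devuser
htpasswd -c -B -b htpasswd admin 'admin-password'
htpasswd -b htpasswd devuser 'devuser-password'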
Hello; I tried to log in to the cluster (10.106.31.233 is the master node IP address, which I can ping), but got this:
[root@snc-host ~]# oc get pods --all-namespaces
The connection to the server api.okd4-snc.snc.test:6443 was refused - did you specify the right host or port?
[root@snc-host ~]# /root/bin/oc login -u admin
error: dial tcp 10.106.31.233:6443: connect: connection refused - verify you have provided the correct host and port and that the server is currently running..
[root@snc-host ~]# /root/bin/oc apply -f ${OKD4_SNC_PATH}/okd4-single-node-cluster/htpasswd-cr.yaml
error: unable to recognize "/root/okd4-snc/okd4-single-node-cluster/htpasswd-cr.yaml": Get "https://api.okd4-snc.snc.test:6443/api?timeout=32s": dial tcp 10.106.31.233:6443: connect: connection refused
I restarted the host, but after it booted up, even the "oc get pods --all-namespaces" command, which had worked before, no longer ran, and the GUI didn't work either. I tried curl and got this:
[root@snc-host ~]# curl https://console-openshift-console.apps.okd4-snc.snc.test
curl: (7) Failed to connect to console-openshift-console.apps.okd4-snc.snc.test port 443: Connection refused
Nothing works as expected. Did I miss something?
Hello; nothing worked, so I cleared everything and started from the beginning two more times. In the end, none of my attempts succeeded. Even though I waited more than a day, neither the bootstrap nor the master node ever shut down as you described in the "virsh list --all" output; I think the reason is that the install process never finished and got stuck somewhere. I tried the "virsh console okd4-snc-master" and "virsh console okd4-snc-bootstrap" commands, but nothing was displayed on the terminal. I followed every step you mentioned in this doc three times, but I wasn't successful at all.
Hey, that sucks that you couldn't get it working. I don't know if the nested virtualization is causing issues or not.
It looks like you got really close though. Those pods that were in a CrashLoopBackOff state were probably the issue.
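One thing I'd check on a single node is the etcd quorum guard; two of its pods will sit Pending forever with only one master (you can see them in your pod list above), which is why my guide patches etcd during the install. The patch looks like this (reproduced from memory, so double-check it against the guide):

# Single-node only: tell the etcd operator to tolerate a one-member, non-HA etcd
oc patch etcd cluster --type=merge --patch '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'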
If you can get back to that state, I'll try to help you get across the line.
I'm in meetings all day most days, so I haven't been able to spend much time in my lab lately. :-(
I really appreciate you helping out even though you're swamped with daily work. So first of all I want to say thank you :)
I gave it one more shot and noticed there might be an issue with the "sshkey: ecdsa-sha1xxxxxx" line of the install-config.yaml file. I'm not sure if it makes any sense, but I put the whole value inside ' ' characters (like sshkey: 'ecdsa-shaxxxxx') and got the same results as before (the bootstrap and master didn't shut down via the script, and I got nothing on the virsh console through the whole process). Because I could still ping the master node IP address and the console URL from another device, I tried to load the GUI, which I also wasn't able to do, and I got the connection refused error message on the "snc-host" CLI:
[root@snc-host ~]# curl https://console-openshift-console.apps.okd4-snc.snc.test
curl: (7) Failed to connect to console-openshift-console.apps.okd4-snc.snc.test port 443: Connection refused
I'll take a look at that. I might have a mistake in the document.
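For what it's worth, the sshKey value in install-config.yaml needs to be the entire public key on one line, and quoting it is fine. One way to append it (the key file name is an assumption; use whichever key you generated):

# Append the public key to install-config.yaml as a single quoted line (illustrative)
echo "sshKey: '$(cat ~/.ssh/id_ecdsa.pub)'" >> install-config.yaml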
Did you get the chance to test-run the procedure on an ESXi host? I can even give you access to my lab if you're interested, whenever your time allows.
Hey. OKD 4.7 is being released. It might work better for you.
I have not had any testing time lately, so I haven't tried anything yet. I'll keep you posted.
Try this again with OKD 4.7.
Follow my guide; there are three oc commands that you need to run during the installation for it to complete successfully with a single node.
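You can watch the install progress from the snc-host with the standard tooling, e.g.:

# <install-dir> is whatever directory the install assets were generated into
openshift-install --dir=<install-dir> wait-for bootstrap-complete --log-level=info
# Once the API is up (export KUBECONFIG first), watch the cluster operators settle
oc get clusteroperators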
Hi Sir,
May I know the minimum disk size for the bootstrap and master nodes? Your scripts use 200G for each node; can I use 30G for testing? I tried that, and it didn't seem to work.
I have successfully built with 50G disks.
This guide is also horribly out of date. I need to refresh it for OKD 4.9.
I'll try to spend a few minutes on it soon.
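If you want to try smaller disks, the size is just an argument to the virt-install call in the script; for example, the master invocation shown earlier becomes:

# Same command the script runs, with the root volume shrunk to 50G
virt-install --name okd4-snc-master --memory 16384 --vcpus 4 --disk size=50,path=/VirtualMachines/okd4-snc-master/rootvol,bus=sata --cdrom /tmp/snc-master.iso --network bridge=br0 --mac=52:54:00:b8:25:07 --graphics none --noautoconsole --os-variant centos7.0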
Thank you, please let me know the result. I still use OKD 4.7, but with the latest version: 4.7.0-0.okd-2021-09-19-013247. I also use the updated FCOS version: 34.20210611.3.0.
I still have issues with the bootstrap node; I found some "no space" errors in dmesg, but I'm not sure whether they are caused by the disk.
If it is a disk issue, I will create another VM for testing, but I do have disk resource limitations on my notebook.
How much memory are you giving the bootstrap node?
It needs at least 14GB RAM. Not because it uses that much, but because it sizes the RAM disk that it expands the ostree bundle into proportionally to the RAM. If it's too small, then the ostree bundle will fill up the filesystem.
It should be safe to oversubscribe your RAM if you have at least 32 GB on the box. So, try giving the bootstrap node 16GB of RAM.
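If the bootstrap VM already exists, you can resize it with virsh rather than rebuilding (a sketch; the domain name matches the script's naming):

# Raise the memory ceiling and allocation for the existing domain, then restart it
virsh setmaxmem okd4-snc-bootstrap 16G --config
virsh setmem okd4-snc-bootstrap 16G --config
virsh destroy okd4-snc-bootstrap && virsh start okd4-snc-bootstrap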
I only have 16G of memory in my notebook; it seems I can over-allocate memory for the bootstrap node!
Then what about the master? Same requirements?
Thanks, Sky
Yeah, 16GB of RAM is going to be really tight. Ideally, you need 32GB of RAM to have a useful setup.
Yes, it seems I can still cheat KVM with 16G of memory even though the host only has 12G. The bootstrap should be fine, but I still have some issues with the master; I will try to solve them tomorrow.
The bootstrap node works fine, but not the master node. I used virsh console to log in to the master; it always reports the errors below:
[ 2001.156252] ignition[452]: GET error: Get "https://api-int.okd4-snc.snc.test:22623
[*     ] A start job is running for Ignition (fetch) (32min 59s / no limit)
[ 2006.202949] ignition[452]: GET https://api-int.okd4-snc.snc.test:22623/config/master: attempt #377
I also can't ssh core@okd4-snc-master, so I guess maybe I need a real "16G" of memory to create the master node.
Anyway, many thanks for your support. I will try to install it after I get a more powerful server.
I just found that api-int.okd4-snc.snc.test points to the bootstrap node, so this problem should be on the bootstrap. Do you know how to check the web service for port 22623? I am sure there is no program listening on that port.
That error message is to be expected until the bootstrap node is fully up and serving the manifests for the OpenShift install.
Once the bootstrap node is fully up, it will be listening on port 22623 to serve the API for the master node to install.
The master node will keep retrying that HTTP GET until it is able to retrieve its ignition file.
So, I believe that your bootstrap node has not completely started up yet.
You definitely need more RAM. That's why I use the little Intel NUC servers. They are perfect for this, and really portable.
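You can watch for it from the snc-host; until the bootstrap is serving configs you'll just get connection refused:

# Probe the machine-config server the master is polling for its ignition config;
# any HTTP status code means something is listening, "connection refused" means not
curl -ks -o /dev/null -w '%{http_code}\n' https://api-int.okd4-snc.snc.test:22623/config/master
# Or check for a listener on the bootstrap node itself
ssh core@okd4-snc-bootstrap.snc.test 'sudo ss -tlnp | grep 22623'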
Yes, you are right. I found that kubelet.service could not be started on the bootstrap node.
I also use the newest version of OKD: 4.9.0-0.okd-2021-12-12-025847
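Here is the journal for the unit, pulled over the virsh console with something like:

# On the bootstrap node: see why kubelet keeps restarting
systemctl status kubelet.service
journalctl -u kubelet.service --no-pager | tail -n 20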
Dec 24 13:15:55 okd4-snc-bootstrap.snc.test systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 4.
Dec 24 13:15:55 okd4-snc-bootstrap.snc.test systemd[1]: Stopped Kubernetes Kubelet.
Dec 24 13:15:55 okd4-snc-bootstrap.snc.test systemd[1]: kubelet.service: Consumed 21.956s CPU time.
Dec 24 13:15:55 okd4-snc-bootstrap.snc.test systemd[1]: Starting Kubernetes Kubelet...
Dec 24 13:17:27 okd4-snc-bootstrap.snc.test systemd[1]: kubelet.service: start-pre operation timed out. Terminating.
Dec 24 13:17:37 okd4-snc-bootstrap.snc.test systemd[1]: kubelet.service: Control process exited, code=killed, status=15/TERM
Dec 24 13:17:37 okd4-snc-bootstrap.snc.test systemd[1]: kubelet.service: Failed with result 'timeout'.
Dec 24 13:17:37 okd4-snc-bootstrap.snc.test systemd[1]: Failed to start Kubernetes Kubelet.
Dec 24 13:17:37 okd4-snc-bootstrap.snc.test systemd[1]: kubelet.service: Consumed 22.829s CPU time.