cluster-api-provider-harvester
New cluster using Talos is not progressing beyond Machines in Provisioning stage.
What happened:
The cluster is not coming up: the Harvester load balancer is never created and the machines never leave the Provisioning state. The machines are provisioned in Harvester and get IPs from my network, and I can attach a console to them, though since it's Talos there is not much to see there.
Screenshot of the console of one of the Talos control-plane VMs:
caph-provider logs:
ERROR failed to patch HarvesterMachine {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "7ec120a6-8a1e-40b1-98dd-3597ce44ca1c", "machine": "cluster-capi-mgmt-p-01/capi-mgmt-p-01-7shhp", "cluster": "cluster-capi-mgmt-p-01/capi-mgmt-p-01", "error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value", "errorCauses": [{"error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value"}]}
github.com/rancher-sandbox/cluster-api-provider-harvester/controllers.(*HarvesterMachineReconciler).Reconcile.func1
/workspace/controllers/harvestermachine_controller.go:121
github.com/rancher-sandbox/cluster-api-provider-harvester/controllers.(*HarvesterMachineReconciler).Reconcile
/workspace/controllers/harvestermachine_controller.go:198
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226
2024-06-06T19:58:10Z ERROR Reconciler error {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "7ec120a6-8a1e-40b1-98dd-3597ce44ca1c", "error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value", "errorCauses": [{"error": "HarvesterMachine.infrastructure.cluster.x-k8s.io \"capi-mgmt-p-01-zzmph\" is invalid: ready: Required value"}]}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226
- These two log entries keep repeating:
2024-06-06T19:58:10Z INFO Reconciling HarvesterMachine ... {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "dc815768-5306-42cc-91c0-be802d85bc82"}
2024-06-06T19:58:10Z INFO Waiting for ProviderID to be set on Node resource in Workload Cluster ... {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"capi-mgmt-p-01-zzmph","namespace":"cluster-capi-mgmt-p-01"}, "namespace": "cluster-capi-mgmt-p-01", "name": "capi-mgmt-p-01-zzmph", "reconcileID": "dc815768-5306-42cc-91c0-be802d85bc82", "machine": "cluster-capi-mgmt-p-01/capi-mgmt-p-01-7shhp", "cluster": "cluster-capi-mgmt-p-01/capi-mgmt-p-01"}
capt-controller-manager logs:
I0606 19:58:08.737945 1 taloscontrolplane_controller.go:176] "controllers/TalosControlPlane: successfully updated control plane status" namespace="cluster-capi-mgmt-p-01" talosControlPlane="capi-mgmt-p-01" cluster="capi-mgmt-p-01"
I0606 19:58:08.739615 1 controller.go:327] "Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler" controller="taloscontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="TalosControlPlane" TalosControlPlane="cluster-capi-mgmt-p-01/capi-mgmt-p-01" namespace="cluster-capi-mgmt-p-01" name="capi-mgmt-p-01" reconcileID="b0b79408-8a41-43df-91ef-07fe7d36fa7c"
E0606 19:58:08.739746 1 controller.go:329] "Reconciler error" err="at least one machine should be provided" controller="taloscontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="TalosControlPlane" TalosControlPlane="cluster-capi-mgmt-p-01/capi-mgmt-p-01" namespace="cluster-capi-mgmt-p-01" name="capi-mgmt-p-01" reconcileID="b0b79408-8a41-43df-91ef-07fe7d36fa7c"
I0606 19:58:08.749008 1 taloscontrolplane_controller.go:189] "reconcile TalosControlPlane" controller="taloscontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="TalosControlPlane" TalosControlPlane="cluster-capi-mgmt-p-01/capi-mgmt-p-01" namespace="cluster-capi-mgmt-p-01" name="capi-mgmt-p-01" reconcileID="c37dc309-f8fb-42c7-a375-5faceb9019b9" cluster="capi-mgmt-p-01"
I0606 19:58:09.190175 1 scale.go:33] "controllers/TalosControlPlane: scaling up control plane" Desired=3 Existing=1
I0606 19:58:09.213294 1 taloscontrolplane_controller.go:152] "controllers/TalosControlPlane: attempting to set control plane status"
I0606 19:58:09.220900 1 taloscontrolplane_controller.go:564] "controllers/TalosControlPlane: failed to get kubeconfig for the cluster" error="failed to create cluster accessor: error creating client for remote cluster \"cluster-capi-mgmt-p-01/capi-mgmt-p-01\": error getting rest mapping: failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://10.0.0.113:6443/api/v1?timeout=10s\": tls: failed to verify certificate: x509: certificate is valid for 10.0.0.3, 127.0.0.1, ::1, 10.0.0.5, 10.53.0.1, not 10.0.0.113"
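A note on the TLS error above: the controllers dial https://10.0.0.113:6443, but the serving certificate only covers the SANs Talos generated (10.0.0.3, 127.0.0.1, ::1, 10.0.0.5, 10.53.0.1). Purely as a hedged sketch for diagnosing, the existing certSANs patch in controlplane.yaml below could be extended to cover that address (10.0.0.113 is just the address from the log; the proper endpoint would presumably be the LB VIP that never gets created):

# Hedged sketch: extend the existing certSANs configPatch so the API server
# certificate also covers the address the CAPI controllers actually dial.
- op: add
  path: /cluster/apiServer/certSANs
  value:
    - 127.0.0.1
    - 10.0.0.113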
cabpt-talos-bootstrap logs (I don't know if this is relevant):
I0606 19:58:09.206570 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.224117 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.243118 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.280372 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.341804 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.352557 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.439369 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.480714 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.539945 1 talosconfig_controller.go:186] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-df9f2: Waiting for OwnerRef on the talosconfig"
I0606 19:58:09.548156 1 secrets.go:174] "controllers/TalosConfig: handling bootstrap data for " owner="capi-mgmt-p-01-n48cx"
I0606 19:58:09.717884 1 secrets.go:174] "controllers/TalosConfig: handling bootstrap data for " owner="capi-mgmt-p-01-n48cx"
I0606 19:58:09.720944 1 secrets.go:174] "controllers/TalosConfig: handling bootstrap data for " owner="capi-mgmt-p-01-7shhp"
I0606 19:58:09.756344 1 talosconfig_controller.go:223] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4/owner-name=capi-mgmt-p-01-n48cx: ignoring an already ready config"
I0606 19:58:09.765995 1 secrets.go:243] "controllers/TalosConfig/cabpt-controller/namespace=cluster-capi-mgmt-p-01/talosconfig=capi-mgmt-p-01-npzm4/owner-name=capi-mgmt-p-01-n48cx: updating talosconfig" endpoints=null secret="capi-mgmt-p-01-talosconfig"
What did you expect to happen: I expected the caph provider to create the load balancer and proceed with creating the cluster.
How to reproduce it:
I added the providers for Talos (bootstrap and control plane) and of course the Harvester provider.
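In case it matters for reproduction: as far as I know the Harvester provider is not in clusterctl's built-in provider list at this version, so it has to be registered in clusterctl's config. The snippet below is a hedged sketch; the provider name and release URL follow the usual clusterctl convention and may not match the exact release asset:

# Hedged sketch of a clusterctl provider registration (~/.cluster-api/clusterctl.yaml);
# adjust name/url to the actual release asset of cluster-api-provider-harvester v0.1.2.
providers:
  - name: "harvester"
    type: "InfrastructureProvider"
    url: "https://github.com/rancher-sandbox/cluster-api-provider-harvester/releases/v0.1.2/infrastructure-components.yaml"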
I then added four manifest files plus the Harvester identity secret, with the following configuration:
cluster.yaml:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 172.16.0.0/20
    services:
      cidrBlocks:
        - 172.16.16.0/20
    serviceDomain: cluster.local
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: TalosControlPlane
    name: capi-mgmt-p-01
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: HarvesterCluster
    name: capi-mgmt-p-01
harvester-cluster.yaml:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: HarvesterCluster
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  targetNamespace: cluster-capi-mgmt-p-01
  loadBalancerConfig:
    ipamType: pool
    ipPoolRef: k8s-api
  server: https://10.0.0.3
  identitySecret:
    name: trollit-harvester-secret
    namespace: cluster-capi-mgmt-p-01
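The ipPoolRef above points at a pre-created IP pool named k8s-api in Harvester. For reference, a hedged sketch of how such a pool is defined; this assumes Harvester's loadbalancer.harvesterhci.io/v1beta1 IPPool API, and the addresses are illustrative values from my 10.0.0.0/24 network, not necessarily the exact ones used:

# Hedged sketch of the referenced IP pool; assumes the
# loadbalancer.harvesterhci.io/v1beta1 IPPool API, illustrative addresses.
apiVersion: loadbalancer.harvesterhci.io/v1beta1
kind: IPPool
metadata:
  name: k8s-api
spec:
  ranges:
    - subnet: 10.0.0.0/24
      gateway: 10.0.0.1
      rangeStart: 10.0.0.200
      rangeEnd: 10.0.0.210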
harvester-machinetemplate.yaml:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: HarvesterMachineTemplate
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  template:
    spec:
      cpu: 2
      memory: 8Gi
      sshUser: ubuntu
      sshKeyPair: default/david
      networks:
        - cluster-capi-mgmt-p-01/capi-mgmt-network
      volumes:
        - volumeType: image
          imageName: harvester-public/talos-1.7.4-metalqemu
          volumeSize: 50Gi
          bootOrder: 0
controlplane.yaml:
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: capi-mgmt-p-01
  namespace: cluster-capi-mgmt-p-01
spec:
  version: "v1.30.0"
  replicas: 3
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
    kind: HarvesterMachineTemplate
    name: capi-mgmt-p-01
  controlPlaneConfig:
    controlplane:
      generateType: controlplane
      talosVersion: v1.7.4
      configPatches:
        - op: add
          path: /cluster/network
          value:
            cni:
              name: none
        - op: add
          path: /cluster/proxy
          value:
            disabled: true
        - op: add
          path: /cluster/network/podSubnets
          value:
            - 172.16.0.0/20
        - op: add
          path: /cluster/network/serviceSubnets
          value:
            - 172.16.16.0/20
        - op: add
          path: /machine/kubelet/extraArgs
          value:
            cloud-provider: external
        - op: add
          path: /machine/kubelet/nodeIP
          value:
            validSubnets:
              - 10.0.0.0/24
        - op: add
          path: /cluster/discovery
          value:
            enabled: false
        - op: add
          path: /machine/features/kubePrism
          value:
            enabled: true
        - op: add
          path: /cluster/apiServer/certSANs
          value:
            - 127.0.0.1
        - op: add
          path: /cluster/apiServer/extraArgs
          value:
            anonymous-auth: true
Anything else you would like to add:
I have tried switching the load balancer config from dhcp to ipPoolRef with a pre-configured IP pool; that also did not work. I think the root cause is that the LB is never provisioned in the first place.
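For reference, the dhcp variant I tried differs from harvester-cluster.yaml above only in the loadBalancerConfig stanza, roughly:

# The dhcp variant of the HarvesterCluster load balancer config;
# everything else in harvester-cluster.yaml stays the same.
loadBalancerConfig:
  ipamType: dhcp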
Environment:
- talos controlplane provider version: 0.5.5
- talos bootstrap provider version: 0.6.4
- harvester cluster api provider: 0.1.2
- harvester version installed on my HP server: 1.3.0
- OS (e.g. from /etc/os-release):