Failed to bootstrap Talos Linux with Talm in Proxmox VE
Hello guys! I have a problem bootstrapping Talos Linux for Cozystack and would be grateful for any advice.
I've set up a VM in a Proxmox environment using my own ISO image built with Talos Factory. To prepare Talos for the Cozystack deployment I used Talm. Here is my config:
# talm: nodes=["192.168.50.95"], endpoints=["192.168.50.95"], templates=["templates/controlplane.yaml"]
machine:
  type: controlplane
  kubelet:
    extraConfig:
      maxPods: 512
    nodeIP:
      validSubnets:
        - 192.168.100.0/24
  network:
    hostname: talos-6b933
    # -- Discovered interfaces:
    # enxbc241161d0ad:
    #   id: eth0
    #   hardwareAddr:bc:24:11:61:d0:ad
    #   busPath: 0000:00:12.0
    #   driver: virtio_net
    #   vendor: Red Hat, Inc.
    #   product: Virtio network device)
    interfaces:
      - deviceSelector:
          busPath: "0000:00:12.0"
        addresses:
          - 192.168.50.95/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.50.1
        vip:
          ip: 192.168.100.10
    nameservers:
      - 192.168.50.2
      - 77.88.8.8
  install:
    image: ghcr.io/aenix-io/cozystack/talos:v1.9.3
  files:
    - content: |
        [plugins]
          [plugins."io.containerd.grpc.v1.cri"]
            device_ownership_from_security_context = true
          [plugins."io.containerd.cri.v1.runtime"]
            device_ownership_from_security_context = true
      permissions: 0o0
      path: /etc/cri/conf.d/20-customization.part
      op: create
  kernel:
    modules:
      - name: openvswitch
      - name: drbd
        parameters:
          - usermode_helper=disabled
      - name: zfs
      - name: spl
  nodeLabels:
    node.kubernetes.io/exclude-from-external-load-balancers:
      $patch: delete
cluster:
  controlPlane:
    endpoint: https://192.168.100.10:6443
  clusterName: cloud-msk
  network:
    cni:
      name: none
    dnsDomain: cozy.local
    serviceSubnets:
      - 10.96.0.0/16
  apiServer:
    certSANs:
      - 127.0.0.1
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
  proxy:
    disabled: true
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
  discovery:
    enabled: false
  etcd:
    advertisedSubnets:
      - 192.168.100.0/24
  allowSchedulingOnControlPlanes: true
When I try to apply this config with talm apply -f nodes/srv1.yaml -i, I get the following log:
Talos factory image packages:
Proxmox VM Hardware:
I don't understand what's wrong with my config or the VM setup. Maybe I'm missing something in my Talos image? Which packages are required when building my own image to bootstrap Talos with Talm?
Appreciate any help!
Hey, do you have zfs and drbd modules in your image?
Hi,
the first screenshot says no system disk was found.
Your talm node install section seems to be missing the disk option:
install:
  disk: /dev/sda
  image: ghcr.io/aenix-io/cozystack/talos:v1.9.3
Usually talm handles this automatically and adds something like this:
install:
  # -- Discovered disks:
  disk: /dev/sda
I'm also on Proxmox, but using the Cozystack Image for setup (only missing the qemu-guest-agent extension for now).
In my case talm could not discover the disks on Proxmox, so I used talosctl to inspect the disks and added the disk option myself.
talosctl get disks --nodes node.ip --endpoints node.ip --insecure
Hey, do you have zfs and drbd modules in your image?
Yes, all of them are installed.
@adoerler Thanks!! That helped with the installation, but then other problems started: the VM starts, but kubelet is dead.
Logs:
Status bar:
@vkobazev
VM started, but kubelet is dead
Do you have routing between the two networks, .50 and .100? If not, you should configure validSubnets and vip in the same network as your nodes.
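As a rough illustration of this suggestion, and assuming there is no routing between 192.168.50.0/24 and 192.168.100.0/24, the subnet-related fields from the original config could all be moved into the node's own network. The VIP address 192.168.50.10 below is a hypothetical free address picked for illustration, not something taken from this thread; use whichever unused address fits your network.

```yaml
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 192.168.50.0/24        # same subnet as the node address 192.168.50.95
  network:
    interfaces:
      - deviceSelector:
          busPath: "0000:00:12.0"
        addresses:
          - 192.168.50.95/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.50.1
        vip:
          ip: 192.168.50.10      # hypothetical free address in the node's subnet
cluster:
  controlPlane:
    endpoint: https://192.168.50.10:6443   # should point at the VIP above
  etcd:
    advertisedSubnets:
      - 192.168.50.0/24
```

Only the fields that referenced 192.168.100.x in the original config are changed here; everything else stays as it was.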
@vkobazev, how's it going? Could you resolve the problem?
For my part, I was using the VirtIO driver for the system disk (added as /dev/vda), and I switched to SCSI instead so that the path becomes /dev/sda, the default. This would solve my problem in either case:
install:
  # -- Discovered disks:
  disk: /dev/sda
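For anyone who would rather keep the VirtIO Block bus than switch to SCSI, the other route is to point the install disk at the VirtIO device. This is only a sketch: the /dev/vda path is the one reported in this comment, and the image line is copied from the original config, so check the actual device name with talosctl get disks first.

```yaml
machine:
  install:
    # VirtIO Block disks show up as /dev/vdX rather than /dev/sdX;
    # confirm the exact name with `talosctl get disks` before relying on it.
    disk: /dev/vda
    image: ghcr.io/aenix-io/cozystack/talos:v1.9.3
```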
Sorry guys (@NickVolynkin @adoerler @kvaps), last month was a troubled time.
So, I'm back with a new error,
something like:
$ talm template -e 192.168.50.95 -n 192.168.50.95 -t templates/controlplane.yaml -i > nodes/srv1.yaml
failed to render templates: template: cloud-msk-2/templates/worker.yaml:2:4: executing "cloud-msk-2/templates/worker.yaml" at <include "talos.config" .>: error calling include: template: cloud-msk-2/templates/_helpers.tpl:48:9: executing "talos.config" at <include "talm.discovered.disks_info" .>: error calling include: template: cloud-msk-2/charts/talm/templates/_helpers.tpl:32:11: executing "talm.discovered.disks_info" at <lookup "disks" "" "">: error calling lookup: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: remote error: tls: certificate required"
I migrated to a new PC and installed the latest version of talm, but the VM in Proxmox still runs the older version, 1.9.3. I suppose I need to refresh the certs, but I couldn't find any information about that in the talm docs.
I'm not sure what problems you are facing. Could you check in the Talos console whether your node is in maintenance mode? Also, under Proxmox VM > Hardware, which driver is used for the hard disks? SCSI? Is your original kubeconfig still accessible in your initial environment?
Hi @vkobazev
I migrated to a new PC and installed the latest version of talm, but the VM in Proxmox still runs the older version, 1.9.3. I suppose I need to refresh the certs, but I couldn't find any information about that in the talm docs.
So you have a new PC and you set up talm on it, but you kept the existing cluster in your Proxmox environment?
Did you migrate your talm setup folder, including secrets.yaml, from your old PC to the new one?
If the nodes are still running from your last installation attempt, you have to make sure talm uses the original certificates when issuing talosctl commands.
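To make the point about certificates more concrete: talosctl-style clients authenticate with a client certificate stored in a talosconfig file, which as far as I know talm keeps in the project folder next to secrets.yaml. Below is a minimal sketch of that file's layout. The context name and the angle-bracket placeholders are assumptions for illustration; the real file holds base64-encoded PEM data and should be copied from the old PC, not written by hand.

```yaml
# talosconfig (sketch with placeholder values, not real credentials)
context: cloud-msk                 # assumed context name, taken from clusterName
contexts:
  cloud-msk:
    endpoints:
      - 192.168.50.95
    nodes:
      - 192.168.50.95
    ca: <base64-encoded cluster CA certificate>
    crt: <base64-encoded client certificate>
    key: <base64-encoded client key>
```

Note also that the -i flag in the failing talm template command skips client authentication. To my knowledge that is only accepted while a node is still in maintenance mode, which would be consistent with the "tls: certificate required" error coming from an already-configured node.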
Hi, @vkobazev. I'm Dosu, and I'm helping the cozystack team manage their backlog and am marking this issue as stale.
Issue Summary:
- You initially faced disk discovery issues bootstrapping Talos Linux with Talm on Proxmox VE using a custom ISO and Talos config.
- Manually specifying the disk in the install section resolved the disk detection problem.
- Kubelet failed to start afterward, with network routing and disk driver settings discussed as potential causes.
- After migrating to a new PC and updating talm, you encountered TLS certificate errors likely due to version mismatches and missing secrets.yaml.
- I advised ensuring original certificates are used for talosctl commands to avoid TLS issues.
Next Steps:
- Please confirm if this issue is still relevant with the latest version of the cozystack repository by commenting here.
- If no further updates are provided, I will automatically close this issue in 7 days.
Thank you for your understanding and contribution!