Failed to bootstrap Talos Linux with Talm in Proxmox VE

Open vkobazev opened this issue 9 months ago • 10 comments

Hello guys! I have a problem bootstrapping Talos Linux for Cozystack and would be grateful for any advice.

I've set up a VM in a Proxmox environment using my own ISO image built with Talos Factory. To prepare Talos for the Cozystack deployment I used Talm. Here is my config:

# talm: nodes=["192.168.50.95"], endpoints=["192.168.50.95"], templates=["templates/controlplane.yaml"]
machine:
  type: controlplane
  kubelet:
    extraConfig:
      maxPods: 512
    nodeIP:
      validSubnets:
        - 192.168.100.0/24
  network:
    hostname: talos-6b933
    # -- Discovered interfaces:
    # enxbc241161d0ad:
    #   id: eth0
    #   hardwareAddr:bc:24:11:61:d0:ad
    #   busPath: 0000:00:12.0
    #   driver: virtio_net
    #   vendor: Red Hat, Inc.
    #   product: Virtio network device)
    interfaces:
      - deviceSelector:
          busPath: "0000:00:12.0"
        addresses:
          - 192.168.50.95/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.50.1
        vip:
          ip: 192.168.100.10
    nameservers:
      - 192.168.50.2
      - 77.88.8.8
  install:
    image: ghcr.io/aenix-io/cozystack/talos:v1.9.3
  files:
    - content: |
        [plugins]
          [plugins."io.containerd.grpc.v1.cri"]
            device_ownership_from_security_context = true
          [plugins."io.containerd.cri.v1.runtime"]
            device_ownership_from_security_context = true
      permissions: 0o0
      path: /etc/cri/conf.d/20-customization.part
      op: create
  kernel:
    modules:
      - name: openvswitch
      - name: drbd
        parameters:
          - usermode_helper=disabled
      - name: zfs
      - name: spl
  nodeLabels:
    node.kubernetes.io/exclude-from-external-load-balancers:
      $patch: delete
cluster:
  controlPlane:
    endpoint: https://192.168.100.10:6443
  clusterName: cloud-msk
  network:
    cni:
      name: none
    dnsDomain: cozy.local
    serviceSubnets:
      - 10.96.0.0/16
  apiServer:
    certSANs:
      - 127.0.0.1
  controllerManager:
    extraArgs:
      bind-address: 0.0.0.0
  proxy:
    disabled: true
  scheduler:
    extraArgs:
      bind-address: 0.0.0.0
  discovery:
    enabled: false
  etcd:
    advertisedSubnets:
      - 192.168.100.0/24
  allowSchedulingOnControlPlanes: true

When I try to apply this config with talm apply -f nodes/srv1.yaml -i, I receive the following log: Image

Talos factory image packages: Image

Proxmox VM Hardware: Image

I don't understand what the problem is with my config or the VM's setup. Maybe I'm missing something in my Talos image? Which packages are required to build my own image for bootstrapping Talos with Talm?

Appreciate any help!

vkobazev avatar Mar 28 '25 17:03 vkobazev

Hey, do you have zfs and drbd modules in your image?

kvaps avatar Mar 28 '25 18:03 kvaps

Hi,

The first screenshot says no system disk was found.

Your talm node install section seems to be missing the disk option:

  install:
    disk: /dev/sda
    image: ghcr.io/aenix-io/cozystack/talos:v1.9.3

Usually talm does this automatically and adds something like this:

  install:
    # -- Discovered disks:
    disk: /dev/sda

I'm also on Proxmox, but using the Cozystack Image for setup (only missing the qemu-guest-agent extension for now).

In my case talm could not discover the disks on Proxmox, so I used talosctl to inspect the disks and added the disk option myself.

talosctl get disks --nodes node.ip --endpoints node.ip --insecure

adoerler avatar Mar 28 '25 20:03 adoerler

Hey, do you have zfs and drbd modules in your image?

Yes, all of them are installed.

vkobazev avatar Mar 29 '25 15:03 vkobazev

I'm also on Proxmox, but using the Cozystack Image for setup (only missing the qemu-guest-agent extension for now).

In my case talm could not discover the disks on Proxmox, so I used talosctl to inspect the disks and added the disk option myself.

talosctl get disks --nodes node.ip --endpoints node.ip --insecure

@adoerler Thanks!! That helped with the installation, but then other problems appeared: the VM started, but kubelet is dead

Logs:

Image

Image

Image

Status bar:

Image

vkobazev avatar Mar 29 '25 15:03 vkobazev

@vkobazev

VM started, but kubelet is dead

Do you have routing between the two networks, .50 and .100? If not, you should configure validSubnets and vip in the same network as your nodes.
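
As a rough sketch of that suggestion, the fragment below moves every 192.168.100.x reference from the original config into the node's own 192.168.50.0/24 network. The VIP address 192.168.50.10 is a hypothetical free address chosen for illustration; use any unused address in your network.

```yaml
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - 192.168.50.0/24        # must contain the node address 192.168.50.95
  network:
    interfaces:
      - deviceSelector:
          busPath: "0000:00:12.0"
        addresses:
          - 192.168.50.95/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.50.1
        vip:
          ip: 192.168.50.10      # hypothetical free address in the node's network
cluster:
  controlPlane:
    endpoint: https://192.168.50.10:6443   # follows the VIP chosen above
  etcd:
    advertisedSubnets:
      - 192.168.50.0/24
```

With validSubnets set to 192.168.100.0/24 and the only interface address in 192.168.50.0/24, kubelet has no address it is allowed to use as the node IP, which would be consistent with it never becoming healthy.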

adoerler avatar Mar 29 '25 15:03 adoerler

@vkobazev, how's it going? Could you resolve the problem?

NickVolynkin avatar Apr 21 '25 09:04 NickVolynkin

For my part, I was using the VirtIO driver for the system disk (which showed up as /dev/vda), and I switched to SCSI instead so that the path becomes the default /dev/sda. Either approach (setting the disk path explicitly or switching the driver) would have solved my problem:

  install:
    # -- Discovered disks:
    disk: /dev/sda
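
The other option, keeping the VirtIO Block driver and pointing the install section at the VirtIO device, might look like the sketch below. It assumes the system disk is the first VirtIO disk; confirm the actual name with talosctl get disks before applying.

```yaml
machine:
  install:
    disk: /dev/vda   # assumed: first VirtIO Block disk; SCSI and SATA disks appear as /dev/sdX
    image: ghcr.io/aenix-io/cozystack/talos:v1.9.3
```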

lb0o avatar Apr 23 '25 14:04 lb0o

Sorry guys (@NickVolynkin @adoerler @kvaps), last month was a troubled time. I'm back now, with an error like this:

└─[$] talm template -e 192.168.50.95 -n 192.168.50.95 -t templates/controlplane.yaml -i > nodes/srv1.yaml [17:40:14]
failed to render templates: template: cloud-msk-2/templates/worker.yaml:2:4: executing "cloud-msk-2/templates/worker.yaml" at <include "talos.config" .>: error calling include: template: cloud-msk-2/templates/_helpers.tpl:48:9: executing "talos.config" at <include "talm.discovered.disks_info" .>: error calling include: template: cloud-msk-2/charts/talm/templates/_helpers.tpl:32:11: executing "talm.discovered.disks_info" at <lookup "disks" "" "">: error calling lookup: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: remote error: tls: certificate required"

I migrated to a new PC and installed the latest version of talm, but the VM in Proxmox still has an older version, 1.9.3. I suppose I need to refresh the certs, but I couldn't find any information about this in the talm docs.

vkobazev avatar Apr 28 '25 14:04 vkobazev

I'm not sure which problems you are facing. Could you check in the Talos console whether your node is in maintenance mode? Also, which driver is used for the hard disks (under Proxmox VM > Hardware)? SCSI? Is your original kubeconfig still accessible in your initial environment?

lb0o avatar Apr 28 '25 16:04 lb0o

Hi @vkobazev

I migrated to a new PC and installed the latest version of talm, but the VM in Proxmox still has an older version, 1.9.3. I suppose I need to refresh the certs, but I couldn't find any information about this in the talm docs.

So you have a new PC and you set up talm there, but you kept the existing cluster in your Proxmox environment?

Did you migrate your talm setup folder including the secrets.yaml from your old pc to the new one?

If the nodes are still running from your last installation attempt, you have to make sure talm uses the original certificates when it issues talosctl commands.
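
For orientation, a talosctl client configuration (talosconfig) generally has the shape sketched below. The ca, crt, and key fields are the credentials the client presents to an already-configured node, so they need to come from the same secrets the cluster was created with. The context name is a placeholder and the values are deliberately elided; nothing here is taken from the actual setup in this thread.

```yaml
context: cloud-msk              # placeholder context name
contexts:
  cloud-msk:
    endpoints:
      - 192.168.50.95
    nodes:
      - 192.168.50.95
    ca: <base64-encoded CA certificate>        # elided
    crt: <base64-encoded client certificate>   # elided
    key: <base64-encoded client key>           # elided
```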

adoerler avatar Apr 28 '25 16:04 adoerler

Hi, @vkobazev. I'm Dosu, and I'm helping the cozystack team manage their backlog and am marking this issue as stale.

Issue Summary:

  • You initially faced disk discovery issues bootstrapping Talos Linux with Talm on Proxmox VE using a custom ISO and Talos config.
  • Manually specifying the disk in the install section resolved the disk detection problem.
  • Kubelet failed to start afterward, with network routing and disk driver settings discussed as potential causes.
  • After migrating to a new PC and updating talm, you encountered TLS certificate errors likely due to version mismatches and missing secrets.yaml.
  • I advised ensuring original certificates are used for talosctl commands to avoid TLS issues.

Next Steps:

  • Please confirm if this issue is still relevant with the latest version of the cozystack repository by commenting here.
  • If no further updates are provided, I will automatically close this issue in 7 days.

Thank you for your understanding and contribution!

dosubot[bot] avatar Oct 20 '25 16:10 dosubot[bot]