
Bootstrap failing with `failed to verify certificate: x509`

Open Nikola-Milovic opened this issue 7 months ago • 3 comments

I would have opened a discussion, since this is more of a question than an issue.

($cp1ip and $w1ip are just shorthand for the nodes' 192.168.121.x addresses)

This is running on Vagrant with libvirt. Bootstrapping a cluster with talosctl (using `talosctl gen config`) works fine, but I wanted to give talm a try.

What I did:

1. `talm init`
2. Set up the talosconfig and the endpoint:
   ```console
   > export TALOSCONFIG=$(realpath ./talosconfig)
   > talosctl config endpoint $cp1ip
   ```
3. Create the node configs from the templates:
   ```console
   > talm template -e $cp1ip -n $cp1ip -t templates/controlplane.yaml -i > nodes/cp1.yaml
   > talm template -e $w1ip -n $w1ip -t templates/worker.yaml -i > nodes/w1.yaml
   ```
4. Apply the configs:
   ```console
   > talm apply -f ./nodes/cp1.yaml -i
   - talm: file=./nodes/cp1.yaml, nodes=[192.168.121.90], endpoints=[192.168.121.90]
   > talm apply -f ./nodes/w1.yaml -i
   - talm: file=./nodes/w1.yaml, nodes=[192.168.121.6], endpoints=[192.168.121.6]
   ```
5. Bootstrap:
   ```console
   > talm bootstrap -f ./nodes/cp1.yaml
   error executing bootstrap: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority"
   ```

Nikola-Milovic · May 28 '25 15:05

Hey, talm stores talosconfig in the project directory. You can copy your talosconfig into the project root, or override the path in Chart.yaml.

kvaps · May 29 '25 06:05

@kvaps I believe that's already the case; the talosconfig is in the project root:

```console
> tree
.
├── charts
│   └── talm
│       ├── Chart.yaml
│       └── templates
│           └── _helpers.tpl
├── Chart.yaml
├── flake.lock
├── flake.nix
├── justfile
├── nodes
│   ├── cp1.yaml
│   └── w1.yaml
├── patch.yaml
├── secrets.yaml
├── talm
├── talosconfig
├── templates
│   ├── controlplane.yaml
│   ├── _helpers.tpl
│   └── worker.yaml
├── Vagrantfile
└── values.yaml
```

I'll redo it from scratch:

1. `vagrant up` (after downloading the Talos 1.10 image with `wget`), then:
   ```console
   > ./talm init
   Created secrets.yaml
   generating PKI and tokens
   Created talosconfig
   Created values.yaml
   Created charts/talm/Chart.yaml
   Created charts/talm/templates/_helpers.tpl
   Created Chart.yaml
   Created templates/_helpers.tpl
   Created templates/controlplane.yaml
   Created templates/worker.yaml
   > talm template -e $cp1ip -n $cp1ip -t templates/controlplane.yaml -i > nodes/cp1.yaml
   > ./talm apply -f ./nodes/cp1.yaml -i
   - talm: file=./nodes/cp1.yaml, nodes=[192.168.121.54], endpoints=[192.168.121.54]
   > ./talm bootstrap -f ./nodes/cp1.yaml
   error executing bootstrap: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 192.168.121.54:50000: connect: connection refused"
   ```
```ruby
Vagrant.configure("2") do |config|
  # Define resources for the VMs
  resources = {
    "control-plane-node-1" => {
      ip: "192.168.56.100",
      cpus: 2,
      memory: 4096,
      disk_size: '16G',
    },
    "worker-node-1" => {
      ip: "192.168.56.103",
      cpus: 2,
      memory: 4096,
      disk_size: '16G',
    }
  }

  # Loop through resources and configure VMs
  resources.each do |name, config_data|
    config.vm.define name do |node|
      # TODO: fix this, talos maybe doesn't respect the setting
      # node.vm.network :private_network, :ip => config_data[:ip]
      node.vm.provider :libvirt do |domain|
        domain.cpus = config_data[:cpus]
        domain.memory = config_data[:memory]
        domain.storage :file, device: :cdrom, path: "/tmp/metal-amd64.iso"
        domain.storage :file, size: config_data[:disk_size], type: 'raw'
        domain.storage :file, device: "vdc", type: 'raw'
        domain.boot 'hd'
        domain.boot 'cdrom'
      end
    end
  end
end
```

Is there a way to apply a patch while generating the configs? Maybe that's what is causing the issue.

Control plane config, `nodes/cp1.yaml`:

```yaml
# talm: nodes=["192.168.121.54"], endpoints=["192.168.121.54"], templates=["templates/controlplane.yaml"]
# THIS FILE IS AUTOGENERATED. PREFER TEMPLATE EDITS OVER MANUAL ONES.
machine:
  type: controlplane
  kubelet:
    nodeIP:
      validSubnets:
        - 192.168.100.0/24
  network:
    hostname: talos-18f96
    # -- Discovered interfaces:
    # ens6:
    #   hardwareAddr:52:54:00:88:74:39
    #   busPath: 0000:00:06.0
    #   driver: virtio_net
    #   vendor: Red Hat, Inc.
    #   product: Virtio network device)
    interfaces:
      - interface: ens6
        addresses:
          - 192.168.121.54/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.121.1
    nameservers:
      - 192.168.121.1
...
```

And my patch is:

```yaml
machine:
  network:
    interfaces:
      - deviceSelector:
          physical: true # should select any hardware network device, if you have just one, it will be selected
        dhcp: true
        vip:
          ip: 192.168.121.100
```

Nikola-Milovic · May 29 '25 10:05
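On the second attempt the error changed from the x509 failure to `connection refused` on port 50000, meaning nothing was accepting connections on the Talos API port at that moment. One possible explanation, not confirmed in this thread, is that the node was still installing and rebooting after `talm apply -i`. If that is the case, waiting for the port before bootstrapping avoids the race. A minimal sketch, assuming bash built with `/dev/tcp` support and GNU coreutils `timeout`; `wait_for_port` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll host:port until a TCP connection succeeds
# or the overall timeout expires.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
# Arguments: host, port, overall timeout in seconds (default 300).
# Returns 0 once the port accepts a connection, 1 on timeout.
wait_for_port() {
  local host=$1 port=$2 limit=${3:-300}
  local deadline=$(( SECONDS + limit ))
  while (( SECONDS < deadline )); do
    # Cap each attempt at 2 s so an unreachable host cannot stall the loop.
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 2
  done
  return 1
}
```

With the same `$cp1ip` shorthand as above, usage would look like `wait_for_port "$cp1ip" 50000 600 && ./talm bootstrap -f ./nodes/cp1.yaml`. An open port only shows that apid is listening; it does not by itself fix a certificate mismatch.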

Hi, I think you need to properly edit the `values.yaml` that `talm init` generates.

It looks like a templating issue: the default network (from the default `values.yaml`) does not match your nodes' IPs.

lb0o · Jul 23 '25 13:07
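To make lb0o's point concrete: the generated `nodes/cp1.yaml` restricts the kubelet node IP to `validSubnets: 192.168.100.0/24` (which lb0o attributes to the default `values.yaml`), while the node's only configured address is `192.168.121.54/24`, so no address on the node matches that filter. A quick way to check an address against a subnet, as a bash sketch with hypothetical helper names `ip_to_int` and `ip_in_cidr` (IPv4 only):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
# Expects plain decimal octets (no leading zeros, which bash reads as octal).
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# Hypothetical helper: succeed (exit status 0) if IPv4 address $1 lies
# inside CIDR $2, e.g. 192.168.100.0/24.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  # Build the netmask; /0 matches everything.
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  # Parentheses matter: in bash arithmetic, == binds tighter than &.
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}
```

For example, `ip_in_cidr 192.168.121.54 192.168.100.0/24 || echo "outside validSubnets"` prints `outside validSubnets`. Following lb0o's suggestion, the subnet values in `values.yaml` would need to match the network the VMs actually use (presumably `192.168.121.0/24` here), after which `talm template` can be re-run to regenerate `nodes/cp1.yaml`.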