terraform-provider-hcloud

[Bug]: hcloud/setRescue: hcclient/WaitForActions: action _ failed: Unknown Error (unknown_error)

mnencia opened this issue 2 years ago • 11 comments

What happened?

Sometimes, Hetzner Cloud servers fail to start.

If you run hcloud server list, the newly created server is shown as off.

See https://github.com/kube-hetzner/kube-hetzner/issues/49

What did you expect to happen?

The server starts normally.

Please provide a minimal working example

resource "hcloud_server" "first_control_plane" {
  name = "k3s-control-plane-0"

  image              = "ubuntu-20.04"
  rescue             = "linux64"
  server_type        = "cpx11"
  location           = "fsn1" # "eu-central" is a network zone, not a location
  ssh_keys           = [hcloud_ssh_key.k3s.id]
  firewall_ids       = [hcloud_firewall.k3s.id]
  placement_group_id = hcloud_placement_group.k3s.id

  connection {
    user           = "root"
    private_key    = local.ssh_private_key
    agent_identity = local.ssh_identity
    host           = self.ipv4_address
  }

  provisioner "file" {
    content = templatefile("${path.module}/templates/config.ign.tpl", {
      name           = self.name
      ssh_public_key = local.ssh_public_key
    })
    destination = "/root/config.ign"
  }

  provisioner "remote-exec" {
    inline = local.microOS_install_commands
  }

  ...
}

mnencia, Feb 17 '22 09:02

This happens to me far too often!

mysticaltech, Feb 17 '22 10:02

[Screenshot, 2022-02-17 10:02:07]

mnencia, Feb 17 '22 10:02

I believe it sometimes fails to start when rescue mode is requested.

mysticaltech, Feb 17 '22 21:02

This has become a huge problem for us at https://github.com/kube-hetzner/kube-hetzner: it happens in almost 50% of deploys, because we use rescue mode to install a third-party OS on many nodes at once.

@LKaemmerling, please look into this. If there are any logs we can provide, just tell us how to get them for you. Thanks!

mysticaltech, Feb 22 '22 20:02

Hey @mysticaltech,

Would it be possible for you to give me the IDs of the servers where this happened?

LKaemmerling, Feb 23 '22 05:02

@LKaemmerling Here you go; this one just happened: 18212271

I will leave it in my project so you and your team can investigate further. Thanks! 🙏

[Screenshot]

mysticaltech, Feb 23 '22 07:02

I just tried to provision a cluster with five nodes, and one remained off (ID: 18212924).

[Screenshot, 2022-02-23 08:53:45]

mnencia, Feb 23 '22 07:02

Hey,

We just released v1.33.1, which contains an improvement for this situation. Could you please test it? It will be available in the Terraform Registry within the next few minutes.

LKaemmerling, Feb 25 '22 10:02

@LKaemmerling Thank you so much. However, I tested it, and on the second try one server (#18275324) stayed off, as before. Here are the details. I will not delete it, so you can have a look if you want.

The code that triggers this is at https://github.com/kube-hetzner/kube-hetzner

terraform --version
Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hetznercloud/hcloud v1.33.1
+ provider registry.terraform.io/integrations/github v4.20.0
+ provider registry.terraform.io/tenstad/remote v0.0.23

Output of hcloud server list, five minutes in:

ID         NAME                  STATUS    IPV4              IPV6                      DATACENTER
18275322   k3s-control-plane-2   running   78.46.194.108     2a01:4f8:c010:a0b1::/64   fsn1-dc14
18275323   k3s-agent-1           running   78.47.82.48       2a01:4f8:1c17:c7ac::/64   fsn1-dc14
18275324   k3s-agent-0           off       49.12.10.178      2a01:4f8:c17:8b1a::/64    fsn1-dc14
18275325   k3s-control-plane-0   running   116.202.98.33     2a01:4f8:1c17:f936::/64   fsn1-dc14
18275326   k3s-control-plane-1   running   142.132.188.100   2a01:4f8:c010:5d7f::/64   fsn1-dc14
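For anyone scripting this check across many deploys, here is a minimal Python sketch (not part of the original report and not an official hcloud tool) that picks out servers whose STATUS column reads off in a plain-text listing like the one above. It assumes the whitespace-separated layout shown here (a header row with ID, NAME and STATUS columns) and that server names contain no spaces; the helper name servers_with_status is hypothetical. Keep in mind that, per the maintainer's reply in this thread, the reported status was wrong in this case, so a match only means the API reports off, not that the machine is actually down.

```python
# Hypothetical helper: flag servers by the STATUS column of `hcloud server list` output.
# Assumes whitespace-separated columns with a header row, as in the listing above.

SAMPLE = """\
ID         NAME                  STATUS    IPV4              IPV6                      DATACENTER
18275322   k3s-control-plane-2   running   78.46.194.108     2a01:4f8:c010:a0b1::/64   fsn1-dc14
18275323   k3s-agent-1           running   78.47.82.48       2a01:4f8:1c17:c7ac::/64   fsn1-dc14
18275324   k3s-agent-0           off       49.12.10.178      2a01:4f8:c17:8b1a::/64    fsn1-dc14
18275325   k3s-control-plane-0   running   116.202.98.33     2a01:4f8:1c17:f936::/64   fsn1-dc14
18275326   k3s-control-plane-1   running   142.132.188.100   2a01:4f8:c010:5d7f::/64   fsn1-dc14
"""


def servers_with_status(listing: str, status: str = "off") -> list[tuple[str, str]]:
    """Return (ID, NAME) pairs whose STATUS column equals `status`."""
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    # Locate columns by header name rather than by fixed position.
    id_col, name_col, status_col = (header.index(c) for c in ("ID", "NAME", "STATUS"))
    return [(r[id_col], r[name_col]) for r in body if r[status_col] == status]


if __name__ == "__main__":
    print(servers_with_status(SAMPLE))  # [('18275324', 'k3s-agent-0')]
```

Looking columns up by header name rather than by position keeps the sketch working if the CLI's column order differs from the one shown here.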

[Screenshots]

mysticaltech, Feb 25 '22 12:02

Hey @mysticaltech,

Your server is online ;) Only the displayed server status is incorrect. I have passed this on to the relevant teams. Thanks!

LKaemmerling, Feb 25 '22 12:02

In that case, fantastic! Thank you so much... :) 🙏

mysticaltech, Feb 25 '22 12:02

I am going to close this issue, as the problem has been resolved. If it still occurs, feel free to reopen it.

apricote, Nov 23 '22 10:11