terraform-provider-hcloud
[Bug]: hcloud/setRescue: hcclient/WaitForActions: action _ failed: Unknown Error (unknown_error)
What happened?
Sometimes Hetzner Cloud servers fail to start.
If you run hcloud server list, the newly created server is shown as off.
See https://github.com/kube-hetzner/kube-hetzner/issues/49
What did you expect to happen?
The server starts normally.
Please provide a minimal working example
resource "hcloud_server" "first_control_plane" {
  name               = "k3s-control-plane-0"
  image              = "ubuntu-20.04"
  rescue             = "linux64"
  server_type        = "cpx11"
  location           = "eu-central"
  ssh_keys           = [hcloud_ssh_key.k3s.id]
  firewall_ids       = [hcloud_firewall.k3s.id]
  placement_group_id = hcloud_placement_group.k3s.id

  connection {
    user           = "root"
    private_key    = local.ssh_private_key
    agent_identity = local.ssh_identity
    host           = self.ipv4_address
  }

  provisioner "file" {
    content = templatefile("${path.module}/templates/config.ign.tpl", {
      name           = self.name
      ssh_public_key = local.ssh_public_key
    })
    destination = "/root/config.ign"
  }

  provisioner "remote-exec" {
    inline = local.microOS_install_commands
  }
  ...
}
This happens to me far too often!
I believe it fails to start sometimes when the rescue mode is requested.
This has become a huge problem for us at https://github.com/kube-hetzner/kube-hetzner. It happens in almost 50% of deploys, because we use rescue mode to install a third-party OS on many nodes at once.
@LKaemmerling please do something about it. If there are any logs we can provide, just tell us how to get them for you. Thanks!
Hey @mysticaltech,
would it be possible for you to give me the server IDs where this happened?
@LKaemmerling Here you go, this one just happened: 18212271
I will leave it in my project for you and your team to investigate more. Thanks! 🙏
I just tried to provision a cluster with five nodes, and one remained off (id: 18212924)
Hey,
we just released v1.33.1 which contains an improvement for the situation. Can you please test it? It will be available in the next couple of minutes in the Terraform Registry.
@LKaemmerling Thank you so much. However, I have tested it, and on the second try one server, #18275324, stayed off as before. Here are the details. I will not delete it, so you can have a look if you want.
The initiating code is https://github.com/kube-hetzner/kube-hetzner
terraform --version
Terraform v1.1.6
on linux_amd64
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hetznercloud/hcloud v1.33.1
+ provider registry.terraform.io/integrations/github v4.20.0
+ provider registry.terraform.io/tenstad/remote v0.0.23
hcloud server list
after 5 minutes in:
ID         NAME                  STATUS    IPV4              IPV6                       DATACENTER
18275322   k3s-control-plane-2   running   78.46.194.108     2a01:4f8:c010:a0b1::/64    fsn1-dc14
18275323   k3s-agent-1           running   78.47.82.48       2a01:4f8:1c17:c7ac::/64    fsn1-dc14
18275324   k3s-agent-0           off       49.12.10.178      2a01:4f8:c17:8b1a::/64     fsn1-dc14
18275325   k3s-control-plane-0   running   116.202.98.33     2a01:4f8:1c17:f936::/64    fsn1-dc14
18275326   k3s-control-plane-1   running   142.132.188.100   2a01:4f8:c010:5d7f::/64    fsn1-dc14
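For anyone who wants to script this check, here is a minimal sketch that filters the listing above for servers that are not reported as running. It assumes the default column order shown above (ID, NAME, STATUS, ...). The function name off_servers is made up for illustration and is not part of the hcloud CLI.

```shell
# Hypothetical helper (not part of the hcloud CLI): reads the output of
# `hcloud server list` on stdin and prints "ID NAME" for every server
# whose STATUS column is anything other than "running".
# Assumes the default column order: ID NAME STATUS IPV4 IPV6 DATACENTER.
off_servers() {
  # Rows with a numeric first field are server rows; this skips the header.
  awk '$1 ~ /^[0-9]+$/ && $3 != "running" { print $1, $2 }'
}

# Usage (requires the hcloud CLI and an active context):
#   hcloud server list | off_servers
```

This only reflects what the API reports as the status, so a match means the server is listed as not running, not necessarily that it is unreachable.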
Hey @mysticaltech,
your server is online ;) Only the server status is incorrect. I have passed it on to the responsible teams. Thanks!
In that case, fantastic! Thank you so much... :) 🙏
I am going to close this issue as the problem has been resolved. If this still occurs, feel free to reopen the issue.