terraform-provider-hcloud

[Bug]: Sometimes a new instance starts without the private network interface

Open mnencia opened this issue 2 years ago • 4 comments

What happened?

Sometimes, a new instance starts without the private network interface. There is no trace of that interface in the kernel log. The private network appears correctly associated with the instance both in the web console and in the hcloud server describe output.

k3s-control-plane-0:~ # ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 96:00:01:1d:47:f0 brd ff:ff:ff:ff:ff:ff
    altname enp1s0

server details

ID:		18275840
Name:		k3s-control-plane-0
Status:		running
Created:	Fri Feb 25 13:09:59 CET 2022 (19 minutes ago)
Server Type:	cpx21 (ID: 23)
  ID:		23
  Name:		cpx21
  Description:	CPX 21
  Cores:	3
  CPU Type:	shared
  Memory:	4 GB
  Disk:		80 GB
  Storage Type:	local
Public Net:
  IPv4:
    IP:		142.132.227.51
    Blocked:	no
    DNS:	static.51.227.132.142.clients.your-server.de
  IPv6:
    IP:		2a01:4f8:1c17:ee1c::/64
    Blocked:	no
  Floating IPs:
    No Floating IPs
Private Net:
  - ID:			1405975
    Name:		k3s
    IP:			10.0.1.1
    MAC Address:	86:00:00:05:94:f2
    Alias IPs:		-
Volumes:
  No Volumes
Image:
  ID:		15512617
  Type:		system
  Status:	available
  Name:		ubuntu-20.04
  Description:	Ubuntu 20.04
  Image size:	-
  Disk size:	5 GB
  Created:	Thu Apr 23 19:55:14 CEST 2020 (2 years ago)
  OS flavor:	ubuntu
  OS version:	20.04
  Rapid deploy:	yes
Datacenter:
  ID:		4
  Name:		fsn1-dc14
  Description:	Falkenstein 1 DC14
  Location:
    Name:		fsn1
    Description:	Falkenstein DC Park 1
    Country:		DE
    City:		Falkenstein
    Latitude:		50.476120
    Longitude:		12.370071
Traffic:
  Outgoing:	0 B
  Ingoing:	0 B
  Included:	20 TiB
Backup Window:	Backups disabled
Rescue System:	disabled
ISO:
  No ISO attached
Protection:
  Delete:	no
  Rebuild:	no
Labels:
  engine: k3s
  provisioner: terraform
Placement Group:
  ID:		24255
  Name:		k3s
  Type:		spread

The server has been created using https://github.com/kube-hetzner/kube-hetzner master branch.

What did you expect to happen?

The server sees all the interfaces that should be attached to it.

Please provide a minimal working example

resource "hcloud_server" "first_control_plane" {
  name = "k3s-control-plane-0"

  image              = "ubuntu-20.04"
  rescue             = "linux64"
  server_type        = "cpx21"
  location           = "fsn1"
  ssh_keys           = [hcloud_ssh_key.k3s.id]
  firewall_ids       = [hcloud_firewall.k3s.id]
  placement_group_id = hcloud_placement_group.k3s.id

  network {
    network_id = var.network_id
    ip         = var.ip
  }

 ...
}

mnencia, Feb 25 '22 12:02

Yes, happens to me too.

mysticaltech, Feb 25 '22 12:02

I noticed that the interface was added after 15 minutes.

This is an excerpt from dmesg:

[Fri Feb 25 12:11:41 2022] No iBFT detected.
[Fri Feb 25 12:12:21 2022] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[Fri Feb 25 12:12:21 2022] Bridge firewalling registered
[Fri Feb 25 12:21:13 2022] BTRFS info (device sda3): qgroup scan completed (inconsistency flag cleared)
[Fri Feb 25 12:24:29 2022] bpfilter: Loaded bpfilter_umh pid 4919
[Fri Feb 25 12:24:29 2022] Started bpfilter
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6: pciehp: Slot(0-6): Attention button pressed
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6: pciehp: Slot(0-6) Powering on due to button press
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6: pciehp: Slot(0-6): Card present
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6: pciehp: Slot(0-6): Link Up
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: [1af4:1041] type 00 class 0x020000
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: reg 0x14: [mem 0x00000000-0x00000fff]
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: BAR 6: assigned [mem 0xfdc00000-0xfdc7ffff pref]
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: BAR 4: assigned [mem 0xfc400000-0xfc403fff 64bit pref]
[Fri Feb 25 12:39:28 2022] pci 0000:07:00.0: BAR 1: assigned [mem 0xfdc80000-0xfdc80fff]
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6: PCI bridge to [bus 07]
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6:   bridge window [io  0x7000-0x7fff]
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6:   bridge window [mem 0xfdc00000-0xfddfffff]
[Fri Feb 25 12:39:28 2022] pcieport 0000:00:02.6:   bridge window [mem 0xfc400000-0xfc5fffff 64bit pref]
[Fri Feb 25 12:39:28 2022] virtio-pci 0000:07:00.0: enabling device (0000 -> 0002)
[Fri Feb 25 12:39:29 2022] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[Fri Feb 25 12:39:41 2022] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[Fri Feb 25 12:39:41 2022] IPVS: Connection hash table configured (size=4096, memory=64Kbytes)
[Fri Feb 25 12:39:41 2022] IPVS: ipvs loaded.
[Fri Feb 25 12:39:41 2022] IPVS: [rr] scheduler registered.
[Fri Feb 25 12:39:41 2022] IPVS: [wrr] scheduler registered.
[Fri Feb 25 12:39:41 2022] IPVS: [sh] scheduler registered.

You can see the last messages from the initial boot (timestamp [Fri Feb 25 12:21:13 2022]) and, some time later, the eth1 interface being added (timestamp [Fri Feb 25 12:39:28 2022]).

mnencia, Feb 25 '22 15:02

Seems to have disappeared when we started using hcloud_server_network instead of the network {} block! So it is probably present for those using the network block.

mysticaltech, Mar 02 '22 09:03

mysticaltech wrote: "Seems to have disappeared when we started using hcloud_server_network instead of the network {} block! So it is probably present for those using the network block."

I just encountered it with the hcloud_server_network resource. The plus side of assigning the IP this way, though, is that you can taint the resource and recreate it without creating a new server.

spacemule, Jul 11 '22 23:07
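
For readers comparing the two approaches discussed in the comments above, the following is a minimal sketch of attaching the private network through a separate hcloud_server_network resource instead of the inline network {} block. It reuses the resource name and the var.network_id and var.ip variables from the example in this issue; the trimmed-down server arguments are an assumption made for brevity, not the reporter's actual configuration.

```hcl
# Sketch only: names mirror the example earlier in this issue.
# var.network_id and var.ip are assumed to be defined elsewhere.
resource "hcloud_server" "first_control_plane" {
  name        = "k3s-control-plane-0"
  image       = "ubuntu-20.04"
  server_type = "cpx21"

  # No inline network {} block: the attachment lives in its own resource below.
}

# The private network attachment as a standalone resource.
resource "hcloud_server_network" "first_control_plane" {
  server_id  = hcloud_server.first_control_plane.id
  network_id = var.network_id
  ip         = var.ip
}
```

With this layout, running terraform taint hcloud_server_network.first_control_plane followed by terraform apply should recreate only the attachment and leave the server in place. As the comment above reports, this layout does not necessarily prevent the missing interface.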

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions[bot], Oct 12 '23 12:10

Sorry for not responding to this.

It would be great to get some debug logs for this, in case you are still able to reproduce it. I believe this is flaky behavior of our backend, which we can confirm with the logs.

Could you set these two environment variables during terraform apply and, when it happens, send the resulting log file to my email julian.toelle <at> hetzner-cloud.de?

TF_LOG=TRACE TF_LOG_PATH=gh-512.log terraform apply

apricote, Oct 13 '23 06:10

@apricote I personally haven't seen this bug in a long time, but if it shows up again, I will let you know here.

mysticaltech, Oct 13 '23 10:10

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

github-actions[bot], Jan 11 '24 12:01