Flatcar
Beta 3185.1.0: ignition fails to create partition on second disk (vmware)
Description
With Beta 3185.1.0 and Ignition v3, we observe failures when a vSphere VM has more than one disk.
Impact
Cannot deploy VM.
Environment and steps to reproduce
- Set-up: Flatcar VM deployed in vSphere 7 using terraform-provider-vsphere v2.0.2
- Task: Deploy Flatcar Beta 3185.1.0 OVA using Ignition v3 spec file (as vapp)
- Error: Ignition fails with:
create partitions failed: Failed to pretend to create partitions: exit status 4. Stderr: Could not create partition 1 from 4194304 to 20975714303
Sometimes Ignition fails without an error message. In both cases, entering the emergency shell is not possible (reboot loop).
Expected behavior
The VM is deployed, as is the case with the Flatcar Stable 3139.2.0 OVA and an Ignition v2 spec file.
Additional information
To narrow the problem down to the Beta release, the same Ignition JSON is used for both (only the few lines that differ between the v2 and v3 spec were edited). Both files are attached to the issue.
VM config to reproduce:
provider "vsphere" {
  user = "user"
  password = var.password
  vsphere_server = "vc-server-url"
  persist_session = true
  client_debug = true
}

data "vsphere_datacenter" "dc" {
  name = "DC"
}

data "vsphere_datastore_cluster" "datastore" {
  name = "datastore"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "vsphere_compute_cluster" "cluster" {
  name = "cluster"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "vsphere_virtual_machine" "template" {
  name = "flatcar_production_vmware_beta"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "vsphere_network" "network" {
  name = "network"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

data "local_file" "ignitions" {
  filename = "ignition.json"
}

resource "vsphere_virtual_machine" "vm" {
  name = "beta-ignition-v3"
  resource_pool_id = "${data.vsphere_compute_cluster.cluster.resource_pool_id}"
  datastore_cluster_id = "${data.vsphere_datastore_cluster.datastore.id}"
  num_cpus = 2
  memory = 1024
  guest_id = "${data.vsphere_virtual_machine.template.guest_id}"
  scsi_type = "${data.vsphere_virtual_machine.template.scsi_type}"

  network_interface {
    network_id = "${data.vsphere_network.network.id}"
    adapter_type = "${data.vsphere_virtual_machine.template.network_interface_types[0]}"
  }

  disk {
    label = "disk0"
    size = "64"
    unit_number = "0"
    eagerly_scrub = false
    thin_provisioned = true
  }

  disk {
    label = "disk1"
    size = "64"
    unit_number = "1"
    eagerly_scrub = false
    thin_provisioned = true
  }

  clone {
    template_uuid = "${data.vsphere_virtual_machine.template.id}"
  }

  vapp {
    properties = {
      "guestinfo.ignition.config.data" = base64gzip(data.local_file.ignitions.content)
      "guestinfo.ignition.config.data.encoding" = "gz+base64"
    }
  }
}
Ignition file for Flatcar Beta 3185.1.0 (failing) ignition-v3-example.json.txt
Ignition file for Flatcar Stable 3139.2.0 (working) ignition-v2-example.json.txt
Hi @pothos, I have stumbled across your PR https://github.com/coreos/ignition/pull/1319 which is not merged yet and is planned for coreos/ignition release 2.14.0. I was wondering if this could be related. Although I am not sure if Flatcar Beta 3185.1.0 (ignition 2.13.0) is already using the updated code.
What's the value of data.vsphere_virtual_machine.template.scsi_type? Can you paste the yaml you use to create the ignition json (both for v2 and v3)?
Ignition v3 file (sorry, I had to add .txt to upload it): ignition.tf.txt. I am using this provider to create the v3 spec file: https://github.com/community-terraform-providers/terraform-provider-ignition
To avoid maintaining a separate v2 definition, I just edit the generated v3 file to turn it into a v2 file.
And for scsi_type:
output "template" {
value = data.vsphere_virtual_machine.template.scsi_type
}
Outputs:
template = pvscsi
I forgot to provide the device paths as they appear when the VM comes up (with the disk attached but without the ignition_disk part).
# ls -la /dev/disk/by-path
total 0
drwxr-xr-x. 2 root root 220 May 5 14:40 .
drwxr-xr-x. 9 root root 180 May 5 14:39 ..
lrwxrwxrwx. 1 root root 9 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part3 -> ../../sda3
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part4 -> ../../sda4
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part6 -> ../../sda6
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part7 -> ../../sda7
lrwxrwxrwx. 1 root root 10 May 5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part9 -> ../../sda9
lrwxrwxrwx. 1 root root 9 May 5 14:39 pci-0000:03:00.0-scsi-0:0:1:0 -> ../../sdb
Hope this helps.
Hi @pothos, I have stumbled across your PR coreos/ignition#1319 which is not merged yet and is planned for coreos/ignition release 2.14.0. I was wondering if this could be related. Although I am not sure if Flatcar Beta 3185.1.0 (ignition 2.13.0) is already using the updated code.
The fix is already part of our Flatcar release.
Can you try the same v2 config on 3185.1.0? It will be translated to v3 on the fly, and I wonder whether that makes a difference.
Thanks for confirming. I have tried the same v2 config JSON on 3185.1.0 and get the same error.
I have just confirmed that the same happens with the latest Beta, 3227.1.0.
Hi @defo89, I looked into this: right now the Ignition conversion does not handle Ignition spec version 2.1.0, which is why ignition-v2.json fails on newer Flatcar releases. You can make it work by manually editing it in the following way:
--- a/ignition-v2-example.json.txt
+++ b/ignition-v2-example.json.txt
@@ -2,7 +2,7 @@
"ignition": {
"config": {},
"timeouts": {},
- "version": "2.1.0"
+ "version": "2.3.0"
},
"passwd": {
"users": [
@@ -15,13 +15,13 @@
"storage": {
"disks": [
{
- "device": "/dev/disk/by-path/pci-0000:00:07.0",
+ "device": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0",
"partitions": [
{
"label": "etc-test",
"number": 1,
- "size": 10240000,
- "start": 2048,
+ "sizeMiB": 5120000,
+ "startMiB": 1024,
"typeGuid": ""
}
]
The older "size" and "start" properties are expressed in sectors, which are usually 512 bytes each.
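As a rough check of the unit change, the sketch below converts sector counts to MiB. It assumes 512-byte sectors, and the helper name sectors_to_mib is made up for illustration; it is not part of Ignition.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
MIB = 1024 * 1024


def sectors_to_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole MiB, rejecting values that are not MiB-aligned."""
    total_bytes = sectors * sector_bytes
    if total_bytes % MIB != 0:
        raise ValueError(f"{sectors} sectors is not a whole number of MiB")
    return total_bytes // MIB


# "size" and "start" as they appear in the v2 config (sector units).
size_mib = sectors_to_mib(10240000)
start_mib = sectors_to_mib(2048)
print(size_mib, start_mib)
```

Under that assumption, a v2 size of 10240000 and start of 2048 work out to 5000 MiB and 1 MiB, so any MiB values in a converted config are worth checking against the actual disk size.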
As to ignition-v3.json not working: are you sure your disk is 10 TB in size? ~~It is also possible that things are failing because the disks are getting reordered (/dev/sda swapped with /dev/sdb). Things might be better if you attach the disk to a separate scsi controller instead of having both disks under the same one.~~ You're already using stable device paths, so never mind. If the v2 json file works after runtime conversion by Ignition, then v3.json should also work (it does in my testing).
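The 10 TB figure can be checked from the error message itself. This is a minimal sketch under two assumptions that the thread does not confirm: that sgdisk reports the range in 512-byte sectors, and that the Terraform disk size of "64" means 64 GiB.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
MIB = 1024 ** 2
GIB = 1024 ** 3

# Range from "Could not create partition 1 from 4194304 to 20975714303".
first_sector, last_sector = 4194304, 20975714303
disk_bytes = 64 * GIB   # assumption: size = "64" in the Terraform config is GiB

partition_bytes = (last_sector - first_sector + 1) * SECTOR_BYTES
required_bytes = (last_sector + 1) * SECTOR_BYTES

start_mib = first_sector * SECTOR_BYTES // MIB
size_mib = partition_bytes // MIB
partition_tb = partition_bytes / 1e12
fits = required_bytes <= disk_bytes

print(f"start: {start_mib} MiB, size: {size_mib} MiB")
print(f"partition: {partition_tb:.2f} TB, fits on disk: {fits}")
```

With these assumptions the requested range is a 10240000 MiB partition starting at 2048 MiB, far beyond a 64 GiB disk. Those numbers are the same as the v2 "size" and "start" values, which are in sectors. That suggests, though the thread does not confirm it, that sector values were carried over unchanged into the MiB fields of the v3 config.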
Thanks for looking at this, @jepio. For now I have worked around it by switching to a single vSphere disk for the affected VMs.
~On a related note, is there an ETA for bringing ignition-v3 to the stable release (in other words, when will >=3185.0.0 become stable)?~ nvm, it's now in stable
Seeing this quite often when updating and replacing Flatcar with an attached durable disk:
Ignition finished successfully
Ignition 2.15.0
Stage: kargs
no configs at "/usr/lib/ignition/base.d"
no config dir at "/usr/lib/ignition/base.platform.d/azure"
kargs: kargs passed
Ignition finished successfully
Ignition 2.15.0
Stage: disks
no configs at "/usr/lib/ignition/base.d"
no config dir at "/usr/lib/ignition/base.platform.d/azure"
disks: createPartitions: op(1): [started] waiting for devices [/dev/disk/azure/scsi1/lun1]
disks: createPartitions: op(1): [finished] waiting for devices [/dev/disk/azure/scsi1/lun1]
disks: createPartitions: created device alias for "/dev/disk/azure/scsi1/lun1": "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1" -> "/dev/sda"
disks: createPartitions: op(2): [started] partitioning "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): op(3): [started] reading partition table of "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): op(3): [finished] reading partition table of "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): running sgdisk with options: [--pretend --new=0:0:+0 /run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1]
disks: createPartitions: op(2): [failed] partitioning "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1": Failed to pretend to create partitions. Err: exit status 4. Stderr: Could not create partition 3 from 0 to 33
Error encountered; not saving changes.
disks failed
Full config:
{
"ignition": {
"config": {
"replace": {
"verification": {}
}
},
"proxy": {},
"security": {
"tls": {}
},
"timeouts": {},
"version": "3.5.0-experimental"
},...
Flatcar version: 3815.2.0, Butane version: 0.19.0
When this happens, only deleting the disk gets me past it. It does not happen every time, though.
This is the disk setup I am using in the butane template:
variant: flatcar
version: 1.0.0
storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun1
      partitions:
        - label: portal
  filesystems:
    - device: /dev/disk/by-partlabel/portal
      format: ext4
      wipe_filesystem: true
      label: portal
Isn't that a different issue, related to terraform: https://github.com/flatcar/flatcar-website/pull/296 ?
No, this is not related. This is a problem with an already existing disk when recreating the flatcar VM.
So there is some race involved and it doesn't always happen? The same error message was reported in https://github.com/coreos/bugs/issues/2100#issuecomment-499003464
Edit: answer from there says the same as Jeremi below
@TimoKramer:
Your partition is missing an explicit number: 1. You're falling into this behavior:
partitions (list of objects): the list of partitions and their configuration for this particular disk. Every partition must have a unique number, or if 0 is specified, a unique label.
number (integer): the partition number, which dictates its position in the partition table (one-indexed). If zero, use the next available partition slot.
So I understand that you would expect the match to happen on the label field, but Ignition tries to create a new partition on every rerun. After the first provisioning, the disk has no more free space.
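The difference between number 0 and an explicit number can be illustrated with a toy model. This is only a sketch of the behavior described above, not Ignition's implementation: ToyDisk, apply, and provision_twice are hypothetical names, and the model assumes a partition entry that asks for all remaining space.

```python
class ToyDisk:
    """Toy disk: a size in sectors plus a partition table keyed by partition number."""

    def __init__(self, total_sectors):
        self.total_sectors = total_sectors
        self.partitions = {}  # number -> (label, size in sectors)

    def free_sectors(self):
        return self.total_sectors - sum(size for _, size in self.partitions.values())

    def apply(self, number, label):
        """Apply one partition entry that asks for all remaining space."""
        if number != 0 and number in self.partitions:
            # An explicit number identifies the existing entry, so nothing new is created.
            return "reused"
        if number == 0:
            # Number 0 means "next available slot", so it is always a request for a new partition.
            number = 1
            while number in self.partitions:
                number += 1
        if self.free_sectors() == 0:
            raise RuntimeError(f"Could not create partition {number}")
        self.partitions[number] = (label, self.free_sectors())
        return "created"


def provision_twice(number):
    """Simulate a first boot and a re-provision against the same persistent disk."""
    disk = ToyDisk(total_sectors=1000)
    results = []
    for _ in range(2):
        try:
            results.append(disk.apply(number, "portal"))
        except RuntimeError as err:
            results.append(f"failed: {err}")
    return results


print(provision_twice(0))  # number omitted, treated as 0
print(provision_twice(1))  # explicit number: 1
```

In this model the second run with number 0 fails because the first run already consumed the whole disk, while the run with number 1 finds the existing entry and leaves it alone.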