Beta 3185.1.0: ignition fails to create partition on second disk (vmware)

Status: open. defo89 opened this issue 2 years ago • 14 comments

Description

With Beta 3185.1.0 and Ignition v3, we observe failures when a vSphere VM has more than one disk.

Impact

Cannot deploy VM.

Environment and steps to reproduce

  1. Set-up: Flatcar VM deployed in vSphere 7 using terraform-provider-vsphere v2.0.2
  2. Task: Deploy the Flatcar Beta 3185.1.0 OVA using an Ignition v3 spec file (passed via vApp properties)
  3. Error: Ignition fails with "create partitions failed: Failed to pretend to create partitions: exit status 4. Stderr: Could not create partition 1 from 4194304 to 20975714303". Sometimes Ignition fails without an error message. In both cases the emergency shell cannot be entered (the VM ends up in a reboot loop).

[Screenshot: ignition-v3-disk-error]

Expected behavior

The VM deploys successfully, as it does with the Flatcar Stable 3139.2.0 OVA and an Ignition v2 spec file.

Additional information

To narrow the problem down to the Beta release, the same Ignition JSON is used in both cases (only the few lines that differ between the v2 and v3 specs are edited). Both files are attached to the issue.

VM config to reproduce:

provider "vsphere" {
  user                 = "user"
  password             = var.password
  vsphere_server       = "vc-server-url"
  persist_session      = true
  client_debug         = true
}

data "vsphere_datacenter" "dc" {
  name = "DC"
}

data "vsphere_datastore_cluster" "datastore" {
  name          = "datastore"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_compute_cluster" "cluster" {
  name          = "cluster"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_virtual_machine" "template" {
  name          = "flatcar_production_vmware_beta"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "vsphere_network" "network" {
  name          = "network"
  datacenter_id = data.vsphere_datacenter.dc.id
}

data "local_file" "ignitions" {
  filename = "ignition.json"
}

resource "vsphere_virtual_machine" "vm" {
  name                 = "beta-ignition-v3"
  resource_pool_id     = data.vsphere_compute_cluster.cluster.resource_pool_id
  datastore_cluster_id = data.vsphere_datastore_cluster.datastore.id

  num_cpus  = 2
  memory    = 1024
  guest_id  = data.vsphere_virtual_machine.template.guest_id
  scsi_type = data.vsphere_virtual_machine.template.scsi_type

  network_interface {
    network_id   = data.vsphere_network.network.id
    adapter_type = data.vsphere_virtual_machine.template.network_interface_types[0]
  }

  disk {
    label            = "disk0"
    size             = 64
    unit_number      = 0
    eagerly_scrub    = false
    thin_provisioned = true
  }

  disk {
    label            = "disk1"
    size             = 64
    unit_number      = 1
    eagerly_scrub    = false
    thin_provisioned = true
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.template.id
  }

  vapp {
    properties = {
      "guestinfo.ignition.config.data"          = base64gzip(data.local_file.ignitions.content)
      "guestinfo.ignition.config.data.encoding" = "gz+base64"
    }
  }
}
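The vapp block passes ignition.json through Terraform's base64gzip() and declares the encoding as "gz+base64". When debugging which config actually reaches the VM, it can help to reproduce that round trip outside Terraform. A minimal Python sketch, assuming base64gzip means gzip compression followed by standard Base64 (as Terraform documents it); the helper names and the sample config are hypothetical:

```python
import base64
import gzip
import json


def encode_gz_base64(config_text: str) -> str:
    """Gzip-compress the config, then Base64-encode it (mirrors Terraform's base64gzip).

    The bytes may differ from Terraform's output (gzip header fields such as mtime),
    but decoding either one yields the same text.
    """
    compressed = gzip.compress(config_text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def decode_gz_base64(value: str) -> str:
    """Reverse the encoding to inspect the config the VM would receive."""
    return gzip.decompress(base64.b64decode(value)).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical minimal config; substitute the contents of ignition.json.
    config = json.dumps({"ignition": {"version": "3.3.0"}})
    encoded = encode_gz_base64(config)
    print(json.loads(decode_gz_base64(encoded))["ignition"]["version"])
```

Decoding the value stored in guestinfo.ignition.config.data with decode_gz_base64 should give back the JSON that Ignition parses, which makes it easy to confirm that the intended disk section was actually delivered.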

defo89, May 06 '22

Ignition file for Flatcar Beta 3185.1.0 (failing): ignition-v3-example.json.txt

Ignition file for Flatcar Stable 3139.2.0 (working): ignition-v2-example.json.txt

defo89, May 06 '22

Hi @pothos, I have stumbled across your PR https://github.com/coreos/ignition/pull/1319, which is not merged yet and is planned for coreos/ignition release 2.14.0. I was wondering if this could be related, although I am not sure whether Flatcar Beta 3185.1.0 (Ignition 2.13.0) already includes the updated code.

defo89, May 06 '22

What's the value of data.vsphere_virtual_machine.template.scsi_type? Can you paste the YAML you use to create the Ignition JSON (both for v2 and v3)?

jepio, May 06 '22

Ignition v3 file (sorry, I had to add .txt to upload it): ignition.tf.txt. I use this provider to create the v3 spec file: https://github.com/community-terraform-providers/terraform-provider-ignition

To avoid dealing with v2 tooling, I just hand-edit the generated v3 file into a v2 one.

And for scsi_type:

output "template" {
  value = data.vsphere_virtual_machine.template.scsi_type
}

Outputs:
template = pvscsi

I forgot to include the device paths as they appear when the VM comes up (with the second disk attached but without the ignition_disk section):

# ls -la /dev/disk/by-path
total 0
drwxr-xr-x. 2 root root 220 May  5 14:40 .
drwxr-xr-x. 9 root root 180 May  5 14:39 ..
lrwxrwxrwx. 1 root root   9 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part4 -> ../../sda4
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part6 -> ../../sda6
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part7 -> ../../sda7
lrwxrwxrwx. 1 root root  10 May  5 14:40 pci-0000:03:00.0-scsi-0:0:0:0-part9 -> ../../sda9
lrwxrwxrwx. 1 root root   9 May  5 14:39 pci-0000:03:00.0-scsi-0:0:1:0 -> ../../sdb
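In this listing both disks sit on the same pvscsi controller at PCI address 0000:03:00.0, and the Terraform unit_number shows up as the SCSI target in the by-path name (unit 0 gives scsi-0:0:0:0, unit 1 gives scsi-0:0:1:0). A small Python sketch that derives the stable path for a given unit and parses one back. The unit_number-to-target correspondence and the fixed controller address are assumptions read off the listing above, not a guaranteed vSphere contract, and the helper names are hypothetical:

```python
import re
from typing import NamedTuple

# Assumed from the listing above: the pvscsi controller's PCI address on this VM.
CONTROLLER_PCI = "0000:03:00.0"

# Whole-disk by-path names such as pci-0000:03:00.0-scsi-0:0:1:0
# (the four SCSI fields are assumed to be host:bus:target:lun, as udev writes them).
BY_PATH_RE = re.compile(
    r"^pci-(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"-scsi-(?P<host>\d+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)$"
)


class ScsiAddress(NamedTuple):
    pci: str
    host: int
    bus: int
    target: int
    lun: int


def by_path_for_unit(unit_number: int, pci: str = CONTROLLER_PCI) -> str:
    """Build the /dev/disk/by-path link, assuming unit_number maps to the SCSI target."""
    return f"/dev/disk/by-path/pci-{pci}-scsi-0:0:{unit_number}:0"


def parse_by_path(name: str) -> ScsiAddress:
    """Split a whole-disk by-path name into PCI address and host:bus:target:lun."""
    m = BY_PATH_RE.match(name)
    if m is None:
        raise ValueError(f"not a whole-disk pci/scsi by-path name: {name!r}")
    return ScsiAddress(m["pci"], int(m["host"]), int(m["bus"]), int(m["target"]), int(m["lun"]))


print(by_path_for_unit(1))
print(parse_by_path("pci-0000:03:00.0-scsi-0:0:1:0"))
```

For comparison, the device in the original v2 file, /dev/disk/by-path/pci-0000:00:07.0, does not appear in this listing at all.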

Hope this helps.

defo89, May 06 '22

> Hi @pothos, I have stumbled across your PR coreos/ignition#1319 which is not merged yet and is planned for coreos/ignition release 2.14.0. I was wondering if this could be related. Although I am not sure if Flatcar Beta 3185.1.0 (ignition 2.13.0) is already using the updated code.

The fix is already part of our Flatcar release.

Can you try the same v2 config on 3185.1.0? It will be translated to v3 on the fly, and I wonder if it could make a difference.

pothos, May 06 '22

> The fix is already part of our Flatcar release.
>
> Can you try the same v2 config on 3185.1.0? It will be translated to v3 on the fly, and I wonder if it could make a difference.

Thanks for confirming. I have tried the same v2 config JSON on 3185.1.0 and get the same error.

defo89, May 06 '22

Just confirmed that the same thing happens with the latest Beta, 3227.1.0.

defo89, Jun 14 '22

Hi @defo89, I looked into this. Right now the Ignition config conversion does not handle version 2.1.0, which is why ignition-v2.json fails on newer Flatcar releases. You can make it work by editing it manually as follows:

--- a/ignition-v2-example.json.txt
+++ b/ignition-v2-example.json.txt
@@ -2,7 +2,7 @@
     "ignition": {
         "config": {},
         "timeouts": {},
-        "version": "2.1.0"
+        "version": "2.3.0"
     },
     "passwd": {
         "users": [
@@ -15,13 +15,13 @@
     "storage": {
         "disks": [
             {
-                "device": "/dev/disk/by-path/pci-0000:00:07.0",
+                "device": "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0",
                 "partitions": [
                     {
                         "label": "etc-test",
                         "number": 1,
-                        "size": 10240000,
-                        "start": 2048,
+                        "sizeMiB": 5000,
+                        "startMiB": 1,
                         "typeGuid": ""
                     }
                 ]

The older "size" and "start" properties are expressed in logical sectors, which are mostly 512 bytes; the newer "sizeMiB" and "startMiB" properties are in MiB.
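The conversion is plain arithmetic: bytes = sectors × logical sector size, and 1 MiB = 1,048,576 bytes. Assuming 512-byte logical sectors, the v2.1.0 values in the original file (start 2048, size 10240000) come out as startMiB 1 and sizeMiB 5000. A minimal Python sketch of that migration; migrate_partition is a hypothetical helper, and it refuses values that are not a whole number of MiB rather than rounding them:

```python
MIB = 1024 * 1024


def sectors_to_mib(sectors: int, sector_size: int = 512) -> float:
    """Convert a v2.1-style sector count to MiB (512-byte logical sectors assumed by default)."""
    return sectors * sector_size / MIB


def migrate_partition(part: dict, sector_size: int = 512) -> dict:
    """Hypothetical helper: rewrite size/start (sectors) as sizeMiB/startMiB for spec 2.3.0."""
    out = {k: v for k, v in part.items() if k not in ("size", "start")}
    for old, new in (("size", "sizeMiB"), ("start", "startMiB")):
        if old in part:
            mib = sectors_to_mib(part[old], sector_size)
            if not mib.is_integer():
                raise ValueError(f"{old}={part[old]} sectors is not a whole number of MiB")
            out[new] = int(mib)
    return out


v21 = {"label": "etc-test", "number": 1, "size": 10240000, "start": 2048, "typeGuid": ""}
print(migrate_partition(v21))
# {'label': 'etc-test', 'number': 1, 'typeGuid': '', 'sizeMiB': 5000, 'startMiB': 1}
```

Copying the sector numbers into the MiB fields unchanged would instead request a partition 2048 times larger than intended.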

As for ignition-v3.json not working: are you sure your disk is 10 TB in size? ~~It is also possible that things are failing because the disks are getting reordered (/dev/sda swapped with /dev/sdb). Things might be better if you attach the disk to a separate SCSI controller instead of having both disks under the same one.~~ You're already using stable device paths, so never mind. If the v2 JSON file works after runtime conversion by Ignition, then v3.json should also work (it does in my testing).
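The numbers in the original v3 error can be checked the same way. Assuming sgdisk reports an inclusive first/last sector range in 512-byte sectors, "from 4194304 to 20975714303" is a partition starting at 2048 MiB with a size of 10,240,000 MiB (10,000 GiB, roughly 9.8 TiB), far beyond the 64 GB disks in the Terraform config. Those two figures equal the v2 file's sector values (start 2048, size 10240000), which suggests, although the attached v3 file is not reproduced here, that the sector counts were carried over into startMiB/sizeMiB unchanged. A short Python sketch of the check; describe_range is a hypothetical helper:

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
SECTOR = 512  # assumed logical sector size


def describe_range(first: int, last: int, sector_size: int = SECTOR) -> dict:
    """Turn an inclusive sgdisk sector range into MiB/GiB figures."""
    count = last - first + 1
    return {
        "start_mib": first * sector_size / MIB,
        "size_mib": count * sector_size / MIB,
        "size_gib": count * sector_size / GIB,
    }


requested = describe_range(4194304, 20975714303)
# size = 64 in the Terraform disk block; treated as GiB here, the GB/GiB
# distinction does not matter at this scale.
disk_bytes = 64 * GIB
fits = (requested["start_mib"] + requested["size_mib"]) * MIB <= disk_bytes
print(requested)
print("fits on disk:", fits)
```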

jepio, Jun 14 '22

Thanks for looking at this, @jepio. For now I have worked around this by switching the affected VMs to a single vSphere disk.

~On a related note, is there an ETA for bringing Ignition v3 to the Stable channel (in other words, when will >=3185.0.0 become Stable)?~ Never mind, it's now in Stable.

defo89, Jul 05 '22

I see this quite often when updating and replacing a Flatcar VM that has a durable disk attached:

Ignition finished successfully
Ignition 2.15.0
Stage: kargs
no configs at "/usr/lib/ignition/base.d"
no config dir at "/usr/lib/ignition/base.platform.d/azure"
kargs: kargs passed
Ignition finished successfully
Ignition 2.15.0
Stage: disks
no configs at "/usr/lib/ignition/base.d"
no config dir at "/usr/lib/ignition/base.platform.d/azure"
disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/azure/scsi1/lun1]
disks: createPartitions: op(1): [finished] waiting for devices [/dev/disk/azure/scsi1/lun1]
disks: createPartitions: created device alias for "/dev/disk/azure/scsi1/lun1": "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1" -> "/dev/sda"
disks: createPartitions: op(2): [started]  partitioning "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): op(3): [started]  reading partition table of "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): op(3): [finished] reading partition table of "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1"
disks: createPartitions: op(2): running sgdisk with options: [--pretend --new=0:0:+0 /run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1]
disks: createPartitions: op(2): [failed]   partitioning "/run/ignition/dev_aliases/dev/disk/azure/scsi1/lun1": Failed to pretend to create partitions. Err: exit status 4. Stderr: Could not create partition 3 from 0 to 33
Error encountered; not saving changes.
disks failed
Full config:
{
  "ignition": {
    "config": {
      "replace": {
        "verification": {}
      }
    },
    "proxy": {},
    "security": {
      "tls": {}
    },
    "timeouts": {},
    "version": "3.5.0-experimental"
  },...

Flatcar version: 3815.2.0, Butane version: 0.19.0

When this happens, the only way forward is to delete the disk. It does not happen every time, though.
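One way to read "Could not create partition 3 from 0 to 33" in the log above: on a GPT disk with the common default of 128 partition entries of 128 bytes each and 512-byte sectors, LBA 0 holds the protective MBR, LBA 1 the GPT header and LBAs 2 to 33 the entry array, so the first usable sector is 34. A range of 0 to 33 therefore covers nothing but GPT metadata, which is consistent with sgdisk finding no free space left on the disk. The sketch below computes that boundary; the entry count and sizes are the usual defaults, assumed here rather than read from this disk, and the helper names are hypothetical:

```python
SECTOR = 512      # assumed logical sector size
ENTRIES = 128     # common default number of GPT partition entries
ENTRY_SIZE = 128  # bytes per GPT partition entry


def first_usable_lba(sector: int = SECTOR, entries: int = ENTRIES, entry_size: int = ENTRY_SIZE) -> int:
    """LBA 0 = protective MBR, LBA 1 = GPT header, followed by the entry array."""
    entry_array_sectors = -(-entries * entry_size // sector)  # ceiling division
    return 2 + entry_array_sectors


def range_is_metadata_only(first: int, last: int) -> bool:
    """True if an inclusive sector range lies entirely before the first usable LBA."""
    return last < first_usable_lba()


print(first_usable_lba())             # 34
print(range_is_metadata_only(0, 33))  # True
```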

This is the disk setup I am using in the butane template:

variant: flatcar
version: 1.0.0

storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun1
      partitions:
        - label: portal
  filesystems:
    - device: /dev/disk/by-partlabel/portal
      format: ext4
      wipe_filesystem: true
      label: portal

TimoKramer, Mar 21 '24

Isn't that a different issue, related to terraform: https://github.com/flatcar/flatcar-website/pull/296 ?

jepio, Mar 21 '24

> Isn't that a different issue, related to terraform

No, this is not related. This is a problem with an already existing disk when the Flatcar VM is recreated.

TimoKramer, Mar 21 '24

So there is some race involved and it doesn't always happen? The same error message was reported in https://github.com/coreos/bugs/issues/2100#issuecomment-499003464

Edit: the answer there says the same as Jeremi below.

pothos, Apr 09 '24

@TimoKramer: your partition is missing an explicit "number: 1", so you're running into this behavior:

partitions (list of objects): the list of partitions and their configuration for this particular disk. Every partition must have a unique number, or if 0 is specified, a unique label. number (integer): the partition number, which dictates its position in the partition table (one-indexed). If zero, use the next available partition slot.

So I understand that you would expect the match to happen on the label field, but Ignition tries to create a new partition on every rerun, and after the first provisioning the disk has no free space left.
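The mechanism described here can be illustrated with a toy model. This is a simplified simulation written for this thread, not Ignition's actual code: a partition entry with number 0 is always treated as a new partition in the next free slot and, with no size given, takes all remaining free space, while an explicit number lets a rerun match the partition that already exists. It does not try to explain why the log above reports partition 3 specifically. Class and function names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    free_sectors: int
    partitions: dict = field(default_factory=dict)  # number -> (label, sectors)


def provision(disk: Disk, number: int, label: str) -> str:
    """Toy model of one provisioning run for a single partition entry with no size (fill free space)."""
    if number != 0 and number in disk.partitions:
        # An explicit number lets the run match the existing partition and leave it alone.
        return f"matched existing partition {number}"
    if disk.free_sectors == 0:
        raise RuntimeError("no free space left for a new partition")
    slot = number or (max(disk.partitions, default=0) + 1)  # 0 -> next available slot
    disk.partitions[slot] = (label, disk.free_sectors)  # no size -> use all remaining space
    disk.free_sectors = 0
    return f"created partition {slot}"


# number 0 (implicit): the first boot works, the rerun after the VM is replaced fails.
implicit = Disk(free_sectors=1_000_000)
print(provision(implicit, 0, "portal"))
try:
    provision(implicit, 0, "portal")
except RuntimeError as err:
    print("rerun failed:", err)

# number 1 (explicit): the rerun matches the existing partition.
explicit = Disk(free_sectors=1_000_000)
print(provision(explicit, 1, "portal"))
print(provision(explicit, 1, "portal"))
```

In the Butane config above, the corresponding change is adding number: 1 to the portal partition entry.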

jepio, Apr 09 '24