
Can't restore VM in Proxmox

Open · paprikkafox opened this issue 1 year ago · 2 comments

Environment:

  • 3 hypervisor nodes with LINSTOR installed in HA mode (drbd-reactor)
  • Each node has 3 disks in pool 'ssd_zpool1' (2 servers with 3 SSDs, 1 server with 2 SSDs + 1 HDD)
  • 2 servers have 'hdd_zpool1' backed by 8 TB HDDs; 1 server has the same pool backed by 2 TB HDDs
  • Each pool is based on ZFS thin-provisioned volumes
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node       ┊ Driver   ┊ PoolName   ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ SRVDMPVE01 ┊ DISKLESS ┊            ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ SRVDMPVE02 ┊ DISKLESS ┊            ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ DfltDisklessStorPool ┊ SRVDMPVE03 ┊ DISKLESS ┊            ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
┊ hdd_zpool1           ┊ SRVDMPVE01 ┊ ZFS      ┊ hdd_zpool1 ┊     6.11 TiB ┊      7.27 TiB ┊ True         ┊ Ok    ┊            ┊
┊ hdd_zpool1           ┊ SRVDMPVE02 ┊ ZFS      ┊ hdd_zpool1 ┊     1.76 TiB ┊      1.81 TiB ┊ True         ┊ Ok    ┊            ┊
┊ hdd_zpool1           ┊ SRVDMPVE03 ┊ ZFS      ┊ hdd_zpool1 ┊     6.11 TiB ┊      7.27 TiB ┊ True         ┊ Ok    ┊            ┊
┊ ssd_zpool1           ┊ SRVDMPVE01 ┊ ZFS      ┊ ssd_zpool1 ┊     1.75 TiB ┊      2.72 TiB ┊ True         ┊ Ok    ┊            ┊
┊ ssd_zpool1           ┊ SRVDMPVE02 ┊ ZFS      ┊ ssd_zpool1 ┊     1.74 TiB ┊      2.72 TiB ┊ True         ┊ Ok    ┊            ┊
┊ ssd_zpool1           ┊ SRVDMPVE03 ┊ ZFS      ┊ ssd_zpool1 ┊     1.74 TiB ┊      2.72 TiB ┊ True         ┊ Ok    ┊            ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
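The listing above is the output of the standard storage pool query:

linstor storage-pool list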

Software Versions:

  • Proxmox Virtual Environment 7.4-3
  • Linstor stack - 1.18.0; GIT-hash: 9a2f939169b360ed3daa3fa2623dc3baa22cb509

Proxmox Plugin config:

drbd: net-vz-data
        resourcegroup hot_data
        content rootdir,images
        controller MULTIPLE_IPS_OF_CONTROLLERS
        preferlocal true
        statuscache 5
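To verify that the plugin can reach a LINSTOR controller and report usable space for this storage (storage ID net-vz-data taken from the config above), a check like this should work:

pvesm status --storage net-vz-data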

Problem:

When I create a backup of a VM that has TPM 2.0 and EFI storage enabled and then try to restore it, the restore fails with an error about mismatched sizes for the EFI disk (the volume used to store EFI vars):

vma: vma_reader_register_bs for stream drive-efidisk0 failed - unexpected size 5242880 != 540672

Full restore log:

restore vma archive: zstd -q -d -c /mnt/pve/net-share-01/dump/vzdump-qemu-101-2023_06_05-11_08_41.vma.zst | vma extract -v -r /var/tmp/vzdumptmp1491351.fifo - /var/tmp/vzdumptmp1491351
CFG: size: 781 name: qemu-server.conf
DEV: dev_id=1 size: 540672 devname: drive-efidisk0
DEV: dev_id=2 size: 8589934592 devname: drive-scsi0
DEV: dev_id=3 size: 5242880 devname: drive-tpmstate0-backup
CTIME: Mon Jun  5 11:08:43 2023
new volume ID is 'net-vz-data:vm-101-disk-1'
new volume ID is 'net-vz-data:vm-101-disk-2'
new volume ID is 'net-vz-data:vm-101-disk-3'
map 'drive-efidisk0' to '/dev/drbd/by-res/vm-101-disk-1/0' (write zeros = 1)
map 'drive-scsi0' to '/dev/drbd/by-res/vm-101-disk-2/0' (write zeros = 1)
map 'drive-tpmstate0-backup' to '/dev/drbd/by-res/vm-101-disk-3/0' (write zeros = 1)
vma: vma_reader_register_bs for stream drive-efidisk0 failed - unexpected size 5242880 != 540672
/bin/bash: line 1: 1491353 Broken pipe             zstd -q -d -c /mnt/pve/net-share-01/dump/vzdump-qemu-101-2023_06_05-11_08_41.vma.zst
     1491354 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp1491351.fifo - /var/tmp/vzdumptmp1491351
temporary volume 'net-vz-data:vm-101-disk-2' sucessfuly removed
temporary volume 'net-vz-data:vm-101-disk-1' sucessfuly removed
temporary volume 'net-vz-data:vm-101-disk-3' sucessfuly removed
no lock found trying to remove 'create'  lock
error before or during data restore, some or all disks were not completely restored. VM 101 state is NOT cleaned up.
TASK ERROR: command 'set -o pipefail && zstd -q -d -c /mnt/pve/net-share-01/dump/vzdump-qemu-101-2023_06_05-11_08_41.vma.zst | vma extract -v -r /var/tmp/vzdumptmp1491351.fifo - /var/tmp/vzdumptmp1491351' failed: exit code 133
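One way to compare the size PVE expects (540672 bytes for drive-efidisk0) with the size of the DRBD device LINSTOR actually created is a check like the following; the resource name vm-101-disk-1 comes from the log above, and it only works while the temporary volume still exists, since the failed restore removes it again:

# actual size of the DRBD device in bytes
blockdev --getsize64 /dev/drbd/by-res/vm-101-disk-1/0
# LINSTOR's view of the allocated volume sizes
linstor volume list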

I think the problem is somehow tied to ZFS thin provisioning and the related functionality in LINSTOR. Please tell me if I am doing something wrong.

paprikkafox · Jun 05 '23 08:06

Try to put metadata on a separate block device: https://github.com/LINBIT/linstor-server/issues/128

I use something like this:

linstor controller set-property DrbdOptions/auto-quorum disabled
linstor storage-pool create zfs px1 zfs_12 zpool1/proxmox/drbd
linstor storage-pool create zfs px2 zfs_12 zpool1/proxmox/drbd
linstor storage-pool create diskless px3 zfs_12
linstor resource-group create --storage-pool=zfs_12 --place-count=2 zfs_12
linstor volume-group create zfs_12

linstor sp c lvm px1 zfs_12_meta VG1
linstor sp c lvm px2 zfs_12_meta VG1
linstor sp sp px1 zfs_12_meta StorDriver/LvcreateOptions "-m 1 /dev/disk/by-partlabel/LVM_NVME01 /dev/disk/by-partlabel/LVM_NVME02" 
linstor sp sp px2 zfs_12_meta StorDriver/LvcreateOptions "-m 1 /dev/disk/by-partlabel/LVM_NVME01 /dev/disk/by-partlabel/LVM_NVME02" 
linstor rg sp zfs_12 StorPoolNameDrbdMeta zfs_12_meta
linstor rg sp zfs_12 DrbdMetaType external
linstor rg sp zfs_12 StorDriver/ZfscreateOptions "-o volblocksize=16k" 
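For readability, the abbreviated commands expand to their long forms (sp = storage-pool, c = create, the second sp = set-property, rg = resource-group), for example:

linstor storage-pool create lvm px1 zfs_12_meta VG1
linstor storage-pool set-property px1 zfs_12_meta StorDriver/LvcreateOptions "-m 1 /dev/disk/by-partlabel/LVM_NVME01 /dev/disk/by-partlabel/LVM_NVME02"
linstor resource-group set-property zfs_12 StorPoolNameDrbdMeta zfs_12_meta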

ggzengel · Aug 08 '23 20:08

> Try to put metadata on a separate block device: #128

I tried, but it doesn't help with either volblocksize=16k or 32k. I should also mention that the pool ssd_zpool1 is a raidz1 pool (3 disks) with the default ashift=12, created from the Proxmox web GUI.
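For reference, the relevant ZFS properties can be inspected directly on a node; the zvol name below is a guess based on LINSTOR's usual <resource>_<volume-number> naming and may differ:

zpool get ashift ssd_zpool1
zfs get volblocksize,volsize,refreservation ssd_zpool1/vm-101-disk-1_00000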

paprikkafox · Oct 18 '23 12:10

I have some ideas why this happened in the first place, but when checking whether that was the case, I was not able to reproduce it. My best guess is that PVE is no longer that strict about sizes as long as things fit. I saw these at the end of the restore:

VM 103 (scsi0): size of disk 'ontwodistinct:pm-634e6c6c_103' updated from 1G to 1052408K
VM 103 (efidisk0): size of disk 'ontwodistinct:pm-6fbd0488_103' updated from 8152K to 5M
VM 103 (tpmstate0): size of disk 'ontwodistinct:pm-5bf812b9_103' updated from 4M to 8152K
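Those lines look like the result of PVE rescanning the actual volume sizes after the restore; the same rescan can be triggered manually (VMID 103 taken from the lines above):

qm rescan --vmid 103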

As this issue is already pretty old and I was no longer able to reproduce it, I'm closing it. If this is still an issue with the latest LINSTOR and the latest linstor-proxmox plugin, feel free to re-open.

rck · Apr 22 '24 07:04