
VM live migration never finishes for non-sharded storage between the nodes

Open gabrielmougard opened this issue 1 year ago • 20 comments

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • The output of "lxc info" or if that fails:
    • Kernel version: 6.5.0-14-generic
    • LXD version: 5.19-8635f82 (both for source and target machines)
    • Storage backend in use: Dir

Issue description

I'm trying to do a live migration between machine A and machine B of a VM. There is no shared storage between the VMs.

I created a profile called stateful-vm that I applied to my source VMs:

config:
  migration.stateful: "true"
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size.state: 4GiB
    type: disk
name: stateful-vm
used_by:
- /1.0/instances/v1
- /1.0/instances/v2
- /1.0/instances/v3

I copied this profile to the target machine (lxc profile copy stateful-vm legion-laptop:stateful-vm) so that the VM can be instantiated on the other side with the same profile.

The size.state parameter is 4GiB, which I think should be enough: the default VM memory is 1GiB, and I don't run any IOPS-intensive workload inside the VM that would saturate the .qcow2 file used to record the live writes during the migration.
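For reference, bumping that limit would just mean editing the root device entry in the profile; a sketch of the relevant fragment (the 8GiB figure is purely illustrative):

```yaml
devices:
  root:
    path: /
    pool: default
    size.state: 8GiB  # illustrative value, not a recommendation
    type: disk
```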

Then, when attempting the live migration, the beginning seems fine (I see the bytes being transferred) but it never finishes. To be more precise, the migration stops after around 10GiB has been transferred and outputs the following error message: Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed restoring checkpoint from source: Monitor is disconnected

Steps to reproduce

  1. Create the stateful-vm profile as shown above and launch a VM with it: lxc launch ubuntu:jammy v1 --vm --profile stateful-vm
  2. Send the profile to the target: lxc profile copy stateful-vm <TARGET>:stateful-vm
  3. Start the migration: lxc move v1 <TARGET>:v1 and watch the transfer size grow well beyond the VM size before the operation eventually fails.

Information to attach

Here is the server log from the source during the live migration: source.log

And here is the server log from the target: target.log

gabrielmougard avatar Jan 23 '24 13:01 gabrielmougard

Then, when attempting the live migration, the beginning seems fine (I see the bytes being transferred) but it never finishes. To be more precise, the migration stops after around 10GiB has been transferred and outputs the following error message: Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed restoring checkpoint from source: Monitor is disconnected

To be clear, does it "never stop", i.e. hang indefinitely, or does it stop with an error? Or did you kill it after a while?

tomponline avatar Jan 23 '24 13:01 tomponline

The transfer proceeds up to a certain point, but with an abnormal amount of data being moved (around 5x the size of the VM); then LXD stops on its own, returning the mentioned error (in my case it happens at around 10GiB of transferred data for a fresh ubuntu:jammy VM).

gabrielmougard avatar Jan 23 '24 13:01 gabrielmougard

Ah OK, that wasn't clear. Note that size.state isn't related to memory size when doing a live migration; it's more to do with disk I/O, but as you state that was low.

Are you able to investigate this as you're working on live migration tech currently? Thanks

tomponline avatar Jan 23 '24 14:01 tomponline

Yeah I can look into it.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

One thing to note is that ZFS will block when the volume is full, so it sounds like the volume may be getting filled. Have you tried setting a larger size.state to confirm whether the migration succeeds then?

tomponline avatar Jan 23 '24 14:01 tomponline

I first tried with 2GiB, then 4GiB. Let's try with 10GiB to be sure. If it still fails, I'd say it's probably unrelated to that.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Are you monitoring the fill state of the config volume?

tomponline avatar Jan 23 '24 14:01 tomponline
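For what it's worth, a minimal way to watch the fill level during the migration might look like this (the path assumes a snap install and a pool named default; adjust to your setup):

```shell
# Print how full the filesystem backing the storage pool is; run this in a
# loop (e.g. under watch) while the migration is in flight.
pool=/var/snap/lxd/common/lxd/storage-pools/default
usage_pct() { df --output=pcent "$1" | tail -1 | tr -dc '0-9'; }
echo "pool usage: $(usage_pct "$pool")%"
```

For a ZFS-backed pool, zfs list -o name,used,avail would give the per-dataset view instead.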

No, but I will surely inspect it now.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

There might be an issue with QEMU: I see an ErrMonitorDisconnect happening.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Yes, sounds like a QEMU crash.

tomponline avatar Jan 23 '24 14:01 tomponline
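If qemu did crash, the instance log on the target should say why; a sketch of where to look (the raw log path assumes a snap install and the default project):

```shell
# Ask LXD for the instance's recent log, which includes qemu's output:
lxc info v1 --show-log
# Or scan the raw qemu log directly for crash hints:
grep -iE 'error|terminat|abort|assert' /var/snap/lxd/common/lxd/logs/v1/qemu.log
```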

On the receiver side, I see:

CreateInstanceFromMigration finished args="{IndexHeaderVersion:1 Name:v1 Description: Config:map[] Snapshots:[] MigrationType:{FSType:BLOCK_AND_RSYNC Features:[]} TrackProgress:true Refresh:false Live:false VolumeSize:10737418240 ContentType: VolumeOnly:false ClusterMoveSourceName:}" driver=dir instance=v1 pool=default project=default

I see that the Live parameter is false. Is that expected here?

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Also, interestingly, I tried with a different root size and a different size.state, like the following:

lxc launch ubuntu:22.04 vtest --vm -d root,size=20GiB -d root,size.state=10GiB -c limits.memory=1GiB -c limits.cpu=2 -c migration.stateful=true

lxc move vtest <TARGET>:vtest

And this time, the error happens after around 21GB has been transferred, with the same error message. I don't think size.state is the problem here.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Also, I monitored my host IOPS using watch iostat, and there is no meaningful delta in kB_wrtn/s before and during the live migration (maybe 2 or 3 kB/s, but it fluctuates anyway, so that's noise).

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

I also tried it "locally" with two VMs acting as the machines, and the same thing happened (even with LXD 5.0.2-838e1b2 inside my VMs), so I don't think my network is the issue.

gabrielmougard avatar Jan 23 '24 15:01 gabrielmougard

@gabrielmougard do the source and target machines hosting the VM have identical CPUs?

tomponline avatar Jan 23 '24 15:01 tomponline

They have the same architecture, but they are not exactly the same:

source:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen Threadripper 2920X 12-Core Processor
    CPU family:          23
    Model:               8
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
    Stepping:            2
    Frequency boost:     enabled
    CPU max MHz:         3500.0000
    CPU min MHz:         2200.0000
    BogoMIPS:            6985.84
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc re
                         p_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_
                         lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_psta
                         te ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_
                         save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   384 KiB (12 instances)
  L1i:                   768 KiB (12 instances)
  L2:                    6 MiB (12 instances)
  L3:                    32 MiB (4 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-5,12-17
  NUMA node1 CPU(s):     6-11,18-23
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT vulnerable
  Spec rstack overflow:  Mitigation; safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

target:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 6800H with Radeon Graphics
    CPU family:          25
    Model:               68
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         4785.0000
    CPU min MHz:         400.0000
    BogoMIPS:            6387.53
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_
                         tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm 
                         sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbas
                         e bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveer
                         ptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke 
                         vaes vpclmulqdq rdpid overflow_recov succor smca
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization

gabrielmougard avatar Jan 23 '24 15:01 gabrielmougard
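Given the two flag lists above, one quick way to spot source-only CPU flags (which a VM using host CPU passthrough could depend on) is to diff them; a sketch, where source-lscpu.txt and target-lscpu.txt are hypothetical files holding the lscpu output of each host:

```shell
# Extract the Flags: line from a saved lscpu dump, one flag per line, sorted;
# comm -23 then prints flags present on the source but absent on the target.
flags() { awk -F: '/^ *Flags:/ {print $2}' "$1" | tr ' ' '\n' | sed '/^$/d' | sort -u; }
comm -23 <(flags source-lscpu.txt) <(flags target-lscpu.txt)
```

On the outputs above this would surface, for instance, sev and sev_es as source-only flags.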

Checking migration of a VM between two VMs running on the same host to see if it's CPU/BIOS related.

tomponline avatar Jan 23 '24 15:01 tomponline

Update: I also checked with a ZFS storage driver between my two physical machines, and I see the same error (Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed restoring checkpoint from source: Monitor is disconnected), but this time I don't see the data transfer status bar in the CLI. However, the migration works when the VM is stopped before migrating.

gabrielmougard avatar Jan 29 '24 15:01 gabrielmougard

Just tried it here between two VMs running 5.19 and edge:

snap list
Name        Version                 Rev    Tracking       Publisher   Notes
core20      20231123                2105   latest/stable  canonical✓  base
core22      20231123                1033   latest/stable  canonical✓  base
lxd         git-3e78c61             26765  latest/edge    canonical✓  -
microceph   0+git.4a608fc           793    quincy/stable  canonical✓  -
microcloud  1.1-04a1c49             734    latest/stable  canonical✓  -
microovn    22.03.3+snap1d18f95c73  349    22.03/stable   canonical✓  -
snapd       2.61.1                  20671  latest/stable  canonical✓  snapd

And it worked fine for both ZFS and Ceph.

tomponline avatar Jan 29 '24 16:01 tomponline

@mseralessandri are you able to help diagnose this issue and confirm whether it's a general issue or something specific to @gabrielmougard's setup? Thanks!

tomponline avatar Feb 21 '24 13:02 tomponline

I could not reproduce the issue. I tried with edge and with 5.19. I did a fresh standard installation of two LXD nodes with ZFS as local storage and configured trust between the client and the server:

#On both nodes
lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (zfs, btrfs, ceph, dir, lvm, powerflex) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GiB of the new loop device (1GiB minimum) [default=5GiB]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]: 
Port to bind LXD to [default=8443]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

#On the server (maria02)
lxc config trust add 
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

Please provide client name: maria01
Client maria01 certificate add token: <token>


#On the client (maria01)
lxc remote add maria02 https://<ip>:8443
Certificate fingerprint: <fingerprint>
ok (y/n/[fingerprint])? y
Admin password (or token) for maria02: 
Client certificate now trusted by server: maria02

lxc profile create stateful-vm
lxc profile edit stateful-vm 
lxc profile copy stateful-vm maria02:stateful-vm
lxc launch ubuntu:jammy v1 --vm --profile stateful-vm
lxc move v1 maria02:v1

mseralessandri avatar Mar 19 '24 15:03 mseralessandri

@mseralessandri I also retried on my side with latest/edge. The problem seems to have disappeared; maybe I ran into a slow network or machine issue at the time. I'm closing this, as I can't reproduce it either. Nice work setting up the reproduction scenario :+1:

gabrielmougard avatar Mar 19 '24 19:03 gabrielmougard