
VM live migration never finishes for non-sharded storage between the nodes

Open gabrielmougard opened this issue 1 year ago • 20 comments

Required information

  • Distribution: Ubuntu
  • Distribution version: 22.04
  • The output of "lxc info" or if that fails:
    • Kernel version: 6.5.0-14-generic
    • LXD version: 5.19-8635f82 (both for source and target machines)
    • Storage backend in use: Dir

Issue description

I'm trying to do a live migration between machine A and machine B of a VM. There is no shared storage between the VMs.

I created a profile called stateful-vm that I applied to my source VMs:

config:
  migration.stateful: "true"
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    size.state: 4GiB
    type: disk
name: stateful-vm
used_by:
- /1.0/instances/v1
- /1.0/instances/v2
- /1.0/instances/v3

I copied this profile to the target machine (lxc profile copy stateful-vm legion-laptop:stateful-vm) so that the VM can be instantiated on the other side with the same profile.

The size.state parameter is 4GiB, which I think should be enough: the default VM memory is 1GiB, and I don't run any IOPS-intensive workload inside the VM that would saturate the .qcow2 file used to record the live writes during the migration.
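For reference, bumping that limit would just mean editing the root device entry in the profile; a sketch of the relevant fragment (the 8GiB figure is purely illustrative):

```yaml
devices:
  root:
    path: /
    pool: default
    size.state: 8GiB  # illustrative value, not a recommendation
    type: disk
```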

Then, when attempting the live migration, the beginning seems fine (I see the bytes being transferred) but it never finishes. To be more precise, the migration stops after around 10GiB has been transferred and outputs the following error message: Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed restoring checkpoint from source: Monitor is disconnected

Steps to reproduce

  1. Create the stateful-vm profile as shown above and launch a VM with it: lxc launch ubuntu:jammy v1 --vm --profile stateful-vm
  2. Send the profile to the target: lxc profile copy stateful-vm <TARGET>:stateful-vm
  3. Start the migration: lxc move v1 <TARGET>:v1 and watch the transfer size grow well beyond the VM size before the operation eventually fails.

Information to attach

Here is the server log from the source during the live migration: source.log

And here is the server log from the target: target.log

gabrielmougard avatar Jan 23 '24 13:01 gabrielmougard

Then, when attempting the live migration, the beginning seems fine (I see the bytes being transferred) but it never finishes. To be more precise, the migration stops after around 10GiB has been transferred and outputs the following error message: Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed restoring checkpoint from source: Monitor is disconnected

To be clear, does it "never stop", i.e. hang indefinitely, or does it stop with an error? Or did you kill it after a while?

tomponline avatar Jan 23 '24 13:01 tomponline

The transfer proceeds up to a certain point, but with an abnormal amount of data being moved (around 5x the size of the VM); then LXD stops on its own, returning the mentioned error (in my case it happens at around 10GiB of transferred data for a fresh ubuntu:jammy VM).

gabrielmougard avatar Jan 23 '24 13:01 gabrielmougard

Ah OK, that wasn't clear. Note that size.state isn't related to memory size when doing a live migration; it's more to do with disk I/O, but as you state that was low.

Are you able to investigate this as you're working on live migration tech currently? Thanks

tomponline avatar Jan 23 '24 14:01 tomponline

Yeah I can look into it.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

One thing to note is that ZFS will block when the volume is full, so it sounds like the volume may be getting filled. Have you tried setting a larger size.state to confirm whether the migration succeeds then?

tomponline avatar Jan 23 '24 14:01 tomponline

I first tried with 2GiB, then 4GiB. Let's try with 10GiB to be sure. If it still fails, I'd say it's probably unrelated to that.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Are you monitoring the fill state of the config volume?

tomponline avatar Jan 23 '24 14:01 tomponline
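For what it's worth, a minimal way to watch the fill level during the migration might look like this (the path assumes a snap install and a pool named default; adjust to your setup):

```shell
# Print how full the filesystem backing the storage pool is; run this in a
# loop (e.g. under watch) while the migration is in flight.
pool=/var/snap/lxd/common/lxd/storage-pools/default
usage_pct() { df --output=pcent "$1" | tail -1 | tr -dc '0-9'; }
echo "pool usage: $(usage_pct "$pool")%"
```

For a ZFS-backed pool, zfs list -o name,used,avail would give the per-dataset view instead.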

No, but I will surely inspect it now.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

There might be an issue with QEMU: I see an ErrMonitorDisconnect happening.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Yes, sounds like a QEMU crash.

tomponline avatar Jan 23 '24 14:01 tomponline
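If qemu did crash, the instance log on the target should say why; a sketch of where to look (the raw log path assumes a snap install and the default project):

```shell
# Ask LXD for the instance's recent log, which includes qemu's output:
lxc info v1 --show-log
# Or scan the raw qemu log directly for crash hints:
grep -iE 'error|terminat|abort|assert' /var/snap/lxd/common/lxd/logs/v1/qemu.log
```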

On the receiver side, I see:

CreateInstanceFromMigration finished args="{IndexHeaderVersion:1 Name:v1 Description: Config:map[] Snapshots:[] MigrationType:{FSType:BLOCK_AND_RSYNC Features:[]} TrackProgress:true Refresh:false Live:false VolumeSize:10737418240 ContentType: VolumeOnly:false ClusterMoveSourceName:}" driver=dir instance=v1 pool=default project=default

I see that the Live parameter is false. Is that expected here?

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Also, interestingly, I tried with a different root size and a different size.state, like the following:

lxc launch ubuntu:22.04 vtest --vm -d root,size=20GiB -d root,size.state=10GiB -c limits.memory=1GiB -c limits.cpu=2 -c migration.stateful=true

lxc move vtest <TARGET>:vtest

And this time, the error happens after around 21GB has been transferred, with the same error message. I don't think size.state is the problem here.

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

Also, I monitored my host IOPS using watch iostat, and there is no meaningful delta in kB_wrtn/s before and during the live migration (maybe 2 or 3 kB/s, but it fluctuates anyway, so that's noise).

gabrielmougard avatar Jan 23 '24 14:01 gabrielmougard

I also tried it "locally" with two VMs acting as the machines, and the same thing happened (even with LXD 5.0.2-838e1b2 inside my VMs), so I don't think my network is the issue.

gabrielmougard avatar Jan 23 '24 15:01 gabrielmougard

@gabrielmougard do the source and target machines hosting the VM have identical CPUs?

tomponline avatar Jan 23 '24 15:01 tomponline

They have the same architecture, but they are not exactly the same:

source:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen Threadripper 2920X 12-Core Processor
    CPU family:          23
    Model:               8
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
    Stepping:            2
    Frequency boost:     enabled
    CPU max MHz:         3500.0000
    CPU min MHz:         2200.0000
    BogoMIPS:            6985.84
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc re
                         p_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_
                         lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_psta
                         te ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_
                         save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   384 KiB (12 instances)
  L1i:                   768 KiB (12 instances)
  L2:                    6 MiB (12 instances)
  L3:                    32 MiB (4 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-5,12-17
  NUMA node1 CPU(s):     6-11,18-23
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT vulnerable
  Spec rstack overflow:  Mitigation; safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

target:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 6800H with Radeon Graphics
    CPU family:          25
    Model:               68
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         4785.0000
    CPU min MHz:         400.0000
    BogoMIPS:            6387.53
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_
                         tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm 
                         sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbas
                         e bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveer
                         ptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke 
                         vaes vpclmulqdq rdpid overflow_recov succor smca
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Mitigation; safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization

gabrielmougard avatar Jan 23 '24 15:01 gabrielmougard
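Given the two flag lists above, one quick way to spot source-only CPU flags (which a VM using host CPU passthrough could depend on) is to diff them; a sketch, where source-lscpu.txt and target-lscpu.txt are hypothetical files holding the lscpu output of each host:

```shell
# Extract the Flags: line from a saved lscpu dump, one flag per line, sorted;
# comm -23 then prints flags present on the source but absent on the target.
flags() { awk -F: '/^ *Flags:/ {print $2}' "$1" | tr ' ' '\n' | sed '/^$/d' | sort -u; }
comm -23 <(flags source-lscpu.txt) <(flags target-lscpu.txt)
```

On the outputs above this would surface, for instance, sev and sev_es as source-only flags.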

Checking migration of a VM between two VMs running on the same host to see if it's CPU/BIOS related.

tomponline avatar Jan 23 '24 15:01 tomponline

Update: I also checked with a ZFS storage driver between my two physical machines, and I see the same error (Error: Failed instance creation: Error transferring instance data: Failed migration on target: Failed restoring checkpoint from source: Monitor is disconnected), but this time I don't see the data transfer status bar in the CLI. However, the migration works when the VM is stopped before migrating.

gabrielmougard avatar Jan 29 '24 15:01 gabrielmougard

Just tried it here between two VMs running 5.19 and edge:

snap list
Name        Version                 Rev    Tracking       Publisher   Notes
core20      20231123                2105   latest/stable  canonical✓  base
core22      20231123                1033   latest/stable  canonical✓  base
lxd         git-3e78c61             26765  latest/edge    canonical✓  -
microceph   0+git.4a608fc           793    quincy/stable  canonical✓  -
microcloud  1.1-04a1c49             734    latest/stable  canonical✓  -
microovn    22.03.3+snap1d18f95c73  349    22.03/stable   canonical✓  -
snapd       2.61.1                  20671  latest/stable  canonical✓  snapd

And it worked fine for both ZFS and Ceph.

tomponline avatar Jan 29 '24 16:01 tomponline

@mseralessandri are you able to help diagnose this issue and confirm whether it's a general issue or something specific to @gabrielmougard's setup? Thanks!

tomponline avatar Feb 21 '24 13:02 tomponline

I could not reproduce the issue. I tried with edge and with 5.19. I did a fresh standard installation of two LXD nodes with ZFS as local storage and configured trust between the client and the server:

#On both nodes
lxd init
Would you like to use LXD clustering? (yes/no) [default=no]: 
Do you want to configure a new storage pool? (yes/no) [default=yes]: 
Name of the new storage pool [default=default]: 
Name of the storage backend to use (zfs, btrfs, ceph, dir, lvm, powerflex) [default=zfs]: 
Create a new ZFS pool? (yes/no) [default=yes]: 
Would you like to use an existing empty block device (e.g. a disk or partition)? (yes/no) [default=no]: 
Size in GiB of the new loop device (1GiB minimum) [default=5GiB]: 
Would you like to connect to a MAAS server? (yes/no) [default=no]: 
Would you like to create a new local network bridge? (yes/no) [default=yes]: 
What should the new bridge be called? [default=lxdbr0]: 
What IPv4 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
What IPv6 address should be used? (CIDR subnet notation, “auto” or “none”) [default=auto]: 
Would you like the LXD server to be available over the network? (yes/no) [default=no]: yes
Address to bind LXD to (not including port) [default=all]: 
Port to bind LXD to [default=8443]: 
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: 
Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]: 

#On the server (maria02)
lxc config trust add 
To start your first container, try: lxc launch ubuntu:22.04
Or for a virtual machine: lxc launch ubuntu:22.04 --vm

Please provide client name: maria01
Client maria01 certificate add token: <token>


#On the client (maria01)
lxc remote add maria02 https://<ip>:8443
Certificate fingerprint: <fingerprint>
ok (y/n/[fingerprint])? y
Admin password (or token) for maria02: 
Client certificate now trusted by server: maria02

lxc profile create stateful-vm
lxc profile edit stateful-vm 
lxc profile copy stateful-vm maria02:stateful-vm
lxc launch ubuntu:jammy v1 --vm --profile stateful-vm
lxc move v1 maria02:v1

mseralessandri avatar Mar 19 '24 15:03 mseralessandri

@mseralessandri I also retried on my side with latest/edge. The problem seems to have disappeared; maybe I ran into a slow network or machine issue at the time. I'm closing this, as I can't reproduce it either. Nice work setting up the reproduction scenario :+1:

gabrielmougard avatar Mar 19 '24 19:03 gabrielmougard