
Unable to delete operations after failed migration

Open ineu opened this issue 3 months ago • 7 comments

Is there an existing issue for this?

  • [x] There is no existing issue for this bug

Is this happening on an up to date version of Incus?

  • [x] This is happening on a supported version of Incus

Incus system details

config:
  core.https_address: 172.18.0.2:8443
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
- clustering_groups_config
- instances_lxcfs_per_instance
- clustering_groups_vm_cpu_definition
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
- instances_state_os_info
- network_load_balancer_state
- instance_nic_macvlan_mode
- storage_lvm_cluster_create
- network_ovn_external_interfaces
- instances_scriptlet_get_instances_count
- cluster_rebalance
- custom_volume_refresh_exclude_older_snapshots
- storage_initial_owner
- storage_live_migration
- instance_console_screenshot
- image_import_alias
- authorization_scriptlet
- console_force
- network_ovn_state_addresses
- network_bridge_acl_devices
- instance_debug_memory
- init_preseed_storage_volumes
- init_preseed_profile_project
- instance_nic_routed_host_address
- instance_smbios11
- api_filtering_extended
- acme_dns01
- security_iommu
- network_ipv4_dhcp_routes
- network_state_ovn_ls
- network_dns_nameservers
- acme_http01_port
- network_ovn_ipv4_dhcp_expiry
- instance_state_cpu_time
- network_io_bus
- disk_io_bus_usb
- storage_driver_linstor
- instance_oci_entrypoint
- network_address_set
- server_logging
- network_forward_snat
- memory_hotplug
- instance_nic_routed_host_tables
- instance_publish_split
- init_preseed_certificates
- custom_volume_sftp
- network_ovn_external_nic_address
- network_physical_gateway_hwaddr
- backup_s3_upload
- snapshot_manual_expiry
- resources_cpu_address_sizes
- disk_attached
- limits_memory_hotplug
- disk_wwn
- server_logging_webhook
- storage_driver_truenas
- container_disk_tmpfs
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
  addresses:
  - 172.18.0.2:8443
  architectures:
  - x86_64
  - i686
  certificate: [REDACTED]
  certificate_fingerprint: [REDACTED]
  driver: lxc | qemu
  driver_version: 6.0.5 | 9.0.4
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.12.43+deb13-amd64
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Debian GNU/Linux
  os_version: "13"
  project: default
  server: incus
  server_clustered: false
  server_event_mode: full-mesh
  server_name: [REDACTED]
  server_pid: 2110
  server_version: "6.16"
  storage: zfs
  storage_version: 2.3.2-2
  storage_supported_drivers:
  - name: lvm
    version: 2.03.31(2) (2025-02-27) / 1.02.205 (2025-02-27) / 4.48.0
    remote: false
  - name: lvmcluster
    version: 2.03.31(2) (2025-02-27) / 1.02.205 (2025-02-27) / 4.48.0
    remote: true
  - name: truenas
    version: 0.7.3
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: zfs
    version: 2.3.2-2
    remote: false

Instance details

No response

Instance log

No response

Current behavior

After a failed live migration (similar to #2241) I ended up with two copies of the VM on two hosts: STOPPED on the source and FROZEN on the target, plus a couple of stuck operations:

  • source:
root@m2 ~# in operation list
+--------------------------------------+------+--------------------+---------+------------+----------------------+
|                  ID                  | TYPE |    DESCRIPTION     |  STATE  | CANCELABLE |       CREATED        |
+--------------------------------------+------+--------------------+---------+------------+----------------------+
| 0dc95f7b-1bdc-4460-9623-a734102896b7 | TASK | Migrating instance | RUNNING | NO         | 2025/09/29 08:51 UTC |
+--------------------------------------+------+--------------------+---------+------------+----------------------+
  • target:
root@m1 ~# in operation list
+--------------------------------------+-----------+-------------------+---------+------------+----------------------+
|                  ID                  |   TYPE    |    DESCRIPTION    |  STATE  | CANCELABLE |       CREATED        |
+--------------------------------------+-----------+-------------------+---------+------------+----------------------+
| b40f9c93-1de8-4a13-982f-cbf7ebb7eb9c | WEBSOCKET | Creating instance | RUNNING | NO         | 2025/09/29 08:51 UTC |
+--------------------------------------+-----------+-------------------+---------+------------+----------------------+

These operations are stuck and never complete, so I decided to delete them. I've had bad luck deleting operations after a failed migration before (it dropped my VM on both hosts), but this time the VM is disposable, so I went ahead and tried incus operation delete and incus rm -f on both hosts. I managed to remove the source VM (the one in the STOPPED state), but everything else gave me errors:

  1. incus rm -f $frozen_vm on the target:
Error: Stopping the instance failed: Instance is busy running a "create" operation
  2. incus operation delete b40f9c93-1de8-4a13-982f-cbf7ebb7eb9c on the target:
Error: This operation can't be cancelled
  3. incus operation delete 0dc95f7b-1bdc-4460-9623-a734102896b7 on the source:
Error: This operation can't be cancelled

So my VM and both operations are stuck. Previously I got rid of such objects with systemctl restart incus, but that's a last resort because it stops all containers and VMs (which is a problem in itself, but not related to this ticket).

Expected behavior

There should be a way to cancel a failed migration without restarting the whole daemon and, with it, all the containers/VMs.

Steps to reproduce

  1. Create two machines running incus
  2. Run a failing migration (in my case, according to qemu.log for the VM, it was caused by qemu-system-x86_64: Issue while setting TUNSETSTEERINGEBPF: Invalid argument with fd: 36, prog_fd: -1)
  3. Try to clean up the operations/VMs (a rough sketch of the whole sequence is below)
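
Roughly, the sequence looks like this. It's a sketch rather than the exact commands I ran; vm1 and m1: stand in for my VM and the target remote:

# on the source host: live-migrate a running stateful VM to the other server
incus config set vm1 migration.stateful=true
incus move vm1 m1:

# when the migration hangs or fails, inspect and try to clean up on both hosts
incus operation list
incus operation delete <operation-uuid>
incus rm -f vm1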

ineu avatar Sep 29 '25 14:09 ineu

We'd need a reliable reproducer for the migration error itself, as FROZEN is a pretty unusual state for a VM to be in. Normally a migration failure causes an error on both sides, which causes the target to get deleted. That clearly didn't happen here, so we'd want a way to reproduce it to see what's going on in QEMU.

stgraber avatar Sep 29 '25 17:09 stgraber

Sure, if I manage to reliably reproduce a failing migration, I'll post it. But it's not the migration issue I'm concerned about (there's already #2241); it's the operations that cannot be canceled. I thought about editing the SQLite DB directly, but that doesn't feel safe to me.

Take the "Migrating instance" operation on the source host: why must it be uncancelable? I've already removed the source VM, so there's no chance the operation can ever complete. And even if I hadn't, there are legitimate reasons to cancel a migration. For example, a user starts the migration and then finds out there isn't enough free space (or RAM) on the target. Or they make a typo and specify the wrong VM or the wrong target. In either case they should be able to Ctrl+C and correct the command. There may be a point (probably at the very end of the migration) when it's no longer feasible to interrupt the process, but at least the whole period while the disk is being copied should be interruptible/cancelable.
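
For what it's worth, this is roughly what cancelling looks like from the CLI and from the raw REST API, as far as I understand it (the uuid is just an example, and I may be misremembering the query subcommand); both presumably hit the same code path and return the same "can't be cancelled" error:

# via the CLI
incus operation show b40f9c93-1de8-4a13-982f-cbf7ebb7eb9c
incus operation delete b40f9c93-1de8-4a13-982f-cbf7ebb7eb9c

# the same thing through the REST API: DELETE /1.0/operations/<uuid>
incus query -X DELETE /1.0/operations/b40f9c93-1de8-4a13-982f-cbf7ebb7eb9c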

ineu avatar Sep 29 '25 18:09 ineu

They're not cancelable because there are functions (goroutines) running on the source or target Incus server which cannot be cancelled. In this case, there may still be a socket connection stuck with QEMU, or a filesystem migration connection may still be established, ...

All operations go away on daemon restart since at that point any background code will also die.

stgraber avatar Sep 29 '25 18:09 stgraber

An update: I restarted incus on the source host (also updated it from the Debian repos, but that's likely unrelated) and rebooted the source host just in case. That removed the stuck operations on both hosts, as well as the FROZEN VM on the target, so I suspect the target cleaned itself up after the source was gone.

ineu avatar Sep 29 '25 18:09 ineu

Just discovered that the machine that was FROZEN on the target, and was removed after restarting the source, had actually been migrated and started: it's not shown in the incus list output, but the qemu-system-x86_64 process was running, and the volume still exists; I can attach it to the host system and mount its partitions.

Though I cannot remove this volume:

root@m1 ~ [1]# in storage volume rm default virtual-machine/jr-gate
Error: Storage volumes of type "virtual-machine" cannot be deleted with the storage API
root@m1 ~ [1]# incus rm jr-gate
Error: Failed checking instance exists "local:jr-gate": Failed to fetch instance "jr-gate" in project "default": Instance not found
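
In case it's useful, here's roughly how I checked what is left behind (the ZFS pool/dataset name is specific to my setup, and the SQL queries are read-only):

# the stray QEMU process for the instance
pgrep -af qemu-system-x86_64

# find the backing ZFS dataset via the storage pool config, then look for the volume
incus storage show default
zfs list -r <zpool-from-above> | grep jr-gate

# what Incus itself still has in its database
incus admin sql global "SELECT name FROM storage_volumes;"
incus admin sql global "SELECT name FROM instances;"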

ineu avatar Sep 30 '25 10:09 ineu

The target was probably just running QEMU in live-migration receive mode, waiting for the data stream from the source; something went wrong with QEMU which caused the hang/failure. Incus doesn't really see what's going on at the QEMU level in that situation, so it was just waiting for things to complete.

When you killed the source, that finally caused the target to try and clean things up, but since its QEMU didn't see the failure, you ended up with some leftovers...

So yeah, not exactly ideal... Is that something you can still reproduce easily?

Also, you say this is on two hosts and not in a cluster, so are both hosts running the exact same CPU? If not, you're likely to run into trouble because of that. Clusters compute a baseline virtual CPU to allow live migration across diverging systems, but that's not a thing with two standalone servers. Different models from the same vendor within a CPU generation may be okay, but crossing generations or vendors will almost certainly cause issues.

You'd also need to make sure that QEMU is the same version on both ends. Clusters again have a small edge there, as they can look at the full config because the instance already exists. We have a volatile key in there which tells us the QEMU machine profile that was in use when the instance was started. That allows migrating to a newer QEMU without hitting issues; moving to an older QEMU (especially across major versions) may fail.
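
Roughly, the things worth comparing on the two hosts before the next attempt (the instance name is a placeholder; the exact volatile key isn't spelled out here):

# run on both source and target and compare the output
incus version
qemu-system-x86_64 --version
lscpu | grep 'Model name'

# the volatile keys recorded for the instance, including the machine profile mentioned above
incus config show --expanded <instance> | grep volatile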

Anyway, assuming that you're running the same Incus, QEMU and CPU on source and target, further debugging would likely need the following, collected into a single sequence after the list:

  • incus monitor --pretty running from before the migration attempt until the end of cleanup after you restart the source to unblock things
  • Goroutine dump on source and target servers while they're stuck (incus config set core.debug_address=127.0.0.1:8444 followed by curl http://127.0.0.1:8444/debug/pprof/goroutine?debug=2)
  • ps fauxww on both servers while things are stuck
  • All log files in /var/log/incus/INSTANCE-NAME/ on both source and target (qemu.log and qemu.qmp.log should be the most useful)
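
Roughly, as a single sequence (the /tmp paths are just examples):

# 1. before starting the migration, capture the event stream on both hosts
incus monitor --pretty | tee /tmp/incus-monitor.log

# 2. while things are stuck, dump goroutines on each server
incus config set core.debug_address=127.0.0.1:8444
curl "http://127.0.0.1:8444/debug/pprof/goroutine?debug=2" > /tmp/goroutines.txt

# 3. capture the process tree on both hosts
ps fauxww > /tmp/ps.txt

# 4. collect the instance log files from both hosts
tar czf /tmp/instance-logs.tgz /var/log/incus/INSTANCE-NAME/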

stgraber avatar Nov 09 '25 06:11 stgraber

Also, you say this is on two hosts and not in a cluster, so are both hosts running the exact same CPU?

The machines running incus are exactly the same in terms of CPU and memory, and I try to keep them as similar as possible software-wise (they run Debian 13, so incus itself is probably the most frequently updated package there).

Is that something you can still reproduce easily?

I don't do a lot of migrations now, so not easily. When I migrate test machines (not under high load), it always works.

Anyway, assuming that you're running the same Incus, QEMU and CPU on source and target, further debugging would likely need

Thanks for the instructions, I'll use them for the next migration.

Btw, I have disabled overcommit on my hypervisors since the ticket was created, so maybe the problem was related to available memory and I won't be able to reproduce it.

ineu avatar Nov 09 '25 18:11 ineu