VM fails to start on Ubuntu 18.04 using 5.0.3 (worked on 5.0.2)
Required information
- Distribution: Ubuntu
- Distribution version: 18.04
- The output of "lxc info" or if that fails:
lxc info
config:
core.proxy_ignore_hosts: 10.131.1.106,10.131.1.171,10.131.1.80,127.0.0.1,::1,localhost
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- vsock_api
- storage_volumes_all_projects
- projects_networks_restricted_access
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- cpu_hotplug
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- storage_pool_loop_resize
- migration_vm_live
- auth_user
- instances_state_total
- numa_cpu_placement
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- operation_wait
- cluster_internal_custom_volume_copy
- instance_move_config
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: jenkins
auth_user_method: unix
environment:
addresses: []
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
certificate_fingerprint: 0ae730d159455239ac72e770bab8043ab1817cdd055497cbcebe6ae18af3ca89
driver: lxc | qemu
driver_version: 5.0.3 | 8.0.5
firewall: xtables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "false"
netnsid_getifaddrs: "false"
seccomp_listener: "false"
seccomp_listener_continue: "false"
shiftfs: "false"
uevent_injection: "false"
unpriv_fscaps: "true"
kernel_version: 4.15.0-221-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "18.04"
project: default
server: lxd
server_clustered: false
server_event_mode: full-mesh
server_name: juju-545b99-jenkins-13
server_pid: 14662
server_version: 5.0.3
storage: zfs
storage_version: 0.7.5-1ubuntu16.12
storage_supported_drivers:
- name: btrfs
version: 5.4.1
remote: false
- name: ceph
version: 15.2.17
remote: true
- name: cephfs
version: 15.2.17
remote: true
- name: cephobject
version: 15.2.17
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.37.0
remote: false
- name: zfs
version: 0.7.5-1ubuntu16.12
remote: false
- Kernel version: 4.15.0-221-generic
- LXC version: 5.0.3
- LXD version: 5.0.3
- Storage backend in use: zfs
Issue description
LXD is unable to start a VM; the cause seems to be that the qemu process is killed and shows up as [qemu-system-x86] <defunct> in the ps aux output.
Steps to reproduce
1. lxc init --vm ubuntu:focal vm-test
2. lxc start vm-test
- The command in step 2 never finishes and vm-test is not started.
Information to attach
- [ ] dmesg
- [ ] Container log (lxc info NAME --show-log): show-log
- [ ] lxc monitor output: lxc-monitor output
- [ ] syslog
- [ ] ps aux | grep qemu:
$ sudo ps aux | grep qemu
root 11584 0.1 0.4 273928 34696 ? Ssl 14:40 0:00 /snap/lxd/26881/bin/qemu-system-x86_64 -S -name vm-test -uuid 26db2276-06e3-48ab-b4d2-828e654606c6 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/vm-test/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/vm-test/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/vm-test/qemu.pid -D /var/snap/lxd/common/lxd/logs/vm-test/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
root 11591 0.0 0.0 0 0 ? Zs 14:40 0:00 [qemu-system-x86] <defunct>
root 11593 0.0 0.5 1593764 41584 ? Sl 14:40 0:00 /snap/lxd/26881/bin/qemu-system-x86_64 -S -name vm-test -uuid 26db2276-06e3-48ab-b4d2-828e654606c6 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/vm-test/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/vm-test/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/vm-test/qemu.pid -D /var/snap/lxd/common/lxd/logs/vm-test/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
jenkins 13112 0.0 0.0 14860 1064 ? S 14:47 0:00 grep qemu
$ sudo ps aux | grep lxc
jenkins 10719 0.0 0.2 1910800 19144 ? Sl 14:39 0:00 /snap/lxd/26881/bin/lxc monitor --type=logging --pretty
jenkins 13115 0.0 0.0 14860 1080 ? S 14:47 0:00 grep lxc
root 14647 0.0 0.0 152140 272 ? Sl Feb08 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
Good news (or bad): I can reproduce the issue when running on Bionic (4.15.0-213.224). I get the same defunct QEMU, and nothing weird in dmesg.
@verterok switching to the HWE kernel (5.4.0-150-generic) made it work.
If I go back to the 4.15 kernel and strace the hung QEMU, here's what I get:
root@hogplum:~# ps fauxZ | tail
unconfined ubuntu 2240 0.0 0.0 76632 7820 ? Ss 22:36 0:00 /lib/systemd/systemd --user
unconfined ubuntu 2241 0.0 0.0 259232 2568 ? S 22:36 0:00 \_ (sd-pam)
unconfined root 3745 0.6 0.0 1849236 32928 ? Ssl 22:37 0:07 /usr/lib/snapd/snapd
unconfined root 6340 0.0 0.0 2620 1688 ? Ss 22:56 0:00 /bin/sh /snap/lxd/27037/commands/daemon.start
unconfined root 6522 1.7 0.2 2653724 137448 ? Sl 22:56 0:01 \_ lxd --logfile /var/snap/lxd/common/lxd/logs/lxd.log --group lxd
lxd_dnsmasq-lxdbr0_</var/snap/lxd/common/lxd> (enforce) lxd 6682 0.4 0.0 9148 4400 ? Ss 22:56 0:00 \_ dnsmasq --keep-in-foreground --strict-order --bind-interfaces --except-interface=lo --pid-file= --no-ping --interface=lxdbr0 --dhcp-rapid-commit --no-negcache --quiet-dhcp --quiet-dhcp6 --quiet-ra --listen-address=10.141.18.1 --dhcp-no-override --dhcp-authoritative --dhcp-leasefile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.leases --dhcp-hostsfile=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.hosts --dhcp-range 10.141.18.2,10.141.18.254,1h --listen-address=fd42:40ec:efe2:edac::1 --enable-ra --dhcp-range ::,constructor:lxdbr0,ra-stateless,ra-names -s lxd --interface-name _gateway.lxd,lxdbr0 -S /lxd/ --conf-file=/var/snap/lxd/common/lxd/networks/lxdbr0/dnsmasq.raw -u lxd -g lxd
lxd-vm-test_</var/snap/lxd/common/lxd> (enforce) root 6800 0.0 0.0 273928 35656 ? Ssl 22:56 0:00 \_ /snap/lxd/27037/bin/qemu-system-x86_64 -S -name vm-test -uuid bfbb0567-525b-44e1-b99a-03680aae105e -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/vm-test/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/vm-test/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/vm-test/qemu.pid -D /var/snap/lxd/common/lxd/logs/vm-test/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
lxd-vm-test_</var/snap/lxd/common/lxd> (enforce) root 6807 0.0 0.0 0 0 ? Zs 22:56 0:00 \_ [qemu-system-x86] <defunct>
unconfined root 6510 0.0 0.0 152136 2064 ? Sl 22:56 0:00 lxcfs /var/snap/lxd/common/var/lib/lxcfs -p /var/snap/lxd/common/lxcfs.pid
lxd-vm-test_</var/snap/lxd/common/lxd> (enforce) root 6809 0.0 0.0 1593788 41688 ? Sl 22:56 0:00 /snap/lxd/27037/bin/qemu-system-x86_64 -S -name vm-test -uuid bfbb0567-525b-44e1-b99a-03680aae105e -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=allow,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/vm-test/qemu.conf -spice unix=on,disable-ticketing=on,addr=/var/snap/lxd/common/lxd/logs/vm-test/qemu.spice -pidfile /var/snap/lxd/common/lxd/logs/vm-test/qemu.pid -D /var/snap/lxd/common/lxd/logs/vm-test/qemu.log -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd
root@hogplum:~# timeout 30 strace -p 6809
strace: Process 6809 attached
recvmsg(20, strace: Process 6809 detached
<detached ...>
And fd 20 is:
root@hogplum:~# ll /proc/6809/fd
total 0
dr-x------ 2 root root 0 Feb 16 22:56 ./
dr-xr-xr-x 9 root root 0 Feb 16 22:56 ../
lr-x------ 1 root root 64 Feb 16 22:59 0 -> /dev/null
lrwx------ 1 root root 64 Feb 16 22:59 1 -> /var/snap/lxd/common/lxd/logs/vm-test/qemu.early.log
lrwx------ 1 root root 64 Feb 16 22:59 10 -> 'anon_inode:[eventpoll]'
lrwx------ 1 root root 64 Feb 16 22:59 11 -> 'socket:[44153]'
lrwx------ 1 root root 64 Feb 16 22:59 12 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Feb 16 22:59 13 -> 'anon_inode:[eventfd]'
lr-x------ 1 root root 64 Feb 16 22:59 14 -> /dev/urandom
lrwx------ 1 root root 64 Feb 16 22:59 15 -> 'socket:[41606]'
lrwx------ 1 root root 64 Feb 16 22:59 16 -> 'socket:[41607]'
lrwx------ 1 root root 64 Feb 16 22:59 17 -> 'socket:[41608]'
lrwx------ 1 root root 64 Feb 16 22:59 18 -> 'socket:[41609]'
lrwx------ 1 root root 64 Feb 16 22:59 19 -> 'socket:[41610]'
l-wx------ 1 root root 64 Feb 16 22:59 2 -> /var/snap/lxd/common/lxd/logs/vm-test/qemu.log
lrwx------ 1 root root 64 Feb 16 22:59 20 -> 'socket:[41611]'
lr-x------ 1 root root 64 Feb 16 22:59 21 -> /snap/lxd/27037/share/qemu/OVMF_CODE.fd
lrwx------ 1 root root 64 Feb 16 22:59 22 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Feb 16 22:59 23 -> /dev/kvm
lrwx------ 1 root root 64 Feb 16 22:59 24 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 Feb 16 22:59 25 -> '/memfd:memory-backend-memfd (deleted)'
lrwx------ 1 root root 64 Feb 16 22:59 26 -> 'anon_inode:[eventpoll]'
lrwx------ 1 root root 64 Feb 16 22:59 27 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Feb 16 22:59 28 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Feb 16 22:59 29 -> anon_inode:kvm-vcpu
lrwx------ 1 root root 64 Feb 16 22:59 3 -> /dev/vhost-vsock
lrwx------ 1 root root 64 Feb 16 22:59 30 -> /var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm-test/OVMF_VARS.ms.fd
lrwx------ 1 root root 64 Feb 16 22:59 31 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Feb 16 22:59 32 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Feb 16 22:59 33 -> 'anon_inode:[eventfd]'
lr-x------ 1 root root 64 Feb 16 22:59 34 -> /var/snap/lxd/common/lxd/devices/vm-test/config.mount/
lr-x------ 1 root root 64 Feb 16 22:56 4 -> /var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm-test/OVMF_VARS.ms.fd
lrwx------ 1 root root 64 Feb 16 22:59 5 -> 'anon_inode:[eventpoll]'
lrwx------ 1 root root 64 Feb 16 22:59 6 -> 'anon_inode:[eventfd]'
l-wx------ 1 root root 64 Feb 16 22:59 7 -> /var/snap/lxd/common/lxd/logs/vm-test/qemu.pid
l-wx------ 1 root root 64 Feb 16 22:59 8 -> 'pipe:[47138]'
lrwx------ 1 root root 64 Feb 16 22:59 9 -> 'anon_inode:[signalfd]'
@simondeziel thanks for looking into this! Indeed, using the HWE kernel is our current workaround, and we plan to move the instances to focal.
cheers!
5.21/stable and 5.21/edge also present the same problem. I tried changing the QEMU version in the snap: neither v8.0.5 (used in LXD 5.0.3) nor v8.0.3 works, but when reverting to v7.1.0 (the LXD 5.0.2 version) VM start works again. I also tried QEMU v9.0.1 and that works too; apparently a fix for whatever is happening here was introduced on their side after v8.2.1.
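For reference, swapping the QEMU version roughly means rebuilding the snap with its qemu part pointed at a different release tarball. A sketch, assuming the packaging from the lxd-pkg-snap repository (the branch names and part layout are assumptions and may differ):
git clone https://github.com/canonical/lxd-pkg-snap
cd lxd-pkg-snap   # check out the branch matching the track being tested
# edit snapcraft.yaml: change the qemu part's source to the desired release tarball (e.g. qemu-8.2.5)
snapcraft --use-lxd
sudo snap install ./lxd_*.snap --dangerous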
@hamistao just for the record, what kernel version are you using (uname -a)?
I am using 4.15.0-213-generic
This kernel is from 2023-06-28. Surely there is a newer one available in the ESM repos. I think we should try and see if the latest supported kernel from the ESM repos is also affected by the regression with QEMU 8.0.5 (from LXD 5.0.3).
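To check which newer kernels Bionic offers before jumping to HWE, something like the following should work (these are the standard Ubuntu meta-package names; the ESM pockets need to be enabled to see esm-infra candidates):
sudo apt-get update
apt-cache policy linux-image-generic linux-image-generic-hwe-18.04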
It's a long shot, but I think it could have to do with our use of Hyper-V features. Here's a workaround that makes it work with a 4.15 kernel and QEMU 8.0.5:
root@hardhat:~# uname -a
Linux hardhat 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
root@hardhat:~# snap list lxd
Name Version Rev Tracking Publisher Notes
lxd 5.0.3-d921d2e 28373 5.0/stable canonical✓ -
root@hardhat:~# lxc info | grep -A1 -w qemu
driver: lxc | qemu
driver_version: 5.0.3 | 8.0.5
root@hardhat:~# lxc launch ubuntu:focal vm-test --vm -d root,size.state=2GiB -c migration.stateful=true
root@hardhat:~# sleep 60
root@hardhat:~# IP=$(lxc list -c4 -f csv vm-test | cut -d' ' -f1)
root@hardhat:~# echo $IP
10.156.110.81
root@hardhat:~# ping $IP
PING 10.156.110.81 (10.156.110.81) 56(84) bytes of data.
64 bytes from 10.156.110.81: icmp_seq=1 ttl=64 time=0.327 ms
64 bytes from 10.156.110.81: icmp_seq=2 ttl=64 time=0.255 ms
^C
--- 10.156.110.81 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1005ms
rtt min/avg/max/mdev = 0.255/0.291/0.327/0.036 ms
Setting migration.stateful=true was used as a proxy to turn off the Hyper-V features. https://gitlab.com/qemu-project/qemu/-/commit/9b98ab7d3d44911552063cfa3863b67ab79ef783 is what tipped me off to try this. That specific commit landed in QEMU 9.0.1 (tested to be working by @hamistao) and also in 8.2.5.
Now I really don't know if that commit has anything to do with the regression, it's just a hunch.
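For anyone hitting this on an existing VM, the workaround above boils down to something like this (a sketch; the 2GiB state volume size is just an example value, and the override assumes the root disk comes from a profile):
lxc config device override vm-test root size.state=2GiB   # or 'lxc config device set' if root is already a local device
lxc config set vm-test migration.stateful=true
lxc start vm-test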
5.21/stable and 5.21/edge also present the same problem. I tried changing the QEMU version in the snap: neither v8.0.5 (used in LXD 5.0.3) nor v8.0.3 works, but when reverting to v7.1.0 (the LXD 5.0.2 version) VM start works again. I also tried QEMU v9.0.1 and that works too; apparently a fix for whatever is happening here was introduced on their side after v8.2.1.
Please can you show how you confirmed that v7.1.0 and v9.0.1 worked?
As I have rebuilt 5.0/edge (core22 based) using these two versions and I still get the same issue.
I think the hyperv thing is a red herring (I did try the patch, to no avail, btw), as that feature is already gated by a kernel version check (>= 5.10.0), so on a 4.15 kernel it won't be used.
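One way to double-check that no Hyper-V enlightenments are actually in use on the affected host is to look at the running QEMU command line (this assumes LXD appends any hv_* flags to the -cpu argument when they are enabled, as a sketch):
ps aux | grep [q]emu-system | grep -io 'hv[-_][a-z_]*' || echo "no hv_* flags on the QEMU command line"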
I confirmed that setting migration.stateful=true allows the VM to start, so in fact it turns out that the virtio-fs config drive is the difference in the qemu.conf file:
< # Config drive (virtio-fs)
< [chardev "qemu_config"]
< backend = "socket"
< path = "/var/snap/lxd/common/lxd/logs/v1/virtio-fs.config.sock"
<
< [device "dev-qemu_config-drive-virtio-fs"]
< driver = "vhost-user-fs-pci"
< bus = "qemu_pcie2"
< addr = "00.1"
< tag = "config"
< chardev = "qemu_config"
<
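(For context, a diff like the one above can be obtained by comparing the qemu.conf generated for a VM launched without migration.stateful=true against one launched with it; the log path comes from the ps output earlier, and the second VM name here is hypothetical:)
diff /var/snap/lxd/common/lxd/logs/v1/qemu.conf /var/snap/lxd/common/lxd/logs/v1-stateful/qemu.conf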
so in fact it turns out that the virtio-fs config drive is the difference in the qemu.conf file
Then that would probably be the reason my tests worked: I primed virtiofsd from bin/virtiofsd in QEMU.
In which version? Please show your reproducer steps and file changes to help others continue this debugging.
Because 8.0 doesn't come with virtiofsd, I believe, hence the separate section to build it.
Looks to me like virtiofsd isn't starting, or is starting and then stopping but leaving behind virtio-fs.config.sock. That socket is then used as an indicator to QEMU that virtiofsd is available, so QEMU writes its config file to use it and then blocks on that socket because nothing is connected to it.
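A quick way to check that theory is to see whether anything is actually listening on the leftover socket (a sketch; the socket path is taken from the config snippet above):
sudo ss -xlp | grep virtio-fs.config.sock || echo "socket file exists but nothing is listening on it"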
/var/lib/snapd/hostfs/snap/lxd/current/bin/virtiofsd -o source=/var/snap/lxd/common/lxd/devices/v1/config.mount --socket-path=/var/snap/lxd/common/lxd/logs/v1/virtio-fs.config.sock
[2024-07-19T12:30:19Z WARN virtiofsd] Use of deprecated option format '-o': Please specify options without it (e.g., '--cache auto' instead of '-o cache=auto')
[2024-07-19T12:30:19Z ERROR virtiofsd] Error entering sandbox: Fork(Os { code: 38, kind: Unsupported, message: "Function not implemented" })
/var/lib/snapd/hostfs/snap/lxd/current/bin/virtiofsd -o source=/var/snap/lxd/common/lxd/devices/v1/config.mount --socket-path=/var/snap/lxd/common/lxd/logs/v1/virtio-fs.config.sock --sandbox=namespace
[2024-07-19T12:31:42Z WARN virtiofsd] Use of deprecated option format '-o': Please specify options without it (e.g., '--cache auto' instead of '-o cache=auto')
[2024-07-19T12:31:42Z ERROR virtiofsd] Error entering sandbox: Fork(Os { code: 38, kind: Unsupported, message: "Function not implemented" })
/var/lib/snapd/hostfs/snap/lxd/current/bin/virtiofsd -o source=/var/snap/lxd/common/lxd/devices/v1/config.mount --socket-path=/var/snap/lxd/common/lxd/logs/v1/virtio-fs.config.sock --sandbox=chroot
[2024-07-19T12:31:59Z WARN virtiofsd] Use of deprecated option format '-o': Please specify options without it (e.g., '--cache auto' instead of '-o cache=auto')
[2024-07-19T12:31:59Z INFO virtiofsd] Waiting for vhost-user socket connection...
OK, so the issue is with running virtiofsd in namespace sandbox mode; that doesn't seem to be supported on 4.15 kernels.
Yep, that made it work.
@tomponline Are my steps and files still necessary?
Well, I was wondering how you used the non-Rust virtiofsd from QEMU when it's not provided anymore from 8.x (I forget) onwards?
Likely, this is the place where it fails: https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/src/util.rs#L64; pidfd_open is not available on the 4.15 kernel.
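A rough way to check whether a given host's kernel exposes pidfd_open (added in Linux 5.3); grepping kallsyms for the symbol name is a heuristic, not an authoritative test:
grep -q pidfd_open /proc/kallsyms && echo "pidfd_open present" || echo "pidfd_open missing (pre-5.3 kernel)"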
Thanks for your help.
We decided to switch to using --sandbox=chroot for the 5.0 series to restore compatibility with the 4.15 kernel.
Well I was wondering how you used the non-rust virtiofsd from qemu when its not provided anymore from 8.x (I forget) onwards?
I didn't do anything complicated, just commented out the virtiofsd section because snapcraft kept complaining about there being two virtiofsd binaries. Here are the changes I made. It has been some time and my notes are a little scrambled, but what I sent is at least very close to what I used. If needed I can take some time to redo my tests.
So it got removed in 8.0, see https://wiki.qemu.org/ChangeLog/8.0 (search for virtiofsd). I wonder if you initially tried to build 7.1.0 and hit the duplicate file issue because it did bundle virtiofsd back then, and that version didn't use pidfd and so worked. Then you left the virtiofsd section commented out for the 9.0.1 build, which resulted in no virtiofsd being built at all, so LXD would have detected this and fallen back to using 9p instead.
Just redid some tests and that seems to have been the case. I didn't consider 9p, so I thought it would fail if no virtiofsd was present. Sorry for the confusion.
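One host-side way to tell whether the 9p fallback kicked in for a given VM (a sketch: if LXD started the VM without virtiofsd, no virtiofsd process will exist for it; vm-test is the example instance name used earlier):
ps aux | grep [v]irtiofsd | grep vm-test || echo "no virtiofsd for vm-test, so the config drive would be shared over 9p"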
OK thanks.
So to summarise:
- The `4.15.0-213.224` kernel didn't work. `5.4.0-150-generic` did work.
- Enabling `migration.stateful=true` on the `4.15.0-213.224` kernel did work. This led to the incorrect theory that it may be related to `migration.stateful` disabling the Hyper-V `hv_passthrough` CPU feature (because that feature is also gated by a kernel check for 5.10.0 or newer and so was never enabled on 4.15 kernels). It was not identified at this stage that `migration.stateful=true` also disabled virtiofsd shares.
- Following this, the LXD snap was rebuilt using QEMU v7.1.0, which required removing the newer external Rust-based virtiofsd build because QEMU 7.1.0 came with its own older virtiofsd (whose namespace sandbox mode didn't require pidfd_open syscall support). This was found to work (but at this stage the old virtiofsd difference had not been identified).
- Then the LXD snap was rebuilt using QEMU v9.0.1, but, critically, the external virtiofsd was not re-enabled in the build. This caused virtiofsd to be missing entirely from the snap, so LXD was falling back to 9p mode only, meaning that VMs then started. This caused the incorrect perception that QEMU v9.0.1 fixed the issue.
- Then, after escalation, the gated `hv_passthrough` CPU feature and the fact that `migration.stateful=true` disabled virtiofsd shares were identified. A new snap build was made that tried QEMU v9.0.1 with the external virtiofsd re-enabled, and the issue returned, ruling out that a fix appeared in that release. This turned the focus to virtiofsd being the issue and not QEMU.
- By entering the LXD snap's mount namespace and trying to run virtiofsd directly, the actual issue was identified as `[ERROR virtiofsd] Error entering sandbox: Fork(Os { code: 38, kind: Unsupported, message: "Function not implemented" })`.
- It was then identified that `--sandbox=chroot` worked on 4.15 kernels, and that `--sandbox=namespace` (the default) required the `pidfd_open` syscall, which was not added until 5.3 kernels.
- It was decided to add a fix to LXD to use `--sandbox=chroot` to start virtiofsd on pre-5.3 kernels (a rough sketch of this fallback follows below).