
SNAP LXD Services Hangs on Startup

Open declay opened this issue 1 year ago • 3 comments

After every reboot, the same containers start running, but none of the others start, and the LXD service never comes up far enough to interact with the containers. It simply hangs forever. More details below. (I've changed the real container names.)

$ lxc list
+---------+---------+----------------------+------+-----------+-----------+
|  NAME   |  STATE  |         IPV4         | IPV6 |   TYPE    | SNAPSHOTS |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-yyy | RUNNING | 172.17.0.1 (docker0) |      | CONTAINER | 0         |
|         |         | 10.15.76.36 (eth0)   |      |           |           |
+---------+---------+----------------------+------+-----------+-----------+
| xxxd-dd | RUNNING | 172.17.0.1 (docker0) |      | CONTAINER | 0         |
|         |         | 10.15.76.151 (eth0)  |      |           |           |
|         |         | 10.100.0.2 (wg0)     |      |           |           |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-zzz | RUNNING | 172.17.0.1 (docker0) |      | CONTAINER | 0         |
|         |         | 10.15.76.130 (eth0)  |      |           |           |
|         |         | 10.100.0.2 (wg0)     |      |           |           |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-aaa | RUNNING | 172.17.0.1 (docker0) |      | CONTAINER | 0         |
|         |         | 10.15.76.190 (eth0)  |      |           |           |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-bbb | STOPPED |                      |      | CONTAINER | 0         |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-ccc | STOPPED |                      |      | CONTAINER | 0         |
+---------+---------+----------------------+------+-----------+-----------+
| test    | STOPPED |                      |      | CONTAINER | 0         |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-000 | STOPPED |                      |      | CONTAINER | 0         |
+---------+---------+----------------------+------+-----------+-----------+
| xxx-878 | STOPPED |                      |      | CONTAINER | 0         |
+---------+---------+----------------------+------+-----------+-----------+
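To list only the instances that did not come back after a reboot, one option is to ask lxc list for just the name and state columns in CSV form and filter on the state. A minimal sketch, assuming a POSIX shell and awk; stopped_names is a hypothetical helper, while --format csv and -c ns (name, state) are standard lxc list options.

```shell
#!/bin/sh
# Hypothetical helper: print the names of instances whose state is STOPPED.
# Expects "name,state" CSV lines on stdin, as produced by:
#   lxc list --format csv -c ns
stopped_names() {
  awk -F, '$2 == "STOPPED" { print $1 }'
}

# Only query LXD when the client is installed (lxc list is a read-only call).
if command -v lxc >/dev/null 2>&1; then
  lxc list --format csv -c ns | stopped_names
fi
```

On the host above this would be expected to print xxx-bbb, xxx-ccc, test, xxx-000 and xxx-878, one per line.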

I'm basically dead in the water because I can't change anything. lxc list and other GET-type (read-only) commands work, but any command that mutates state just hangs.
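One way to make the "reads work, writes hang" pattern reproducible without leaving a shell stuck is to wrap each client call in timeout. A sketch, assuming GNU coreutils timeout (which exits with status 124 when the limit is reached); probe is a hypothetical helper and the 30-second limit is arbitrary. Note that the second call really does attempt to start the test container, and killing the client may leave the request pending inside the daemon.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report whether
# it returned or appeared to hang. GNU timeout exits with 124 on expiry.
probe() {
  label=$1
  secs=$2
  shift 2
  timeout "$secs" "$@" >/dev/null 2>&1
  rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "$label: no reply within ${secs}s"
  else
    echo "$label: returned (exit $rc)"
  fi
}

if command -v lxc >/dev/null 2>&1; then
  probe "read-only: lxc list" 30 lxc list
  # Sends PUT /1.0/instances/test/state, as in the --debug output.
  probe "mutating: lxc start test" 30 lxc start test
fi
```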

Required information

  • Distribution: Ubuntu

  • Distribution version: 22.04

  • The output of "snap list --all lxd core20 core22 core24 snapd":

    Name    Version      Rev    Tracking       Publisher   Notes
    core20  20230622     1974   latest/stable  canonical✓  base,disabled
    core20  20240416     2318   latest/stable  canonical✓  base
    core22  20240419     1439   latest/stable  canonical✓  base,disabled
    core22  20240731     1564   latest/stable  canonical✓  base
    lxd     6.1-efad198  29943  6.1/stable     canonical✓  disabled
    lxd     6.1-c14927a  29551  6.1/stable     canonical✓  -
    snapd   2.59.5       19457  latest/stable  canonical✓  snapd,disabled
    snapd   2.63         21759  latest/stable  canonical✓  snapd

  • The output of "lxc info":

    config: {}
    api_extensions:
  • storage_zfs_remove_snapshots

  • container_host_shutdown_timeout

  • container_stop_priority

  • container_syscall_filtering

  • auth_pki

  • container_last_used_at

  • etag

  • patch

  • usb_devices

  • https_allowed_credentials

  • image_compression_algorithm

  • directory_manipulation

  • container_cpu_time

  • storage_zfs_use_refquota

  • storage_lvm_mount_options

  • network

  • profile_usedby

  • container_push

  • container_exec_recording

  • certificate_update

  • container_exec_signal_handling

  • gpu_devices

  • container_image_properties

  • migration_progress

  • id_map

  • network_firewall_filtering

  • network_routes

  • storage

  • file_delete

  • file_append

  • network_dhcp_expiry

  • storage_lvm_vg_rename

  • storage_lvm_thinpool_rename

  • network_vlan

  • image_create_aliases

  • container_stateless_copy

  • container_only_migration

  • storage_zfs_clone_copy

  • unix_device_rename

  • storage_lvm_use_thinpool

  • storage_rsync_bwlimit

  • network_vxlan_interface

  • storage_btrfs_mount_options

  • entity_description

  • image_force_refresh

  • storage_lvm_lv_resizing

  • id_map_base

  • file_symlinks

  • container_push_target

  • network_vlan_physical

  • storage_images_delete

  • container_edit_metadata

  • container_snapshot_stateful_migration

  • storage_driver_ceph

  • storage_ceph_user_name

  • resource_limits

  • storage_volatile_initial_source

  • storage_ceph_force_osd_reuse

  • storage_block_filesystem_btrfs

  • resources

  • kernel_limits

  • storage_api_volume_rename

  • network_sriov

  • console

  • restrict_devlxd

  • migration_pre_copy

  • infiniband

  • maas_network

  • devlxd_events

  • proxy

  • network_dhcp_gateway

  • file_get_symlink

  • network_leases

  • unix_device_hotplug

  • storage_api_local_volume_handling

  • operation_description

  • clustering

  • event_lifecycle

  • storage_api_remote_volume_handling

  • nvidia_runtime

  • container_mount_propagation

  • container_backup

  • devlxd_images

  • container_local_cross_pool_handling

  • proxy_unix

  • proxy_udp

  • clustering_join

  • proxy_tcp_udp_multi_port_handling

  • network_state

  • proxy_unix_dac_properties

  • container_protection_delete

  • unix_priv_drop

  • pprof_http

  • proxy_haproxy_protocol

  • network_hwaddr

  • proxy_nat

  • network_nat_order

  • container_full

  • backup_compression

  • nvidia_runtime_config

  • storage_api_volume_snapshots

  • storage_unmapped

  • projects

  • network_vxlan_ttl

  • container_incremental_copy

  • usb_optional_vendorid

  • snapshot_scheduling

  • snapshot_schedule_aliases

  • container_copy_project

  • clustering_server_address

  • clustering_image_replication

  • container_protection_shift

  • snapshot_expiry

  • container_backup_override_pool

  • snapshot_expiry_creation

  • network_leases_location

  • resources_cpu_socket

  • resources_gpu

  • resources_numa

  • kernel_features

  • id_map_current

  • event_location

  • storage_api_remote_volume_snapshots

  • network_nat_address

  • container_nic_routes

  • cluster_internal_copy

  • seccomp_notify

  • lxc_features

  • container_nic_ipvlan

  • network_vlan_sriov

  • storage_cephfs

  • container_nic_ipfilter

  • resources_v2

  • container_exec_user_group_cwd

  • container_syscall_intercept

  • container_disk_shift

  • storage_shifted

  • resources_infiniband

  • daemon_storage

  • instances

  • image_types

  • resources_disk_sata

  • clustering_roles

  • images_expiry

  • resources_network_firmware

  • backup_compression_algorithm

  • ceph_data_pool_name

  • container_syscall_intercept_mount

  • compression_squashfs

  • container_raw_mount

  • container_nic_routed

  • container_syscall_intercept_mount_fuse

  • container_disk_ceph

  • virtual-machines

  • image_profiles

  • clustering_architecture

  • resources_disk_id

  • storage_lvm_stripes

  • vm_boot_priority

  • unix_hotplug_devices

  • api_filtering

  • instance_nic_network

  • clustering_sizing

  • firewall_driver

  • projects_limits

  • container_syscall_intercept_hugetlbfs

  • limits_hugepages

  • container_nic_routed_gateway

  • projects_restrictions

  • custom_volume_snapshot_expiry

  • volume_snapshot_scheduling

  • trust_ca_certificates

  • snapshot_disk_usage

  • clustering_edit_roles

  • container_nic_routed_host_address

  • container_nic_ipvlan_gateway

  • resources_usb_pci

  • resources_cpu_threads_numa

  • resources_cpu_core_die

  • api_os

  • container_nic_routed_host_table

  • container_nic_ipvlan_host_table

  • container_nic_ipvlan_mode

  • resources_system

  • images_push_relay

  • network_dns_search

  • container_nic_routed_limits

  • instance_nic_bridged_vlan

  • network_state_bond_bridge

  • usedby_consistency

  • custom_block_volumes

  • clustering_failure_domains

  • resources_gpu_mdev

  • console_vga_type

  • projects_limits_disk

  • network_type_macvlan

  • network_type_sriov

  • container_syscall_intercept_bpf_devices

  • network_type_ovn

  • projects_networks

  • projects_networks_restricted_uplinks

  • custom_volume_backup

  • backup_override_name

  • storage_rsync_compression

  • network_type_physical

  • network_ovn_external_subnets

  • network_ovn_nat

  • network_ovn_external_routes_remove

  • tpm_device_type

  • storage_zfs_clone_copy_rebase

  • gpu_mdev

  • resources_pci_iommu

  • resources_network_usb

  • resources_disk_address

  • network_physical_ovn_ingress_mode

  • network_ovn_dhcp

  • network_physical_routes_anycast

  • projects_limits_instances

  • network_state_vlan

  • instance_nic_bridged_port_isolation

  • instance_bulk_state_change

  • network_gvrp

  • instance_pool_move

  • gpu_sriov

  • pci_device_type

  • storage_volume_state

  • network_acl

  • migration_stateful

  • disk_state_quota

  • storage_ceph_features

  • projects_compression

  • projects_images_remote_cache_expiry

  • certificate_project

  • network_ovn_acl

  • projects_images_auto_update

  • projects_restricted_cluster_target

  • images_default_architecture

  • network_ovn_acl_defaults

  • gpu_mig

  • project_usage

  • network_bridge_acl

  • warnings

  • projects_restricted_backups_and_snapshots

  • clustering_join_token

  • clustering_description

  • server_trusted_proxy

  • clustering_update_cert

  • storage_api_project

  • server_instance_driver_operational

  • server_supported_storage_drivers

  • event_lifecycle_requestor_address

  • resources_gpu_usb

  • clustering_evacuation

  • network_ovn_nat_address

  • network_bgp

  • network_forward

  • custom_volume_refresh

  • network_counters_errors_dropped

  • metrics

  • image_source_project

  • clustering_config

  • network_peer

  • linux_sysctl

  • network_dns

  • ovn_nic_acceleration

  • certificate_self_renewal

  • instance_project_move

  • storage_volume_project_move

  • cloud_init

  • network_dns_nat

  • database_leader

  • instance_all_projects

  • clustering_groups

  • ceph_rbd_du

  • instance_get_full

  • qemu_metrics

  • gpu_mig_uuid

  • event_project

  • clustering_evacuation_live

  • instance_allow_inconsistent_copy

  • network_state_ovn

  • storage_volume_api_filtering

  • image_restrictions

  • storage_zfs_export

  • network_dns_records

  • storage_zfs_reserve_space

  • network_acl_log

  • storage_zfs_blocksize

  • metrics_cpu_seconds

  • instance_snapshot_never

  • certificate_token

  • instance_nic_routed_neighbor_probe

  • event_hub

  • agent_nic_config

  • projects_restricted_intercept

  • metrics_authentication

  • images_target_project

  • cluster_migration_inconsistent_copy

  • cluster_ovn_chassis

  • container_syscall_intercept_sched_setscheduler

  • storage_lvm_thinpool_metadata_size

  • storage_volume_state_total

  • instance_file_head

  • instances_nic_host_name

  • image_copy_profile

  • container_syscall_intercept_sysinfo

  • clustering_evacuation_mode

  • resources_pci_vpd

  • qemu_raw_conf

  • storage_cephfs_fscache

  • network_load_balancer

  • vsock_api

  • instance_ready_state

  • network_bgp_holdtime

  • storage_volumes_all_projects

  • metrics_memory_oom_total

  • storage_buckets

  • storage_buckets_create_credentials

  • metrics_cpu_effective_total

  • projects_networks_restricted_access

  • storage_buckets_local

  • loki

  • acme

  • internal_metrics

  • cluster_join_token_expiry

  • remote_token_expiry

  • init_preseed

  • storage_volumes_created_at

  • cpu_hotplug

  • projects_networks_zones

  • network_txqueuelen

  • cluster_member_state

  • instances_placement_scriptlet

  • storage_pool_source_wipe

  • zfs_block_mode

  • instance_generation_id

  • disk_io_cache

  • amd_sev

  • storage_pool_loop_resize

  • migration_vm_live

  • ovn_nic_nesting

  • oidc

  • network_ovn_l3only

  • ovn_nic_acceleration_vdpa

  • cluster_healing

  • instances_state_total

  • auth_user

  • security_csm

  • instances_rebuild

  • numa_cpu_placement

  • custom_volume_iso

  • network_allocations

  • storage_api_remote_volume_snapshot_copy

  • zfs_delegate

  • operations_get_query_all_projects

  • metadata_configuration

  • syslog_socket

  • event_lifecycle_name_and_project

  • instances_nic_limits_priority

  • disk_initial_volume_configuration

  • operation_wait

  • cluster_internal_custom_volume_copy

  • disk_io_bus

  • storage_cephfs_create_missing

  • instance_move_config

  • ovn_ssl_config

  • init_preseed_storage_volumes

  • metrics_instances_count

  • server_instance_type_info

  • resources_disk_mounted

  • server_version_lts

  • oidc_groups_claim

  • loki_config_instance

  • storage_volatile_uuid

  • import_instance_devices

  • instances_uefi_vars

  • instances_migration_stateful

  • container_syscall_filtering_allow_deny_syntax

  • access_management

  • vm_disk_io_limits

  • storage_volumes_all

  • instances_files_modify_permissions

  • image_restriction_nesting

  • container_syscall_intercept_finit_module

  • device_usb_serial

  • network_allocate_external_ips

  • explicit_trust_token

    api_status: stable
    api_version: "1.0"
    auth: trusted
    public: false
    auth_methods:
    - tls
    auth_user_name: zadmin
    auth_user_method: unix
    environment:
      addresses: []
      architectures:
      - x86_64
      - i686
      certificate: |
        -----BEGIN CERTIFICATE-----
        MIIB4DCCAWagAwIBAgIQGKf6QO+LRBtazcRgzGf73jAKBggqhkjOPQQDAzAjMQww
        CgYDVQQKEwNMWEQxEzARBgNVBAMMCnJvb3RAb3Jpb24wHhcNMjQwNjA0MTcxMzM5
        WhcNMzQwNjAyMTcxMzM5WjAjMQwwCgYDVQQKEwNMWEQxEzARBgNVBAMMCnJvb3RA
        b3Jpb24wdjAQBgcqhkjOPQIBBgUrgQQAIgNiAARAf3DmUWEXKVThckWCogYH+h9L
        93QidA1lxvngrMZ1nLyzkZ8FC0e5jWxpoRV/EQE0tQJzvBXo+3R3F7io8Xw/ATAO
        8MfNeMx1dD1ZHXB4DQadWKuCX+dhIuGvbiHecvijXzBdMA4GA1UdDwEB/wQEAwIF
        oDATBgNVHSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMCgGA1UdEQQhMB+C
        BW9yaW9uhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUC
        MQDiVn1eTwkiSFWYYmFrTRM45dzIyF5O66mce2tDeibpRlEB7H3BowYKwlrC7o7N
        iOMCMAaBu+GtgV5QagCnK+OzockZ6JZR+iOyKTGmdovUVBgKau3b7PDt/d6SrYlq
        r8HALg==
        -----END CERTIFICATE-----
      certificate_fingerprint: b377aa0a3b291c1ca82b1dfcfdf1164786ddaecdea42e1963e51c53cf8a3d2d5
      driver: lxc | qemu
      driver_version: 6.0.0 | 8.2.1
      instance_types:
      - container
      - virtual-machine
      firewall: nftables
      kernel: Linux
      kernel_architecture: x86_64
      kernel_features:
        idmapped_mounts: "true"
        netnsid_getifaddrs: "true"
        seccomp_listener: "true"
        seccomp_listener_continue: "true"
        uevent_injection: "true"
        unpriv_fscaps: "true"
      kernel_version: 5.15.0-119-generic
      lxc_features:
        cgroup2: "true"
        core_scheduling: "true"
        devpts_fd: "true"
        idmapped_mounts_v2: "true"
        mount_injection_file: "true"
        network_gateway_device_route: "true"
        network_ipvlan: "true"
        network_l2proxy: "true"
        network_phys_macvlan_mtu: "true"
        network_veth_router: "true"
        pidfd: "true"
        seccomp_allow_deny_syntax: "true"
        seccomp_notify: "true"
        seccomp_proxy_send_notify_fd: "true"
      os_name: Ubuntu
      os_version: "22.04"
      project: default
      server: lxd
      server_clustered: false
      server_event_mode: full-mesh
      server_name: orion
      server_pid: 22856
      server_version: "6.1"
      server_lts: false
      storage: dir | lvm
      storage_version: 1 | 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
      storage_supported_drivers:
      - name: powerflex
        version: 1.16 (nvme-cli)
        remote: true
      - name: zfs
        version: 2.1.5-1ubuntu6~22.04.4
        remote: false
      - name: btrfs
        version: 5.16.2
        remote: false
      - name: ceph
        version: 17.2.7
        remote: true
      - name: cephfs
        version: 17.2.7
        remote: true
      - name: cephobject
        version: 17.2.7
        remote: true
      - name: dir
        version: "1"
        remote: false
      - name: lvm
        version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
        remote: false
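Which revision of each snap is actually active can be read off the snap list --all output above: it is the row without "disabled" in the Notes column (for lxd, revision 29551, which matches the /snap/lxd/29551/commands/daemon.start path in the systemd status output). A minimal sketch of that filter, assuming the whitespace-separated column layout shown above with Notes as the last column; active_revisions is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: from `snap list --all` output on stdin, print the name
# and revision of every row whose Notes column does not contain "disabled".
active_revisions() {
  # NR > 1 skips the header row; $NF is the Notes column, $3 is Rev.
  awk 'NR > 1 && $NF !~ /disabled/ { print $1, $3 }'
}

if command -v snap >/dev/null 2>&1; then
  snap list --all lxd core20 core22 core24 snapd | active_revisions
fi
```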

Issue description


Yesterday, after a reboot, my standalone LXD server partly stopped working. The main LXD service never finishes starting and hangs indefinitely, so I can't use lxc commands that modify my instances. I've rebooted many times: each time about half the containers autostart and the other half do not, and afterwards I can do nothing because lxc start and related commands simply hang. The lxc list output at the top of this issue shows the state of my (non-clustered) environment.

Steps to reproduce

Simply reboot my server. Snap services come up but lxd just hangs. Some containers launch, some don't.

The containers' storage backend is on a local SSD, and some containers also have LVM volumes attached.

Information to attach

  • [ ] Any relevant kernel output (dmesg)
  • [ ] Container log (lxc info NAME --show-log)
  • [ ] Container configuration (lxc config show NAME --expanded)
  • [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log):

    root@orion:/var/snap/lxd/common/lxd/logs# cat lxd.log
    time="2024-08-23T04:06:05Z" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"
    time="2024-08-23T04:06:18Z" level=error msg="Failed writing error for HTTP response" err="write unix /var/snap/lxd/common/lxd/unix.socket->@: write: broken pipe" url=/1.0 writeErr="write unix /var/snap/lxd/common/lxd/unix.socket->@: write: broken pipe"
    time="2024-08-23T17:00:09Z" level=error msg="Requestor process creds lacks CAP_MKNOD" instance=storj-node-11 project=default

This is the core of the problem: LXD never finishes starting, as the service status shows:

● snap.lxd.daemon.service - Service for snap application lxd.daemon
     Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static)
     Active: active (running) since Fri 2024-08-23 04:06:02 UTC; 13h ago
TriggeredBy: ● snap.lxd.daemon.unix.socket
   Main PID: 22678 (daemon.start)
      Tasks: 0 (limit: 38283)
     Memory: 9.2M
        CPU: 849ms
     CGroup: /system.slice/snap.lxd.daemon.service
             ‣ 22678 /bin/sh /snap/lxd/29551/commands/daemon.start

Aug 23 04:06:04 orion lxd.daemon[22835]: - proc_slabinfo
Aug 23 04:06:04 orion lxd.daemon[22835]: - shared_pidns
Aug 23 04:06:04 orion lxd.daemon[22835]: - cpuview_daemon
Aug 23 04:06:04 orion lxd.daemon[22835]: - loadavg_daemon
Aug 23 04:06:04 orion lxd.daemon[22835]: - pidfds
Aug 23 04:06:05 orion lxd.daemon[22678]: => Killing conflicting LXD (pid=16356)
Aug 23 04:06:05 orion lxd.daemon[22678]: => Starting LXD
Aug 23 04:06:05 orion lxd.daemon[22856]: time="2024-08-23T04:06:05Z" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority>
Aug 23 04:06:18 orion lxd.daemon[22856]: time="2024-08-23T04:06:18Z" level=error msg="Failed writing error for HTTP response" err="write unix /var/snap/lxd/common/lxd/unix.so>
Aug 23 17:00:09 orion lxd.daemon[22856]: time="2024-08-23T17:00:09Z" level=error msg="Requestor process creds lacks CAP_MKNOD" instance=storj-node-11 project=default

  • [ ] Output of the client with --debug

    Here's what starting a container does. It just hangs.

    lxc start test --debug
    DEBUG [2024-08-23T17:35:41Z] Connecting to a local LXD over a Unix socket
    DEBUG [2024-08-23T17:35:41Z] Sending request to LXD etag= method=GET url="http://unix.socket/1.0"
    DEBUG [2024-08-23T17:35:41Z] Got response struct from LXD
    DEBUG [2024-08-23T17:35:41Z] { "config": {}, "api_extensions": [ "storage_zfs_remove_snapshots", "container_host_shutdown_timeout", "container_stop_priority", "container_syscall_filtering", "auth_pki", "container_last_used_at", "etag", "patch", "usb_devices", "https_allowed_credentials", "image_compression_algorithm", "directory_manipulation", "container_cpu_time", "storage_zfs_use_refquota", "storage_lvm_mount_options", "network", "profile_usedby", "container_push", "container_exec_recording", "certificate_update", "container_exec_signal_handling", "gpu_devices", "container_image_properties", "migration_progress", "id_map", "network_firewall_filtering", "network_routes", "storage", "file_delete", "file_append", "network_dhcp_expiry", "storage_lvm_vg_rename", "storage_lvm_thinpool_rename", "network_vlan", "image_create_aliases", "container_stateless_copy", "container_only_migration", "storage_zfs_clone_copy", "unix_device_rename", "storage_lvm_use_thinpool", "storage_rsync_bwlimit", "network_vxlan_interface", "storage_btrfs_mount_options", "entity_description", "image_force_refresh", "storage_lvm_lv_resizing", "id_map_base", "file_symlinks", "container_push_target", "network_vlan_physical", "storage_images_delete", "container_edit_metadata", "container_snapshot_stateful_migration", "storage_driver_ceph", "storage_ceph_user_name", "resource_limits", "storage_volatile_initial_source", "storage_ceph_force_osd_reuse", "storage_block_filesystem_btrfs", "resources", "kernel_limits", "storage_api_volume_rename", "network_sriov", "console", "restrict_devlxd", "migration_pre_copy", "infiniband", "maas_network", "devlxd_events", "proxy", "network_dhcp_gateway", "file_get_symlink", "network_leases", "unix_device_hotplug", "storage_api_local_volume_handling", "operation_description", "clustering", "event_lifecycle", "storage_api_remote_volume_handling", "nvidia_runtime", "container_mount_propagation", "container_backup", "devlxd_images", 
"container_local_cross_pool_handling", "proxy_unix", "proxy_udp", "clustering_join", "proxy_tcp_udp_multi_port_handling", "network_state", "proxy_unix_dac_properties", "container_protection_delete", "unix_priv_drop", "pprof_http", "proxy_haproxy_protocol", "network_hwaddr", "proxy_nat", "network_nat_order", "container_full", "backup_compression", "nvidia_runtime_config", "storage_api_volume_snapshots", "storage_unmapped", "projects", "network_vxlan_ttl", "container_incremental_copy", "usb_optional_vendorid", "snapshot_scheduling", "snapshot_schedule_aliases", "container_copy_project", "clustering_server_address", "clustering_image_replication", "container_protection_shift", "snapshot_expiry", "container_backup_override_pool", "snapshot_expiry_creation", "network_leases_location", "resources_cpu_socket", "resources_gpu", "resources_numa", "kernel_features", "id_map_current", "event_location", "storage_api_remote_volume_snapshots", "network_nat_address", "container_nic_routes", "cluster_internal_copy", "seccomp_notify", "lxc_features", "container_nic_ipvlan", "network_vlan_sriov", "storage_cephfs", "container_nic_ipfilter", "resources_v2", "container_exec_user_group_cwd", "container_syscall_intercept", "container_disk_shift", "storage_shifted", "resources_infiniband", "daemon_storage", "instances", "image_types", "resources_disk_sata", "clustering_roles", "images_expiry", "resources_network_firmware", "backup_compression_algorithm", "ceph_data_pool_name", "container_syscall_intercept_mount", "compression_squashfs", "container_raw_mount", "container_nic_routed", "container_syscall_intercept_mount_fuse", "container_disk_ceph", "virtual-machines", "image_profiles", "clustering_architecture", "resources_disk_id", "storage_lvm_stripes", "vm_boot_priority", "unix_hotplug_devices", "api_filtering", "instance_nic_network", "clustering_sizing", "firewall_driver", "projects_limits", "container_syscall_intercept_hugetlbfs", "limits_hugepages", "container_nic_routed_gateway", 
"projects_restrictions", "custom_volume_snapshot_expiry", "volume_snapshot_scheduling", "trust_ca_certificates", "snapshot_disk_usage", "clustering_edit_roles", "container_nic_routed_host_address", "container_nic_ipvlan_gateway", "resources_usb_pci", "resources_cpu_threads_numa", "resources_cpu_core_die", "api_os", "container_nic_routed_host_table", "container_nic_ipvlan_host_table", "container_nic_ipvlan_mode", "resources_system", "images_push_relay", "network_dns_search", "container_nic_routed_limits", "instance_nic_bridged_vlan", "network_state_bond_bridge", "usedby_consistency", "custom_block_volumes", "clustering_failure_domains", "resources_gpu_mdev", "console_vga_type", "projects_limits_disk", "network_type_macvlan", "network_type_sriov", "container_syscall_intercept_bpf_devices", "network_type_ovn", "projects_networks", "projects_networks_restricted_uplinks", "custom_volume_backup", "backup_override_name", "storage_rsync_compression", "network_type_physical", "network_ovn_external_subnets", "network_ovn_nat", "network_ovn_external_routes_remove", "tpm_device_type", "storage_zfs_clone_copy_rebase", "gpu_mdev", "resources_pci_iommu", "resources_network_usb", "resources_disk_address", "network_physical_ovn_ingress_mode", "network_ovn_dhcp", "network_physical_routes_anycast", "projects_limits_instances", "network_state_vlan", "instance_nic_bridged_port_isolation", "instance_bulk_state_change", "network_gvrp", "instance_pool_move", "gpu_sriov", "pci_device_type", "storage_volume_state", "network_acl", "migration_stateful", "disk_state_quota", "storage_ceph_features", "projects_compression", "projects_images_remote_cache_expiry", "certificate_project", "network_ovn_acl", "projects_images_auto_update", "projects_restricted_cluster_target", "images_default_architecture", "network_ovn_acl_defaults", "gpu_mig", "project_usage", "network_bridge_acl", "warnings", "projects_restricted_backups_and_snapshots", "clustering_join_token", "clustering_description", 
"server_trusted_proxy", "clustering_update_cert", "storage_api_project", "server_instance_driver_operational", "server_supported_storage_drivers", "event_lifecycle_requestor_address", "resources_gpu_usb", "clustering_evacuation", "network_ovn_nat_address", "network_bgp", "network_forward", "custom_volume_refresh", "network_counters_errors_dropped", "metrics", "image_source_project", "clustering_config", "network_peer", "linux_sysctl", "network_dns", "ovn_nic_acceleration", "certificate_self_renewal", "instance_project_move", "storage_volume_project_move", "cloud_init", "network_dns_nat", "database_leader", "instance_all_projects", "clustering_groups", "ceph_rbd_du", "instance_get_full", "qemu_metrics", "gpu_mig_uuid", "event_project", "clustering_evacuation_live", "instance_allow_inconsistent_copy", "network_state_ovn", "storage_volume_api_filtering", "image_restrictions", "storage_zfs_export", "network_dns_records", "storage_zfs_reserve_space", "network_acl_log", "storage_zfs_blocksize", "metrics_cpu_seconds", "instance_snapshot_never", "certificate_token", "instance_nic_routed_neighbor_probe", "event_hub", "agent_nic_config", "projects_restricted_intercept", "metrics_authentication", "images_target_project", "cluster_migration_inconsistent_copy", "cluster_ovn_chassis", "container_syscall_intercept_sched_setscheduler", "storage_lvm_thinpool_metadata_size", "storage_volume_state_total", "instance_file_head", "instances_nic_host_name", "image_copy_profile", "container_syscall_intercept_sysinfo", "clustering_evacuation_mode", "resources_pci_vpd", "qemu_raw_conf", "storage_cephfs_fscache", "network_load_balancer", "vsock_api", "instance_ready_state", "network_bgp_holdtime", "storage_volumes_all_projects", "metrics_memory_oom_total", "storage_buckets", "storage_buckets_create_credentials", "metrics_cpu_effective_total", "projects_networks_restricted_access", "storage_buckets_local", "loki", "acme", "internal_metrics", "cluster_join_token_expiry", "remote_token_expiry", 
"init_preseed", "storage_volumes_created_at", "cpu_hotplug", "projects_networks_zones", "network_txqueuelen", "cluster_member_state", "instances_placement_scriptlet", "storage_pool_source_wipe", "zfs_block_mode", "instance_generation_id", "disk_io_cache", "amd_sev", "storage_pool_loop_resize", "migration_vm_live", "ovn_nic_nesting", "oidc", "network_ovn_l3only", "ovn_nic_acceleration_vdpa", "cluster_healing", "instances_state_total", "auth_user", "security_csm", "instances_rebuild", "numa_cpu_placement", "custom_volume_iso", "network_allocations", "storage_api_remote_volume_snapshot_copy", "zfs_delegate", "operations_get_query_all_projects", "metadata_configuration", "syslog_socket", "event_lifecycle_name_and_project", "instances_nic_limits_priority", "disk_initial_volume_configuration", "operation_wait", "cluster_internal_custom_volume_copy", "disk_io_bus", "storage_cephfs_create_missing", "instance_move_config", "ovn_ssl_config", "init_preseed_storage_volumes", "metrics_instances_count", "server_instance_type_info", "resources_disk_mounted", "server_version_lts", "oidc_groups_claim", "loki_config_instance", "storage_volatile_uuid", "import_instance_devices", "instances_uefi_vars", "instances_migration_stateful", "container_syscall_filtering_allow_deny_syntax", "access_management", "vm_disk_io_limits", "storage_volumes_all", "instances_files_modify_permissions", "image_restriction_nesting", "container_syscall_intercept_finit_module", "device_usb_serial", "network_allocate_external_ips", "explicit_trust_token" ], "api_status": "stable", "api_version": "1.0", "auth": "trusted", "public": false, "auth_methods": [ "tls" ], "auth_user_name": "zadmin", "auth_user_method": "unix", "environment": { "addresses": [], "architectures": [ "x86_64", "i686" ], "certificate": "-----BEGIN 
CERTIFICATE-----\nMIIB4DCCAWagAwIBAgIQGKf6QO+LRBtazcRgzGf73jAKBggqhkjOPQQDAzAjMQww\nCgYDVQQKEwNMWEQxEzARBgNVBAMMCnJvb3RAb3Jpb24wHhcNMjQwNjA0MTcxMzM5\nWhcNMzQwNjAyMTcxMzM5WjAjMQwwCgYDVQQKEwNMWEQxEzARBgNVBAMMCnJvb3RA\nb3Jpb24wdjAQBgcqhkjOPQIBBgUrgQQAIgNiAARAf3DmUWEXKVThckWCogYH+h9L\n93QidA1lxvngrMZ1nLyzkZ8FC0e5jWxpoRV/EQE0tQJzvBXo+3R3F7io8Xw/ATAO\n8MfNeMx1dD1ZHXB4DQadWKuCX+dhIuGvbiHecvijXzBdMA4GA1UdDwEB/wQEAwIF\noDATBgNVHSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMCgGA1UdEQQhMB+C\nBW9yaW9uhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2gAMGUC\nMQDiVn1eTwkiSFWYYmFrTRM45dzIyF5O66mce2tDeibpRlEB7H3BowYKwlrC7o7N\niOMCMAaBu+GtgV5QagCnK+OzockZ6JZR+iOyKTGmdovUVBgKau3b7PDt/d6SrYlq\nr8HALg==\n-----END CERTIFICATE-----\n", "certificate_fingerprint": "b377aa0a3b291c1ca82b1dfcfdf1164786ddaecdea42e1963e51c53cf8a3d2d5", "driver": "lxc | qemu", "driver_version": "6.0.0 | 8.2.1", "instance_types": [ "container", "virtual-machine" ], "firewall": "nftables", "kernel": "Linux", "kernel_architecture": "x86_64", "kernel_features": { "idmapped_mounts": "true", "netnsid_getifaddrs": "true", "seccomp_listener": "true", "seccomp_listener_continue": "true", "uevent_injection": "true", "unpriv_fscaps": "true" }, "kernel_version": "5.15.0-119-generic", "lxc_features": { "cgroup2": "true", "core_scheduling": "true", "devpts_fd": "true", "idmapped_mounts_v2": "true", "mount_injection_file": "true", "network_gateway_device_route": "true", "network_ipvlan": "true", "network_l2proxy": "true", "network_phys_macvlan_mtu": "true", "network_veth_router": "true", "pidfd": "true", "seccomp_allow_deny_syntax": "true", "seccomp_notify": "true", "seccomp_proxy_send_notify_fd": "true" }, "os_name": "Ubuntu", "os_version": "22.04", "project": "default", "server": "lxd", "server_clustered": false, "server_event_mode": "full-mesh", "server_name": "orion", "server_pid": 22856, "server_version": "6.1", "server_lts": false, "storage": "lvm | dir", "storage_version": "2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 
4.45.0 | 1", "storage_supported_drivers": [ { "Name": "powerflex", "Version": "1.16 (nvme-cli)", "Remote": true }, { "Name": "zfs", "Version": "2.1.5-1ubuntu6~22.04.4", "Remote": false }, { "Name": "btrfs", "Version": "5.16.2", "Remote": false }, { "Name": "ceph", "Version": "17.2.7", "Remote": true }, { "Name": "cephfs", "Version": "17.2.7", "Remote": true }, { "Name": "cephobject", "Version": "17.2.7", "Remote": true }, { "Name": "dir", "Version": "1", "Remote": false }, { "Name": "lvm", "Version": "2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0", "Remote": false } ] } } DEBUG [2024-08-23T17:35:41Z] Sending request to LXD etag= method=GET url="http://unix.socket/1.0/instances/test" DEBUG [2024-08-23T17:35:41Z] Got response struct from LXD
    DEBUG [2024-08-23T17:35:41Z] { "name": "test", "description": "", "status": "Stopped", "status_code": 102, "created_at": "2024-08-22T16:54:11.202074755Z", "last_used_at": "1970-01-01T00:00:00Z", "location": "none", "type": "container", "project": "default", "architecture": "x86_64", "ephemeral": false, "stateful": false, "profiles": [ "default" ], "config": { "image.architecture": "amd64", "image.description": "ubuntu 22.04 LTS amd64 (release) (20240821)", "image.label": "release", "image.os": "ubuntu", "image.release": "jammy", "image.serial": "20240821", "image.type": "squashfs", "image.version": "22.04", "volatile.apply_template": "create", "volatile.base_image": "a3a8118143289e285ec44b489fb1a0811da75c27a22004f7cd34db70a60a0af4", "volatile.cloud-init.instance-id": "9fc7d050-64d2-4bec-98e2-3cc40f98cf9c", "volatile.eth0.hwaddr": "00:16:3e:5d:61:55", "volatile.idmap.base": "0", "volatile.idmap.next": "[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]", "volatile.last_state.idmap": "[]", "volatile.uuid": "2b534c2d-4d92-422f-a38d-2a308c5cd10c", "volatile.uuid.generation": "2b534c2d-4d92-422f-a38d-2a308c5cd10c" }, "devices": {}, "expanded_config": { "image.architecture": "amd64", "image.description": "ubuntu 22.04 LTS amd64 (release) (20240821)", "image.label": "release", "image.os": "ubuntu", "image.release": "jammy", "image.serial": "20240821", "image.type": "squashfs", "image.version": "22.04", "volatile.apply_template": "create", "volatile.base_image": "a3a8118143289e285ec44b489fb1a0811da75c27a22004f7cd34db70a60a0af4", "volatile.cloud-init.instance-id": "9fc7d050-64d2-4bec-98e2-3cc40f98cf9c", "volatile.eth0.hwaddr": "00:16:3e:5d:61:55", "volatile.idmap.base": "0", "volatile.idmap.next": "[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]", 
"volatile.last_state.idmap": "[]", "volatile.uuid": "2b534c2d-4d92-422f-a38d-2a308c5cd10c", "volatile.uuid.generation": "2b534c2d-4d92-422f-a38d-2a308c5cd10c" }, "expanded_devices": { "eth0": { "name": "eth0", "network": "lxdbr0", "type": "nic" }, "root": { "path": "/", "pool": "local", "type": "disk" } } }
DEBUG [2024-08-23T17:35:41Z] Connected to the websocket: ws://unix.socket/1.0/events
DEBUG [2024-08-23T17:35:41Z] Sending request to LXD etag= method=PUT url="http://unix.socket/1.0/instances/test/state"
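The last debug entry shows the client sending a PUT to /1.0/instances/test/state over LXD's local unix socket and then waiting. One way to tell whether the daemon answers at all (as opposed to a single instance operation being stuck) is to send a plain GET /1.0 over the same socket with a timeout. The following is a minimal standard-library sketch, not an official tool: it assumes the snap's default socket path /var/snap/lxd/common/lxd/unix.socket and enough permission to open it (root or membership in the lxd group). `UnixHTTPConnection` and `lxd_responds` are hypothetical helper names.

```python
import http.client
import json
import socket

# Assumed default socket path for the snap-packaged LXD.
LXD_SOCKET = "/var/snap/lxd/common/lxd/unix.socket"


class UnixHTTPConnection(http.client.HTTPConnection):
    """Hypothetical helper: an HTTPConnection that talks to a unix socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float):
        # "lxd" is only used for the Host header; no DNS lookup happens.
        super().__init__("lxd", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def lxd_responds(socket_path: str = LXD_SOCKET, timeout: float = 5.0) -> bool:
    """Return True if GET /1.0 returns an HTTP 200 sync response within `timeout` seconds."""
    conn = UnixHTTPConnection(socket_path, timeout)
    try:
        conn.request("GET", "/1.0")
        resp = conn.getresponse()
        body = json.loads(resp.read())
        # LXD wraps replies in an envelope; a successful GET /1.0 is a "sync" response.
        return resp.status == 200 and body.get("type") == "sync"
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers a missing socket, permission denied and timeouts;
        # ValueError covers a body that is not valid JSON.
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("LXD responding:", lxd_responds())
```

If this returns False (or times out) while the daemon process is running, the API itself is unresponsive. If it returns True, the hang is more likely confined to the instance start operation.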

  • [ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

I can provide additional info as needed.

declay, Aug 23 '24 17:08

Please can you get the contents of /var/snap/lxd/common/lxd/logs/lxd.log?

tomponline, Aug 24 '24 10:08


time="2024-08-24T01:22:59Z" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"

declay, Aug 25 '24 14:08

Interestingly, when I checked this morning, LXD had finally started after more than a day. I expect that if I reboot or restart the services, it will hang again. Here's the latest output:

$ sudo systemctl status snap.lxd.daemon.service
● snap.lxd.daemon.service - Service for snap application lxd.daemon
     Loaded: loaded (/etc/systemd/system/snap.lxd.daemon.service; static)
     Active: active (running) since Sat 2024-08-24 01:22:58 UTC; 1 day 12h ago
TriggeredBy: ● snap.lxd.daemon.unix.socket
   Main PID: 125733 (daemon.start)
      Tasks: 0 (limit: 38283)
     Memory: 952.0K
        CPU: 231ms
     CGroup: /system.slice/snap.lxd.daemon.service
             ‣ 125733 /bin/sh /snap/lxd/29551/commands/daemon.start

Aug 24 01:22:58 orion lxd.daemon[125860]: - proc_uptime
Aug 24 01:22:58 orion lxd.daemon[125860]: - proc_slabinfo
Aug 24 01:22:58 orion lxd.daemon[125860]: - shared_pidns
Aug 24 01:22:58 orion lxd.daemon[125860]: - cpuview_daemon
Aug 24 01:22:58 orion lxd.daemon[125860]: - loadavg_daemon
Aug 24 01:22:58 orion lxd.daemon[125860]: - pidfds
Aug 24 01:22:59 orion lxd.daemon[125733]: => Killing conflicting LXD (pid=125414)
Aug 24 01:22:59 orion lxd.daemon[125733]: => Starting LXD
Aug 24 01:22:59 orion lxd.daemon[125872]: time="2024-08-24T01:22:59Z" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priorit>
Aug 25 03:36:36 orion lxd.daemon[125733]: => LXD is ready
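The journal excerpt pins down how long startup actually took: "=> Starting LXD" is logged at Aug 24 01:22:59 and "=> LXD is ready" at Aug 25 03:36:36. The short sketch below parses those two lines and computes the gap. It assumes the year is 2024 (the short journal format omits it), that both timestamps share one timezone (systemctl status above reports UTC), and that Python runs in the default C locale so "%b" matches "Aug". The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Two lines copied from the journal excerpt above.
START_LINE = "Aug 24 01:22:59 orion lxd.daemon[125733]: => Starting LXD"
READY_LINE = "Aug 25 03:36:36 orion lxd.daemon[125733]: => LXD is ready"


def journal_time(line: str, year: int = 2024) -> datetime:
    """Parse the 'Mon DD HH:MM:SS' prefix of a short-format journal line.

    The short format carries no year, so it is supplied explicitly (assumed 2024).
    """
    stamp = " ".join(line.split()[:3])  # e.g. "Aug 24 01:22:59"
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def startup_delay(start_line: str, ready_line: str) -> timedelta:
    """Elapsed time between the 'Starting LXD' and 'LXD is ready' entries."""
    return journal_time(ready_line) - journal_time(start_line)


if __name__ == "__main__":
    print(startup_delay(START_LINE, READY_LINE))
```

Run as-is, it prints 1 day, 2:13:37. That is about 26 hours between the daemon starting and reporting ready, which matches the "more than a day" observation.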

declay, Aug 25 '24 14:08

Is this still an issue?

Support requests should generally go to the support forum here: https://discourse.ubuntu.com/c/lxd/126

tomponline, Sep 16 '24 09:09

I'll close the issue, since after the latest snap update LXD came up cleanly without the hang.

declay, Sep 16 '24 14:09