lxc failing with `Error: mkdir /var/snap/lxd/common/lxd/shmounts: file exists` when using snap in parallel mode
Required information
- Distribution: Ubuntu
- Distribution version: 22.04.2
- The output of "lxc info" (I can't help but feel this is no longer the nice, helpful summary of what is running):
$ lxc info
config:
core.https_address: '[::]'
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- vsock_api
- storage_volumes_all_projects
- projects_networks_restricted_access
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- cpu_hotplug
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses:
- 172.29.20.18:8443
- 10.25.164.1:8443
architectures:
- x86_64
- i686
certificate: |
<elided>
certificate_fingerprint: 6c1bd7d6ac16fc3623b03a1a2a7f95f35ea204a471e2778d99c8e1c4b95b3fb5
driver: lxc
driver_version: 5.0.2
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.15.0-71-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "22.04"
project: default
server: lxd
server_clustered: false
server_event_mode: full-mesh
server_name: jammy
server_pid: 1264
server_version: 5.0.2
storage: dir | zfs | btrfs
storage_version: 1 | 2.1.5-1ubuntu6~22.04.1 | 5.4.1
storage_supported_drivers:
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
remote: false
- name: zfs
version: 2.1.5-1ubuntu6~22.04.1
remote: false
- name: btrfs
version: 5.4.1
remote: false
- name: ceph
version: 15.2.17
remote: true
- name: cephfs
version: 15.2.17
remote: true
- name: cephobject
version: 15.2.17
remote: true
Issue description
LXC is failing to start containers. I first noticed this while trying to run `juju_29 bootstrap lxd lxd` after doing a parallel install of the juju snap. However, the failure reproduces with only `lxc launch` in the mix:
Steps to reproduce
- Try to launch an LXD container:
$ lxc launch juju/[email protected]/amd64
Creating the instance
Instance name is: proven-mule
Starting proven-mule
Error: mkdir /var/snap/lxd/common/lxd/shmounts: file exists
Try `lxc info --show-log local:proven-mule` for more info
- Looking at the contents of the directory, that path does exist:
$ ll /var/snap/lxd/common/lxd/shmounts
lrwxrwxrwx 1 root root 39 May 10 11:15 /var/snap/lxd/common/lxd/shmounts -> /var/snap/lxd/common/shmounts/instances
However, what it points to does not, so the symlink is dangling; mkdir still fails with "file exists" because the symlink itself already occupies the path:
$ sudo ls -al /var/snap/lxd/common/shmounts
total 8
drwx--x--x 2 root root 4096 Jan 20 16:00 .
drwxr-xr-x 9 root root 4096 May 10 11:15 ..
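A quick way to confirm the symlink is dangling (a sketch using the paths from the output above; `test -e` follows the link, so it fails when the target is missing):
$ readlink -f /var/snap/lxd/common/lxd/shmounts
/var/snap/lxd/common/shmounts/instances
$ test -e /var/snap/lxd/common/lxd/shmounts || echo "dangling symlink"
dangling symlink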
I can manually delete the symlink, or manually create the instances directory, but I'm not sure what permissions should be used. I don't know whether Juju is somehow using an older LXD client library version that set something up incorrectly (but the juju snap shouldn't have any rights to write into those directories anyway, so I'm pretty sure it is the LXD daemon that is setting those things up). A possible manual workaround is sketched below.
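A sketch of that workaround (the 0711 mode is an assumption based on the permissions of /var/snap/lxd/common/shmounts shown above, not a documented value):
$ sudo mkdir -m 0711 /var/snap/lxd/common/shmounts/instances   # mode is a guess
$ sudo snap restart lxd
Alternatively, remove the dangling symlink and restart so LXD can recreate its own state:
$ sudo rm /var/snap/lxd/common/lxd/shmounts
$ sudo snap restart lxd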
Information to attach
There are only 2 lines in /var/snap/lxd/common/lxd/logs/lxd.log:
time="2023-05-10T11:15:35-04:00" level=warning msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
time="2023-05-10T11:15:35-04:00" level=warning msg="Instance type not operational" driver=qemu err="KVM support is missing (no /dev/kvm)" type=virtual-machine
Does this occur on all (or fresh) systems, or just this particular machine?
Moved over to the snap packaging repo
It happened on 2 relatively fresh systems in my testing.
It seems that the triggering factor is installing LXD, then enabling parallel installs (https://snapcraft.io/docs/parallel-installs), and then trying to launch a container; a minimal sketch of that sequence follows. Vitaly should have a bit more information here.
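A minimal sketch of that sequence (the `juju_29` instance name comes from this thread; the channel, the `--classic` flag, and the container image are assumptions for illustration):
$ sudo snap install lxd
$ sudo snap set system experimental.parallel-instances=true
$ sudo snap install juju_29 --channel=2.9/stable --classic   # parallel instance of the juju snap
$ lxc launch ubuntu:22.04 test
Error: mkdir /var/snap/lxd/common/lxd/shmounts: file exists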
I've also run into this one.
For everyone who runs into this: I can't get parallel instances to work correctly for now, so disabling the feature was the only option I had. I then had to restart LXD to get it working again:
$ sudo snap set system experimental.parallel-instances=false
$ sudo snap restart lxd
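To verify the option was actually cleared (sketch; `snap get system` reads back the value set above):
$ snap get system experimental.parallel-instances
false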
I can't seem to reproduce any problem with launching the containers. I've set up VMs with 22.04 and 24.04, running LXD 5.0.3 and 5.21 respectively. With parallel instances enabled, I installed test-snapd-sh-core24 and test-snapd-sh-core24_foo in both VMs and launched both so that the proper mounts were set up. Then I launched a couple of containers, launched containers within the containers, and removed them; no issues. There's a chance this may have been fixed by https://github.com/canonical/lxd-pkg-snap/pull/375 and https://github.com/canonical/lxd-pkg-snap/pull/379
@jameinel @bboozzoo happy to close this one?
SGTM, if @jameinel agrees then let's close it. If the problem shows up again, feel free to file a bug for snapd to investigate and we can take it from there.
For me, the same issue as described in the original report is still happening after enabling parallel installs.
LXD 5.21.1 LTS, Ubuntu 24.04, snap 2.63+24.04