Cannot provision multiple instances in parallel using linstor driver

Open serturx opened this issue 5 months ago • 21 comments

Is there an existing issue for this?

  • [x] There is no existing issue for this bug

Is this happening on an up to date version of Incus?

  • [x] This is happening on a supported version of Incus

Incus system details

config:
  cluster.healing_threshold: "60"
  cluster.https_address: 10.100.10.10:8443
  core.https_address: 10.100.10.10:8443
  network.ovn.ca_cert: |-
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  network.ovn.client_cert: |-
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  network.ovn.client_key: |-
    -----BEGIN EC PRIVATE KEY-----
    ...
    -----END EC PRIVATE KEY-----
  network.ovn.northbound_connection: ssl:10.100.10.10:6641,ssl:10.100.10.20:6641,ssl:10.100.10.30:6641
  storage.linstor.ca_cert: |-
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  storage.linstor.client_cert: |-
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  storage.linstor.client_key: |-
    -----BEGIN EC PRIVATE KEY-----
    ...
    -----END EC PRIVATE KEY-----
  storage.linstor.controller_connection: https://10.100.10.10:3371,https://10.100.10.20:3371,https://10.100.10.30:3371
  storage.linstor.satellite.name: server1
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
- clustering_groups_config
- instances_lxcfs_per_instance
- clustering_groups_vm_cpu_definition
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
- instances_state_os_info
- network_load_balancer_state
- instance_nic_macvlan_mode
- storage_lvm_cluster_create
- network_ovn_external_interfaces
- instances_scriptlet_get_instances_count
- cluster_rebalance
- custom_volume_refresh_exclude_older_snapshots
- storage_initial_owner
- storage_live_migration
- instance_console_screenshot
- image_import_alias
- authorization_scriptlet
- console_force
- network_ovn_state_addresses
- network_bridge_acl_devices
- instance_debug_memory
- init_preseed_storage_volumes
- init_preseed_profile_project
- instance_nic_routed_host_address
- instance_smbios11
- api_filtering_extended
- acme_dns01
- security_iommu
- network_ipv4_dhcp_routes
- network_state_ovn_ls
- network_dns_nameservers
- acme_http01_port
- network_ovn_ipv4_dhcp_expiry
- instance_state_cpu_time
- network_io_bus
- disk_io_bus_usb
- storage_driver_linstor
- instance_oci_entrypoint
- network_address_set
- server_logging
- network_forward_snat
- memory_hotplug
- instance_nic_routed_host_tables
- instance_publish_split
- init_preseed_certificates
- custom_volume_sftp
- network_ovn_external_nic_address
- network_physical_gateway_hwaddr
- backup_s3_upload
- snapshot_manual_expiry
- resources_cpu_address_sizes
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
  addresses:
  - 10.100.10.10:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  certificate_fingerprint: ddd57c54a033efa824b57a19120044218ab4242d6d4e48b841b9f6bf3244d048
  driver: lxc | qemu
  driver_version: 6.0.4 | 9.0.4
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_binfmt: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.8.0-64-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "24.04"
  project: default
  server: incus
  server_clustered: true
  server_event_mode: full-mesh
  server_name: server1-cluster-node
  server_pid: 24361
  server_version: "6.14"
  storage: ""
  storage_version: ""
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
    remote: false
  - name: lvmcluster
    version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
    remote: true
  - name: linstor
    version: 1.31.3 / 9.2.14
    remote: true

Instance details

No response

Instance log

No response

Current behavior

I've encountered an issue when using the Terraform provider to provision multiple (>=2) instances at once on linstor pool storage: instance creation fails for some of the instances with the following error:

Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
f2a07c489b4c here refers to the default ubuntu/24.04 container image, as shown by incus image ls. How many instances actually get created successfully appears to be random.

I was able to reproduce the same behaviour with the Incus CLI, so the bug is not specific to the Terraform provider. Waiting a small amount of time (a few seconds) between instance-creation calls does seem to avoid the issue.

The workaround for Terraform in this case is to run it with -parallelism=1, which is much more time-consuming as it creates each instance sequentially.

Incus cluster:

root@server1:~# incus cluster ls
+----------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
|         NAME         |            URL            |      ROLES      | ARCHITECTURE | FAILURE DOMAIN | DESCRIPTION | STATUS |      MESSAGE      |
+----------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| server1-cluster-node | https://10.100.10.10:8443 | database-leader | x86_64       | default        |             | ONLINE | Fully operational |
|                      |                           | database        |              |                |             |        |                   |
+----------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| server2-cluster-node | https://10.100.10.20:8443 | database        | x86_64       | default        |             | ONLINE | Fully operational |
+----------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+
| server3-cluster-node | https://10.100.10.30:8443 | database        | x86_64       | default        |             | ONLINE | Fully operational |
+----------------------+---------------------------+-----------------+--------------+----------------+-------------+--------+-------------------+

Incus linstor storage config:

root@server1:~# incus storage show incus-linstor-pool
config:
  drbd.auto_add_quorum_tiebreaker: "true"
  drbd.on_no_quorum: suspend-io
  linstor.resource_group.name: incus-linstor-pool
  linstor.resource_group.place_count: "2"
  linstor.resource_group.storage_pool: thinpool
  linstor.volume.prefix: incus-volume-
  volatile.pool.pristine: "true"
description: ""
name: incus-linstor-pool
driver: linstor
used_by:
- /1.0/images/f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
- /1.0/images/f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582?project=monitoring
status: Created
locations:
- server3-cluster-node
- server1-cluster-node
- server2-cluster-node

Linstor cluster:

root@server1:~# linstor n l
╭───────────────────────────────────────────────────────╮
┊ Node    ┊ NodeType ┊ Addresses               ┊ State  ┊
╞═══════════════════════════════════════════════════════╡
┊ server1 ┊ COMBINED ┊ 10.100.10.10:3367 (SSL) ┊ Online ┊
┊ server2 ┊ COMBINED ┊ 10.100.10.20:3367 (SSL) ┊ Online ┊
┊ server3 ┊ COMBINED ┊ 10.100.10.30:3367 (SSL) ┊ Online ┊
╰───────────────────────────────────────────────────────╯

Linstor storage pool config:

root@server1:~# linstor sp l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node    ┊ Driver   ┊ PoolName            ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                   ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ server1 ┊ DISKLESS ┊                     ┊              ┊               ┊ False        ┊ Ok    ┊ server1;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server2 ┊ DISKLESS ┊                     ┊              ┊               ┊ False        ┊ Ok    ┊ server2;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server3 ┊ DISKLESS ┊                     ┊              ┊               ┊ False        ┊ Ok    ┊ server3;DfltDisklessStorPool ┊
┊ thinpool             ┊ server1 ┊ LVM_THIN ┊ vg-linstor/thinpool ┊    47.30 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server1;thinpool             ┊
┊ thinpool             ┊ server2 ┊ LVM_THIN ┊ vg-linstor/thinpool ┊    48.37 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server2;thinpool             ┊
┊ thinpool             ┊ server3 ┊ LVM_THIN ┊ vg-linstor/thinpool ┊    48.76 GiB ┊     49.89 GiB ┊ True         ┊ Ok    ┊ server3;thinpool             ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@server1:~# linstor rd l
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                                  ┊ Port ┊ ResourceGroup      ┊ Layers       ┊ State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ incus-volume-23f1ed0b97a746b3be5541de3d7b8d8a ┊ 7004 ┊ incus-linstor-pool ┊ DRBD,STORAGE ┊ ok    ┊
┊ incus-volume-95698d3b52344c5984ae4ce855592a87 ┊ 7002 ┊ incus-linstor-pool ┊ DRBD,STORAGE ┊ ok    ┊
┊ incus-volume-a178cc6514a3418dae38eaf0cbe0b997 ┊ 7006 ┊ incus-linstor-pool ┊ DRBD,STORAGE ┊ ok    ┊
┊ incus-volume-c92f7f83078e431586222ee30fcab2e5 ┊ 7003 ┊ incus-linstor-pool ┊ DRBD,STORAGE ┊ ok    ┊
┊ incus-volume-ed14b0964e014761a6420add18ec65fc ┊ 7001 ┊ incus-linstor-pool ┊ DRBD,STORAGE ┊ ok    ┊
┊ linstor_db                                    ┊ 7000 ┊ linstor_db_grp     ┊ DRBD,STORAGE ┊ ok    ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

Expected behavior

Instance provisioning works regardless of how many instances are created simultaneously using the linstor storage driver.

Steps to reproduce

I used a cluster of three nodes, each running Incus and LINSTOR with node-type = combined.

  1. Setup Incus cluster
  2. Setup Linstor cluster
  3. Create Linstor storage pool with lvmthin backend
  4. Create an Incus storage pool with linstor storage driver
  5. Start multiple instances in parallel:
root@server2:~# parallel --group 'incus init images:ubuntu/24.04 test{} --storage incus-linstor-pool -d root,size=5GiB' ::: {110..120}
Creating test117
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test116
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test114
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test111
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test115
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test113
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test119
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test120
Error: Failed instance creation: Failed creating instance from image: Multiple resource definitions found for volume f2a07c489b4cc3c97cd73c4e325524dfebe0669eb882967d0ed6ed6cafe91582
Creating test110

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: incus network create
  To attach a network to an instance, use: incus network attach

Creating test112

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: incus network create
  To attach a network to an instance, use: incus network attach

Creating test118

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: incus network create
  To attach a network to an instance, use: incus network attach

serturx avatar Jul 27 '25 23:07 serturx

That's odd as EnsureImage specifically has logic to handle this, effectively using a lock to make sure that the image only gets unpacked once.

I wonder if shouldUseOptimizedImage is somehow returning false in your case, or if you're hitting a case where EnsureImage doesn't quite handle the situation in a cluster environment with those instances getting created on multiple servers concurrently.

Can you retry with --target some-server passed so all instances get created on the same server? If that works correctly, then we're dealing with a cluster-specific issue; if not, then we're dealing with an issue with the Linstor driver and its handling of the EnsureImage logic.
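
For context, a node-local per-image lock of the sort described above might look roughly like the sketch below. This is purely illustrative (hypothetical names such as ensureImage, imageLock, volumeExists and createVolume; not the actual Incus code): it serializes unpacks per fingerprint on one server, but nothing here is visible to the other cluster members.

package main

import (
    "fmt"
    "sync"
)

// Node-local lock table keyed by image fingerprint (hypothetical names).
// Each server has its own copy of this map, so these locks are invisible
// to the other cluster members.
var (
    unpackMu    sync.Mutex
    unpackLocks = map[string]*sync.Mutex{}
)

func imageLock(fingerprint string) *sync.Mutex {
    unpackMu.Lock()
    defer unpackMu.Unlock()

    l, ok := unpackLocks[fingerprint]
    if !ok {
        l = &sync.Mutex{}
        unpackLocks[fingerprint] = l
    }

    return l
}

// ensureImage serializes image unpacking per fingerprint on this server:
// only one caller creates and fills the image volume, the others wait and
// then reuse it. On a remote pool, a different cluster member can still be
// doing the exact same thing concurrently.
func ensureImage(fingerprint string, volumeExists func(string) bool, createVolume func(string) error) error {
    l := imageLock(fingerprint)
    l.Lock()
    defer l.Unlock()

    if volumeExists(fingerprint) {
        return nil // Already unpacked by an earlier local request.
    }

    return createVolume(fingerprint)
}

func main() {
    err := ensureImage("f2a07c489b4c",
        func(string) bool { return false },
        func(fp string) error { fmt.Println("unpacking", fp); return nil })
    fmt.Println("done:", err)
}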

stgraber avatar Jul 28 '25 02:07 stgraber

I’ll have a look sometime this week. Debug-verbosity logs could also help :)

bensmrs avatar Jul 28 '25 08:07 bensmrs

Can you retry with --target some-server passed so all instances get created on the same server? If that works correctly, then we're dealing with a cluster-specific issue; if not, then we're dealing with an issue with the Linstor driver and its handling of the EnsureImage logic.

I've tried it using the target arg, and that does seem to fix this issue:

root@server1:~# parallel --group 'incus init images:ubuntu/24.04 test{} --storage incus-linstor-pool -d root,size=5GiB --target server1-cluster-node' ::: {1..10}
Creating test1

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: incus network create
  To attach a network to an instance, use: incus network attach

Creating test3

The instance you are starting doesn't have any network attached to it.
  To create a new network, use: incus network create
  To attach a network to an instance, use: incus network attach

...

Not sure if this is relevant, but I've also noticed that increasing the number of instances created in parallel raises the chance of this error occurring.

@bensmrs I've attached the logs of provisioning 5 instances at the same time using the --debug flag. incus-logs.txt

parallel --group 'incus init --debug images:ubuntu/24.04 test{} --storage incus-linstor-pool -d root,size=5GiB' ::: {1..5}

Let me know if there are other logs I could provide as well.

serturx avatar Jul 28 '25 15:07 serturx

Okay, so the normal image locking logic seems to work, but the problem is when multiple servers all try to import the same image, which makes locking a bit trickier in that scenario.

I don't know if there's some kind of Linstor construct we can rely on here to effectively detect that a server has already created the volume and is currently filling it, basically waiting for the volume to be fully ready before the other servers then consume it.

stgraber avatar Jul 28 '25 16:07 stgraber

To be completely honest, I don’t see how it’s the storage backend’s responsibility to guarantee such a global lock. I can have a look at how to solve that on LINSTOR’s side (I have a few ideas), but I feel like the problem will strike us again the next time someone implements a distributed storage driver.

bensmrs avatar Jul 28 '25 18:07 bensmrs

I've thought about it some more and I see a few ways to handle this:

  1. Use a standard error from storage drivers to indicate that a volume already exists on the underlying storage. If EnsureImage gets that error, it can assume that the volume is being created by another machine. It should then wait for the volume DB record to appear (which is usually the sign that the volume is ready), at which point it can use it. Having a timeout of, say, 15 min would probably make sense just in case we're dealing with a half-unpacked volume, in which case we should really return an error to the user. (See the sketch at the end of this comment.)

  2. Replace the current local lock with instead creating an operation in the database. Basically if we don't have a volume, then create a background operation for the creation+unpack of the image, assuming we don't find any such existing operation. If an operation is already ongoing, wait for it to complete. Doing this completely race free may be slightly tricky. It's also somewhat expensive because the DB doesn't store all the operation details, so we need to hit each server running an operation of the expected type to see what image it's about.

  3. We could add a new "locks" table in the database specifically for such global locks. That would basically have an ID, Name and NodeID and work very similarly to our local locking mechanism. We'd come up with a name, say "image_unpack_POOLID_FINGERPRINT", make the table UNIQUE on the Name so it's not possible to acquire the same lock twice, and the NodeID would be there so we can release all locks on Incus restart. That may be a pretty useful thing to have in general, so long as it's not abused (it could lead to more DB access than we want), but because it's a DB schema change, it wouldn't be backportable to the LTS (not really an issue here as Linstor isn't in the LTS anyway).

I think 1) or 3) are good options. 1) has the advantage of not needing DB changes and being backportable (for example for use by Ceph in the LTS, should it have the same issue as Linstor), whereas 3) has the potential of being a generic construct we can use anywhere we want to prevent concurrency in the cluster without having to play games with the operations table.
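
As a very rough sketch (hypothetical names, an assumed sentinel error and stub callbacks; not actual Incus code), the caller-side wait from option 1) could look something like this:

package main

import (
    "errors"
    "fmt"
    "time"
)

// Hypothetical sentinel error a storage driver could return when the
// volume already exists on the underlying storage (option 1).
var ErrVolumeAlreadyExists = errors.New("volume already exists on storage")

// waitForImageVolume polls for the image volume's DB record, which normally
// only appears once the volume has been fully unpacked by whichever server
// is creating it. The timeout guards against a half-unpacked volume.
func waitForImageVolume(fingerprint string, recordExists func(string) (bool, error), timeout time.Duration) error {
    deadline := time.Now().Add(timeout)

    for {
        ok, err := recordExists(fingerprint)
        if err != nil {
            return err
        }

        if ok {
            return nil // The volume is ready for use.
        }

        if time.Now().After(deadline) {
            return fmt.Errorf("timed out waiting for image volume %q", fingerprint)
        }

        // No DB change notifications are available, so re-check periodically.
        time.Sleep(10 * time.Second)
    }
}

// ensureImage shows only the error-handling path relevant to option 1.
func ensureImage(fingerprint string, createVolume func(string) error, recordExists func(string) (bool, error)) error {
    err := createVolume(fingerprint)
    if errors.Is(err, ErrVolumeAlreadyExists) {
        // Another cluster member is creating the same image volume;
        // wait for it to become ready instead of failing.
        return waitForImageVolume(fingerprint, recordExists, 15*time.Minute)
    }

    return err
}

func main() {
    // Toy run with stub callbacks: creation reports "already exists" and
    // the record check immediately reports the volume as ready.
    err := ensureImage("f2a07c489b4c",
        func(string) error { return ErrVolumeAlreadyExists },
        func(string) (bool, error) { return true, nil })
    fmt.Println("done:", err)
}

The 10s poll interval and 15 min timeout mirror the values mentioned above; the key point is that an "already exists" error turns into a wait for the DB record rather than a hard failure.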

stgraber avatar Jul 28 '25 21:07 stgraber

Phew I’m torn between the two :) If we end up choosing 1), I’m interested in being assigned. How would the “wait until” be implemented? A stupid spinlock? For 3), global locks are always tricky; which operations do you think really need them? I’m not sure I actually want to commit to this one, but OTOH I’ve never touched database code paths, so it can be useful. I’m having a hard time estimating the actual effort needed, though.

bensmrs avatar Jul 28 '25 21:07 bensmrs

3) wouldn't be particularly hard to introduce, but for now I can't think of another situation where we need this. We'd probably transition the lock-ish thing we have around instance creation over to it to have something faster to query, but we'd still need the background operation in that situation, so basically just adding this on top of the existing logic.

1) Yeah, we don't get notifications of DB record changes sadly, so our best bet is to just re-check for the volume record every 10s or so... Not the cleanest thing but it'd do the job.
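
To make option 3) a bit more concrete, here's an equally rough sketch of what the locks table and its acquire/release helpers could look like (hypothetical schema and helper names, simplistic matching on the SQLite "UNIQUE constraint failed" message; not actual Incus code):

package main

import (
    "database/sql"
    "fmt"
    "strings"
)

// Hypothetical schema for the cluster-wide locks table: UNIQUE(name) means
// a second INSERT for the same lock name fails, and node_id lets a server
// release everything it held when it restarts.
const createLocksTable = `
CREATE TABLE locks (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT NOT NULL UNIQUE,
    node_id INTEGER NOT NULL
);`

// tryAcquireLock attempts to take a lock such as
// "image_unpack_POOLID_FINGERPRINT" by inserting a row. It returns false
// without an error when another node already holds the lock.
func tryAcquireLock(db *sql.DB, name string, nodeID int64) (bool, error) {
    _, err := db.Exec("INSERT INTO locks (name, node_id) VALUES (?, ?)", name, nodeID)
    if err != nil {
        if strings.Contains(err.Error(), "UNIQUE constraint failed") {
            return false, nil // Lock is held by another node.
        }
        return false, err
    }
    return true, nil
}

// releaseLock drops the row so other nodes can acquire the lock again.
func releaseLock(db *sql.DB, name string, nodeID int64) error {
    _, err := db.Exec("DELETE FROM locks WHERE name = ? AND node_id = ?", name, nodeID)
    return err
}

func main() {
    // Schema only; wiring up an actual database driver is out of scope here.
    fmt.Println(createLocksTable)
}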

stgraber avatar Jul 28 '25 21:07 stgraber

So, shall we go for 1)? Or sleep on it until we find good instances of 3) being particularly useful?

For 1), would only Ceph and LINSTOR be impacted, or also lvmcluster (which I’ve never really touched)?

bensmrs avatar Jul 28 '25 21:07 bensmrs

I think we should go for 1), that's what makes the most sense for this issue. We'll see if we end up with something else that would eventually need 3).

Only Ceph and Linstor should need this. lvmcluster doesn't do optimized images, so instances are just always created from a clean unpack instead (it only supports thick provisioning).

stgraber avatar Jul 28 '25 21:07 stgraber

Alright, I can have a go tomorrow for the LINSTOR side. I’ll also try setting up a Ceph cluster on my testing environment.

bensmrs avatar Jul 28 '25 22:07 bensmrs

Thanks!

FWIW, I've never run into this issue with Ceph despite using Terraform, but it may just have been luck ;)

stgraber avatar Jul 28 '25 23:07 stgraber

The coin being two-sided, I’ve been unlucky enough to see this problem with the LINSTOR driver without using Terraform :) I couldn’t explain it, and am very glad that this issue came up, keeping my strange&unexplained bugs pile to a sane level.

I’m quite intrigued that it never came up with Ceph, but I have no idea how it’s implemented there.

bensmrs avatar Jul 28 '25 23:07 bensmrs

# incus launch images:debian/trixie trixie-1 --target incus-dev & incus launch images:debian/trixie trixie-2 --target incus-dev-2
[1] 916158
Launching trixie-2
Launching trixie-1
Error: Failed instance creation: Failed creating image record: Failed saving main image record: UNIQUE constraint failed: images.project_id, images.fingerprint

This error appears when using the dir backend. Edit: same error with linstor; I can’t reproduce the LINSTOR error message.

bensmrs avatar Jul 29 '25 10:07 bensmrs

OK, I managed to reproduce it by reducing the place count (I tested on a 2-node cluster, so the resource was actually defined on all the nodes when the place count was 2). So that definitely looks like a storage driver problem (one that could arise with other future drivers, I think), so implementing 1) is a good thing. But we’ll also need to implement 3), as there’s definitely a need for global locking.

I’m looking at 1), and if I have the time, I’ll start another PR for 3).

bensmrs avatar Jul 29 '25 11:07 bensmrs

Thinking about it again, it can also perfectly well be solved with 1), by catching the UNIQUE constraint violation. Anyway, I’ll stop making noise :)

bensmrs avatar Jul 29 '25 11:07 bensmrs

So after quite some tinkering, I don’t see a simple way for LINSTOR to return useful status data. And I’m getting more and more confused: I don’t see how volume creation can be triggered twice on the LINSTOR side without going through VolumeDBCreate, which should itself fail because of the uniqueness invariant. There’s something nasty here that I can’t put my finger on…

EnsureImage calls VolumeDBCreate, then b.driver.CreateVolume. The volume/resource association in the LINSTOR driver is done in CreateVolume by calling d.setResourceDefinitionProperties. So either something else calls CreateVolume, or VolumeDBCreate runs without any error. I could add safeguards in setResourceDefinitionProperties, but that doesn’t really feel like a good solution.

bensmrs avatar Jul 29 '25 14:07 bensmrs

Btw, @luissimas, if you have any idea about the LINSTOR part of this issue, that could be useful. I’m having trouble both reproducing and explaining it :)

(we should maybe plan a call to discuss what’s not working, now that it has sat a few months in prod)

bensmrs avatar Jul 31 '25 09:07 bensmrs

I don't think there's much going on in the LINSTOR part of things here. One thing that may be causing problems is the fact that we create the resource definition and then set the properties to associate it with an Incus volume as two separate steps, which could open up the possibility for inconsistencies.

I took a quick look at EnsureImage and it really seems like a race condition scenario. Getting to @bensmrs's question:

I don’t see how volume creation can be triggered twice on the LINSTOR side without going through VolumeDBCreate, which should itself fail because of the uniqueness invariant.

After taking a look at EnsureImage, I ended up reaching the same conclusion as @bensmrs: the VolumeDBCreate call should be enough in this context to at least ensure that we don't create duplicate volumes, although we'd really have to perform some extra work to handle the error and ensure that all instances are created.

@stgraber do you know if this operation can be performed twice for the same volume image without failing? I'd assume we'd get some sort of unique constraint error from the database (and after looking at the implementation this still seems to be the case).

https://github.com/lxc/incus/blob/8248180f55c7a185cba69ff5a5ad2c09ead8c01a/internal/server/storage/backend.go#L3619-L3623

@bensmrs we could also add some safeguards around CreateVolume in the driver itself, but so far I think this should be handled in the backend layer rather than the driver layer. With that said, the ceph driver seems to have some special logic for image volumes upon creation, maybe to handle these cases.

luissimas avatar Aug 03 '25 17:08 luissimas

Thanks, I preferred to get your opinion on this one, as you wrote most of this logic IIRC, and I’m glad we’re reaching the same conclusions :) I don’t think it’s the driver’s responsibility to handle it, as it would lead to functional duplication in every remote storage driver. So I’d say we need cluster-wide locks… and to investigate why VolumeDBCreate doesn’t fail (or whether something else triggers our volume creation logic).

bensmrs avatar Aug 04 '25 08:08 bensmrs

Putting this on the current milestone so we don't lose track of it, given it's a bug and we should do something to sort it out :)

I really need to set up a Linstor environment for myself so I can more easily test this stuff.

stgraber avatar Nov 09 '25 06:11 stgraber