When using an encrypted ZFS pool as the LINSTOR storage backend and LINSTOR as the Incus storage backend, incus launch and incus init do not work properly.
Is there an existing issue for this?
- [x] There is no existing issue for this bug
Is this happening on an up to date version of Incus?
- [x] This is happening on a supported version of Incus
Incus system details
config:
storage.linstor.controller_connection: http://127.0.0.1:3370
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_dev_incus
- migration_pre_copy
- infiniband
- dev_incus_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- dev_incus_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- images_all_projects
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- zfs_delegate
- storage_api_remote_volume_snapshot_copy
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- image_restriction_privileged
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- certificate_description
- disk_io_bus_virtio_blk
- loki_config_instance
- instance_create_start
- clustering_evacuation_stop_options
- boot_host_shutdown_action
- agent_config_drive
- network_state_ovn_lr
- image_template_permissions
- storage_bucket_backup
- storage_lvm_cluster
- shared_custom_block_volumes
- auth_tls_jwt
- oidc_claim
- device_usb_serial
- numa_cpu_balanced
- image_restriction_nesting
- network_integrations
- instance_memory_swap_bytes
- network_bridge_external_create
- network_zones_all_projects
- storage_zfs_vdev
- container_migration_stateful
- profiles_all_projects
- instances_scriptlet_get_instances
- instances_scriptlet_get_cluster_members
- instances_scriptlet_get_project
- network_acl_stateless
- instance_state_started_at
- networks_all_projects
- network_acls_all_projects
- storage_buckets_all_projects
- resources_load
- instance_access
- project_access
- projects_force_delete
- resources_cpu_flags
- disk_io_bus_cache_filesystem
- instance_oci
- clustering_groups_config
- instances_lxcfs_per_instance
- clustering_groups_vm_cpu_definition
- disk_volume_subpath
- projects_limits_disk_pool
- network_ovn_isolated
- qemu_raw_qmp
- network_load_balancer_health_check
- oidc_scopes
- network_integrations_peer_name
- qemu_scriptlet
- instance_auto_restart
- storage_lvm_metadatasize
- ovn_nic_promiscuous
- ovn_nic_ip_address_none
- instances_state_os_info
- network_load_balancer_state
- instance_nic_macvlan_mode
- storage_lvm_cluster_create
- network_ovn_external_interfaces
- instances_scriptlet_get_instances_count
- cluster_rebalance
- custom_volume_refresh_exclude_older_snapshots
- storage_initial_owner
- storage_live_migration
- instance_console_screenshot
- image_import_alias
- authorization_scriptlet
- console_force
- network_ovn_state_addresses
- network_bridge_acl_devices
- instance_debug_memory
- init_preseed_storage_volumes
- init_preseed_profile_project
- instance_nic_routed_host_address
- instance_smbios11
- api_filtering_extended
- acme_dns01
- security_iommu
- network_ipv4_dhcp_routes
- network_state_ovn_ls
- network_dns_nameservers
- acme_http01_port
- network_ovn_ipv4_dhcp_expiry
- instance_state_cpu_time
- network_io_bus
- disk_io_bus_usb
- storage_driver_linstor
- instance_oci_entrypoint
- network_address_set
- server_logging
- network_forward_snat
- memory_hotplug
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
addresses: []
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
MIIB8zCCAXqgAwIBAgIQZZvd9SUCFQsqn089S93cMDAKBggqhkjOPQQDAzAuMRkw
FwYDVQQKExBMaW51eCBDb250YWluZXJzMREwDwYDVQQDDAhyb290QGFhYTAeFw0y
NTA1MDQxMzI5MTVaFw0zNTA1MDIxMzI5MTVaMC4xGTAXBgNVBAoTEExpbnV4IENv
bnRhaW5lcnMxETAPBgNVBAMMCHJvb3RAYWFhMHYwEAYHKoZIzj0CAQYFK4EEACID
YgAEVlegTzljb86aZspbZJrLISWTudYLBG/ALF081acwJ9AxGhCZHxiGj3rTIeZW
GjLtJXAswMtnCY2LSzCrES/1Fezftpiqm/BuXzaTljII6XnaFV6KPsx1+0/YR2s1
+BaDo10wWzAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYD
VR0TAQH/BAIwADAmBgNVHREEHzAdggNhYWGHBH8AAAGHEAAAAAAAAAAAAAAAAAAA
AAEwCgYIKoZIzj0EAwMDZwAwZAIwAW5HkgiVs5n/wCvOzTT/UghxV2nXjCbgE/o2
utg/jctdZFXMqYpMTIFAgIlNCJuFAjBkICaGGrxbC+BzTFAWHAqsAnrnB9bKBIRC
GTm+zjKfkAf/sB6PraFWBbbuHmEi2qo=
-----END CERTIFICATE-----
certificate_fingerprint: 011144e4185bc826a2ac7f03587834b62088092a58d23e0cabd17d9de25987c0
driver: lxc
driver_version: 6.0.4
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
uevent_injection: "true"
unpriv_binfmt: "true"
unpriv_fscaps: "true"
kernel_version: 6.8.0-59-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "24.04"
project: default
server: incus
server_clustered: false
server_event_mode: full-mesh
server_name: aaa
server_pid: 2308
server_version: "6.12"
storage: linstor
storage_version: 1.31.0 / 9.2.13
storage_supported_drivers:
- name: lvm
version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
remote: false
- name: lvmcluster
version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
remote: true
- name: zfs
version: 2.3.1-1
remote: false
- name: linstor
version: 1.31.0 / 9.2.13
remote: true
- name: btrfs
version: 6.6.3
remote: false
- name: dir
version: "1"
remote: false
Instance details
No response
Instance log
No response
Current behavior
No response
Expected behavior
The container is started correctly
Steps to reproduce
To reproduce the error, we need an encrypted ZFS pool configured as LINSTOR's backend, with LINSTOR configured as the storage backend of Incus. Steps 1-7 create the encrypted ZFS pool and configure Incus and LINSTOR. Steps 8-9 create a container and hit the error.
The Ubuntu hostname is aaa.
The LINSTOR node name is also aaa.
The ZFS pool name is tmp-pool.
The LINSTOR storage pool name is incus, and it is located on the ZFS dataset tmp-pool/incus.
The Incus storage pool name is default.
And finally, the container name is c1.
- Install ubuntu-server with hostname aaa.
- Install zfs, incus, and linstor. zfs and incus are installed from pkgs.zabbly.com; linstor is installed from ppa:linbit/linbit-drbd9-stack. There is some additional configuration involved, but it basically just installs zfs, incus, and linstor.
cat > /etc/apt/sources.list.d/ubuntu.sources << 'EOF'; $(echo)
Types: deb
URIs: https://mirrors.ustc.edu.cn/ubuntu
Suites: noble noble-updates noble-backports
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

Types: deb
URIs: https://mirrors.ustc.edu.cn/ubuntu
Suites: noble-security
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
EOF
curl -fsSL https://pkgs.zabbly.com/key.asc -o /etc/apt/keyrings/zabbly.asc
sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-incus-stable.sources
Enabled: yes
Types: deb
URIs: https://mirrors.ustc.edu.cn/incus/stable
Suites: noble
Components: main
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc
EOF'
sh -c 'cat <<EOF > /etc/apt/sources.list.d/zabbly-kernel-stable.sources
Enabled: yes
Types: deb
URIs: https://pkgs.zabbly.com/kernel/stable
Suites: noble
Components: zfs
Architectures: $(dpkg --print-architecture)
Signed-By: /etc/apt/keyrings/zabbly.asc
EOF'
add-apt-repository ppa:linbit/linbit-drbd9-stack
timedatectl set-timezone Asia/Shanghai
echo 'blacklist ntfs3' | tee /etc/modprobe.d/disable-ntfs3.conf
apt-get update
apt-get dist-upgrade -y
apt-get install net-tools tmux beep restic nut unzip hdparm nfs-kernel-server samba nfs-common rsync smartmontools mtr python-is-python3 progress openssl htop cron fio stress openzfs-zfsutils openzfs-zfs-dkms openzfs-zfs-initramfs bridge-utils git gcc g++ make cmake build-essential curl nano iperf3 ntfs-3g iputils-ping python3-pip dosfstools systemd-boot-efi libsort-versions-perl libboolean-perl libyaml-pp-perl fzf mbuffer kexec-tools dracut-core efibootmgr bsdextrautils qemu-system netcat-openbsd lsof incus qemu-system lvm2 drbd-dkms drbd-utils linstor-satellite linstor-controller linstor-client -y
apt-get purge lxd-installer -y
apt autoremove -y
apt-mark hold linux-image-generic linux-headers-generic nvidia-driver-550 incus
cat > /usr/lib/systemd/system/zfs-myloadkey.service << 'EOF'; $(echo)
[Unit]
Description=Load ZFS encryption keys
DefaultDependencies=no
After=zfs-import.target
Before=zfs-mount.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/zfs load-key -a
StandardInput=tty-force
[Install]
WantedBy=zfs-mount.service
EOF
systemctl daemon-reload
systemctl enable zfs-myloadkey.service
systemctl enable --now linstor-satellite
systemctl enable --now linstor-controller
- Create an encrypted ZFS pool
zpool create \
-o autotrim=on \
-O checksum=sha512 \
-O compression=lz4 \
-O atime=off \
-O aclinherit=discard \
-O casesensitivity=sensitive \
-O dedup=off \
-O acltype=posix \
-O relatime=on \
-O encryption=aes-256-gcm \
-O keyformat=hex \
-O keylocation=file:///etc/zfs/tmp-pool.key \
tmp-pool sdb
zpool status
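The zpool create above reads its key from /etc/zfs/tmp-pool.key, which must exist beforehand. The original steps do not show how that file was generated; a minimal sketch (an assumption on my part, since keyformat=hex expects a 64-character hex string, i.e. 32 bytes) would be:
# assumption: generate a 32-byte hex key for keyformat=hex and lock down its permissions
od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > /etc/zfs/tmp-pool.key
chmod 600 /etc/zfs/tmp-pool.key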
- Create a LINSTOR node
linstor node create aaa 127.0.0.1 --node-type combined
linstor node list
linstor node info
- Create zfs dataset and linstor storage-pool
zfs create tmp-pool/incus
linstor storage-pool create zfsthin aaa incus tmp-pool/incus
linstor storage-pool list
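Note that because encryption was enabled at the pool root, tmp-pool/incus and every volume LINSTOR later creates under it inherit that encryption. This can be verified (an illustrative check, not part of the original steps) with:
# encryption and key status are inherited from tmp-pool by all child datasets and zvols
zfs get -r encryption,keystatus tmp-pool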
- incus init
incus admin init
Would you like to use clustering? (yes/no) [default=no]: no
Do you want to configure a new storage pool? (yes/no) [default=yes]: no
Would you like to create a new local network bridge? (yes/no) [default=yes]: no
Would you like to use an existing bridge or host interface? (yes/no) [default=no]: no
Would you like the server to be available over the network? (yes/no) [default=no]: no
Would you like stale cached images to be updated automatically? (yes/no) [default=yes]: yes
Would you like a YAML "init" preseed to be printed? (yes/no) [default=no]: yes
config: {}
networks: []
storage_pools: []
storage_volumes: []
profiles:
- config: {}
description: ""
devices: {}
name: default
project: default
projects: []
cluster: null
- incus storage config
incus config set storage.linstor.controller_connection=http://127.0.0.1:3370
incus storage create default linstor
incus storage set default linstor.resource_group.storage_pool=incus
incus storage set default linstor.resource_group.place_count=1
linstor resource-group list
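As a quick sanity check before launching anything, the resulting pool can also be inspected from the Incus side (illustrative only):
incus storage show default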
- Launch a container
incus launch images:alpine/3.21 c1 --storage default
Got the error:
Error: Failed instance creation: Failed creating instance from image: Clone operation failed
- Use linstor err list and linstor err show
root@aaa:~# linstor err list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Id ┊ Datetime ┊ Node ┊ Exception ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ 68176B7C-B37A4-000000 ┊ 2025-05-04 21:29:37 ┊ S|aaa ┊ LinStorException: None 0 exit from: [timeout, 0, bash, -c, set -o pipefail;... ┊
┊ 68176B7C-B37A4-000001 ┊ 2025-05-04 21:29:37 ┊ S|aaa ┊ StorageException: drbdmeta check-resize failed ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
root@aaa:~# linstor err show 68176B7C-B37A4-000000
ERROR REPORT 68176B7C-B37A4-000000
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.31.0
Build ID: a187af5c85a96bb27df87a5eab0bcf9dd6de6a34
Build time: 2025-04-08T09:36:27+00:00
Error time: 2025-05-04 13:29:37
Node: aaa
Thread: clone_ZFS_COPY(incus-volume-113d198b22bb4085bb18ba29ee3a0d36/0->incus-volume-b838c402629d4b55968172d6d262453d/0)
============================================================
Reported error:
===============
Category: LinStorException
Class name: LinStorException
Class canonical name: com.linbit.linstor.LinStorException
Generated at: Method 'run', Source file 'CloneDaemon.java', Line #137
Error message: None 0 exit from: [timeout, 0, bash, -c, set -o pipefail; zfs send --embed --large-block tmp-pool/incus/incus-volume-113d198b22bb4085bb18ba29ee3a0d36_00000@CF_ad1d8095 | zfs receive -F tmp-pool/incus/incus-volume-b838c402629d4b55968172d6d262453d_00000 && zfs destroy -r tmp-pool/incus/incus-volume-b838c402629d4b55968172d6d262453d_00000@%]
ErrorContext:
Description: Clone command failed
Cause: cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one
Call backtrace:
Method Native Class:Line number
run N com.linbit.linstor.clone.CloneDaemon:137
run N java.lang.Thread:1583
END OF ERROR REPORT.
root@aaa:~# linstor err show 68176B7C-B37A4-000001
ERROR REPORT 68176B7C-B37A4-000001
============================================================
Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.31.0
Build ID: a187af5c85a96bb27df87a5eab0bcf9dd6de6a34
Build time: 2025-04-08T09:36:27+00:00
Error time: 2025-05-04 13:29:37
Node: aaa
Thread: clone_ZFS_COPY(incus-volume-113d198b22bb4085bb18ba29ee3a0d36/0->incus-volume-b838c402629d4b55968172d6d262453d/0)
============================================================
Reported error:
===============
Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'processAfterClone', Source file 'DrbdLayer.java', Line #2041
Error message: drbdmeta check-resize failed
ErrorContext:
Call backtrace:
Method Native Class:Line number
processAfterClone N com.linbit.linstor.layer.drbd.DrbdLayer:2041
processAfterClone N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1770
cleanupDevices N com.linbit.linstor.clone.CloneService:204
postClone N com.linbit.linstor.clone.CloneService:544
lambda$startClone$2 N com.linbit.linstor.clone.CloneService:395
run N com.linbit.linstor.clone.CloneDaemon:171
run N java.lang.Thread:1583
Caused by:
==========
Category: LinStorException
Class name: ExtCmdFailedException
Class canonical name: com.linbit.extproc.ExtCmdFailedException
Generated at: Method 'drbdMetaCheckResize', Source file 'DrbdAdm.java', Line #410
Error message: The external command 'drbdmeta' exited with error code 1
ErrorContext:
Description: Execution of the external command 'drbdmeta' failed.
Cause: The external command exited with error code 1.
Correction: - Check whether the external program is operating properly.
- Check whether the command line is correct.
Contact a system administrator or a developer if the command line is no longer valid
for the installed version of the external program.
Details: The full command line executed was:
drbdmeta 1001 v09 /dev/zvol/tmp-pool/incus/incus-volume-b838c402629d4b55968172d6d262453d_00000 internal check-resize
The external command sent the following output data:
The external command sent the following error information:
no suitable meta data found :(
Call backtrace:
Method Native Class:Line number
drbdMetaCheckResize N com.linbit.linstor.layer.drbd.utils.DrbdAdm:410
processAfterClone N com.linbit.linstor.layer.drbd.DrbdLayer:2026
processAfterClone N com.linbit.linstor.core.devmgr.DeviceHandlerImpl:1770
cleanupDevices N com.linbit.linstor.clone.CloneService:204
postClone N com.linbit.linstor.clone.CloneService:544
lambda$startClone$2 N com.linbit.linstor.clone.CloneService:395
run N com.linbit.linstor.clone.CloneDaemon:171
run N java.lang.Thread:1583
END OF ERROR REPORT.
root@aaa:~#
The error appears to be caused by the way encrypted ZFS datasets are copied:
Error message: None 0 exit from: [timeout, 0, bash, -c, set -o pipefail; zfs send --embed --large-block tmp-pool/incus/incus-volume-113d198b22bb4085bb18ba29ee3a0d36_00000@CF_ad1d8095 | zfs receive -F tmp-pool/incus/incus-volume-b838c402629d4b55968172d6d262453d_00000 && zfs destroy -r tmp-pool/incus/incus-volume-b838c402629d4b55968172d6d262453d_00000@%]
ErrorContext:
Description: Clone command failed
Cause: cannot receive new filesystem stream: zfs receive -F cannot be used to destroy an encrypted filesystem or overwrite an unencrypted one with an encrypted one
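For what it is worth, the same ZFS-level failure can be reproduced without LINSTOR or Incus at all. A minimal sketch, assuming the encrypted tmp-pool from above (the volume names below are made up for illustration):
# both zvols inherit encryption from tmp-pool
zfs create -V 100M tmp-pool/clone-src
zfs create -V 100M tmp-pool/clone-dst
zfs snapshot tmp-pool/clone-src@test
# a plain (non-raw) send into an existing encrypted target, like LINSTOR's ZFS_COPY strategy above
zfs send --embed --large-block tmp-pool/clone-src@test | zfs receive -F tmp-pool/clone-dst
# in this setup the receive fails with the same "zfs receive -F cannot be used to destroy an
# encrypted filesystem or overwrite an unencrypted one with an encrypted one" error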
And if you ignore this error, something even stranger happens, as shown in Cannot create containers in a pool with the linstor storage backend, but that is not related to the error above.
Besides, I have several containers and VMs that were migrated from other backends to LINSTOR (on an encrypted ZFS dataset) using incus move, and they are working fine now.
@bensmrs any thoughts on this one?
I'm failing to see how the satellite configuration within Linstor would turn into an Incus bug ;)
It’s been discussed on the forum; I’ll try to find some time to have a look this week
Just out of curiosity, have you tested LINSTOR with encrypted ZFS pools, @luissimas? I’ll set up a lab to see if it’s really driver- and not LINSTOR-related.
LINSTOR doesn’t seem to support encryption at the storage driver level, only through LUKS. Your setup looks unsupported FWICT. https://github.com/LINBIT/linstor-server/issues/177
@stgraber I think you can mark this one as blocked.
I'd actually mark it as closed given that we don't have any mention or documentation on our side telling folks to run Linstor on encrypted ZFS. If they do and it's unsupported on the Linstor side, it's completely out of our control and just a Linstor bug or limitation.
Just out of curiosity, have you tested LINSTOR with encrypted ZFS pools, @luissimas? I’ll set up a lab to see if it’s really driver- and not LINSTOR-related.
Hmmm, I haven't. We've been using mostly LVM-Thin in our deployments. AFAIK, the only encryption feature that LINSTOR supports is the one you mentioned, using LUKS to encrypt the volumes, not the storage pool.
Firstly, I would like to express my gratitude to the staff for their dedicated efforts.
I do agree that this is a LINSTOR bug and that it is not appropriate to discuss it here, so I have turned to the LINSTOR community for help.
I also saw that LINSTOR does not fully support encrypted ZFS volumes, which might be a cause for concern.
I'd actually mark it as closed given that we don't have any mention or documentation on our side telling folks to run Linstor on encrypted ZFS. If they do and it's unsupported on the Linstor side, it's completely out of our control and just a Linstor bug or limitation.
But I don't agree here. The Incus documentation does not recommend using an encrypted ZFS pool, but at the same time it does not clearly warn users that encrypted ZFS pools cannot be used with this backend. I believe such important information should be highlighted in the documentation to prevent issues like this one.
Alright, I’ll add a quick sentence in the documentation to reflect this. But because distributed storage pools are not entirely managed by Incus the way, say, Proxmox does it, it is presumed that you already know how to use them. Incus only acts as a client and does not configure the distributed storage backend per se. The instructions we provide are a good starter, but you should already be familiar with the technology (although to be perfectly honest, I was not when we started developing the Incus driver).
Some weird workarounds:
Change the LINSTOR source code here from
case ZFS:
case ZFS_THIN:
result.add(DeviceHandler.CloneStrategy.ZFS_COPY);
break;
to
case ZFS:
case ZFS_THIN:
//result.add(DeviceHandler.CloneStrategy.ZFS_COPY);
break;
Then install LINSTOR from the modified source code.
This change forces LINSTOR to fall back to a dd clone mode without using zfs send and zfs recv.
I am not sure what other impact this has, but I can now successfully create containers on LINSTOR using an encrypted ZFS pool as the backend.
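Conceptually the dd fallback works here because the target zvol is created by LINSTOR as usual (and therefore inherits the pool's encryption) and is then filled through the block device layer, so no zfs receive is involved. Roughly (an illustration of the idea, not LINSTOR's actual code, with placeholder volume names):
# block-level copy between the source and target zvols; encryption is transparent at this layer
dd if=/dev/zvol/tmp-pool/incus/<source-volume> of=/dev/zvol/tmp-pool/incus/<target-volume> bs=1M conv=fsync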
Besides, I saw an argument named use_zfs_clone in ResourceDefinitionCloneRequest here.
This parameter seems to disable zfs send and zfs recv as well, as shown here. I am not sure whether Incus needs to send this parameter to LINSTOR. If so, would it be necessary to provide an option in the Incus storage pool properties to set it? @bensmrs
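If that parameter were ever used, the request Incus would have to send might look roughly like the following. This is only a sketch: the clone endpoint path and payload shape are my assumptions based on the linked ResourceDefinitionCloneRequest schema, the value semantics of use_zfs_clone are unverified, and Incus does not currently send anything like this.
# assumption: POST to the LINSTOR controller's resource-definition clone endpoint on port 3370
curl -X POST http://127.0.0.1:3370/v1/resource-definitions/<source-resource>/clone \
  -H 'Content-Type: application/json' \
  -d '{"name": "<target-resource>", "use_zfs_clone": true}'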
We’ll definitely not recommend that people patch LINSTOR, nor disable features that actually make ZFS pools efficient. The Incus driver doesn’t know what storage driver you use below LINSTOR, and it better not! LINSTOR abstracts the underlying storage, so I suggest you talk with the LINBIT folks, as we will not deal with it ourselves. This kind of highly specific option is not something I’ve seen @stgraber really happy with over the (admittedly short) time I’ve contributed to Incus.
And maintaining the relevant code paths is not something I’d be happy with, as you can definitely mix storage pool providers with LINSTOR. I’m not saying it’s something most sane persons would do, but you sure can have encrypted and non-encrypted LINSTOR storage pools serving the same resource group; putting an option disabling ZFS optimizations would reduce ZFS performance on non-encrypted LINSTOR storage pools.
I think it’s because of this mix-and-match features that LINBIT recommends using encryption at the volume level, although I’ll not speak for them.