Network leak: Persistent accumulation of ESTAB TCP connections on port 8443 after lxc copy --refresh between hosts with ZFS
Please confirm
- [x] I have searched existing issues to check if an issue already exists for the bug I encountered.
Distribution
Ubuntu
Distribution version
24.04
Output of "snap list --all lxd core20 core22 core24 snapd"
vm-host-2:~$ snap list --all lxd core20 core22 core24 snapd
Name Version Rev Tracking Publisher Notes
core22 20250923 2139 latest/stable canonical✓ base,disabled
core22 20251009 2163 latest/stable canonical✓ base
core24 20250829 1196 latest/stable canonical✓ base,disabled
core24 20251001 1225 latest/stable canonical✓ base
lxd 6.5-22da890 35616 latest/stable canonical✓ disabled,in-cohort
lxd 6.5-ccdfb39 36020 latest/stable canonical✓ in-cohort
snapd 2.71 25202 latest/stable canonical✓ snapd,disabled
snapd 2.72 25577 latest/stable canonical✓ snapd
Output of "lxc info" or system info if it fails
vm-host-2:~$ lxc info
config:
cluster.https_address: 192.168.1.250:8443
cluster.images_minimal_replica: "1"
core.https_address: 0.0.0.0:8443
images.auto_update_cached: "false"
images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
- image_restriction_nesting
- container_syscall_intercept_finit_module
- device_usb_serial
- network_allocate_external_ips
- explicit_trust_token
- shared_custom_block_volumes
- instance_import_conversion
- instance_create_start
- instance_protection_start
- devlxd_images_vm
- disk_io_bus_virtio_blk
- metrics_api_requests
- projects_limits_disk_pool
- ubuntu_pro_guest_attach
- metadata_configuration_entity_types
- access_management_tls
- network_allocations_ovn_uplink
- network_ovn_uplink_vlan
- state_logical_cpus
- vm_limits_cpu_pin_strategy
- gpu_cdi
- images_all_projects
- metadata_configuration_scope
- unix_device_hotplug_ownership_inherit
- unix_device_hotplug_subsystem_device_option
- storage_ceph_osd_pool_size
- network_get_target
- network_zones_all_projects
- vm_root_volume_attachment
- projects_limits_uplink_ips
- entities_with_entitlements
- profiles_all_projects
- storage_driver_powerflex
- storage_driver_pure
- cloud_init_ssh_keys
- oidc_scopes
- project_default_network_and_storage
- client_cert_presence
- clustering_groups_used_by
- container_bpf_delegation
- override_snapshot_profiles_on_copy
- resources_device_fs_uuid
- backup_metadata_version
- storage_buckets_all_projects
- network_acls_all_projects
- networks_all_projects
- clustering_restore_skip_mode
- disk_io_threads_virtiofsd
- oidc_client_secret
- pci_hotplug
- device_patch_removal
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
client_certificate: false
auth_user_name: <>
auth_user_method: unix
environment:
addresses:
- 192.168.1.250:8443
- 172.16.16.250:8443
architectures:
- x86_64
- i686
backup_metadata_version_range:
- 1
- 2
certificate: |
-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----
certificate_fingerprint: <>
driver: lxc | qemu
driver_version: 6.0.4 | 8.2.2
instance_types:
- container
- virtual-machine
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
bpf_token: "false"
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
uevent_injection: "true"
unpriv_binfmt: "true"
unpriv_fscaps: "true"
kernel_version: 6.8.0-87-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "24.04"
project: default
server: lxd
server_clustered: true
server_event_mode: full-mesh
server_name: vm-host-2
server_pid: 298456
server_version: "6.5"
server_lts: false
storage: zfs
storage_version: 2.2.2-0ubuntu9.4
storage_supported_drivers:
- name: btrfs
version: 6.6.3
remote: false
- name: ceph
version: 19.2.1
remote: true
- name: powerflex
version: 2.8 (nvme-cli)
remote: true
- name: pure
version: 2.1.9 (iscsiadm) / 2.8 (nvme-cli)
remote: true
- name: zfs
version: 2.2.2-0ubuntu9.4
remote: false
- name: cephfs
version: 19.2.1
remote: true
- name: cephobject
version: 19.2.1
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.16(2) (2022-05-18) / 1.02.185 (2022-05-18) / 4.48.0
remote: false
Issue description
In an LXD cluster using ZFS storage without shared block storage, running lxc copy ... --refresh for incremental container replication leaves additional established (ESTAB) TCP connections on the service port 8443 after each run, and these connections accumulate persistently. The leak causes a permanent, cumulative increase in background network traffic between cluster members of roughly 10 KB/s or more per copy operation. With prolonged uptime or frequent copying, the accumulation eventually triggers cluster synchronization warnings.
My cluster interconnect interface bandwidth:
Sockstat:
The drops in the graphs are caused by LXD reloads while troubleshooting. The graphs show low values, but previously, after several months of uptime, I have seen tens of megabits.
How it looks:
vm-host-2:~$ ss -tanp | grep 8443 | grep -c ESTAB
75
vm-host-2:~$ lxc copy log-primus log-secundus --verbose --stateless --target vm-host-1 --refresh
vm-host-2:~$ ss -tanp | grep 8443 | grep -c ESTAB
76
vm-host-2:~$ lxc copy log-primus log-secundus --verbose --stateless --target vm-host-1 --refresh
vm-host-2:~$ ss -tanp | grep 8443 | grep -c ESTAB
78
lxc monitor during operations: lxc.monitor.debug.log
Related configs: lxc storage show zpool.txt, lxc config show instance.txt
Steps to reproduce
- Create a ZFS storage pool in the cluster, backed by local storage on each cluster host.
- Create a container instance using this ZFS pool on a source node.
- Create a snapshot of the instance (manually or scheduled).
- Check the current count of established TCP connections on the source host:
ss -tanp | grep 8443 | grep -c ESTAB
- Run the incremental copy command to a target node:
lxc copy <source-container> <target-container> --verbose --stateless --target <target-node> --refresh
- Check the established connections count again to confirm the increase:
ss -tanp | grep 8443 | grep -c ESTAB
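The before/after check in the steps above can be made slightly stricter. With `ss -tanp`, a plain `grep 8443` also matches other ports or PIDs that happen to contain those digits, so the sketch below counts only ESTAB sockets whose local or peer port is exactly 8443. It assumes the default column layout of `ss -tan` (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port). The function name `count_estab_8443` is made up for illustration and is not part of LXD.

```shell
# Count ESTAB TCP sockets whose local or peer port is exactly 8443.
# Reads `ss -tan`-style output on stdin (assumed columns:
# State Recv-Q Send-Q Local-Address:Port Peer-Address:Port).
count_estab_8443() {
    awk '$1 == "ESTAB" && ($4 ~ /:8443$/ || $5 ~ /:8443$/) { n++ }
         END { print n + 0 }'
}
```

For example, run `ss -tan | count_estab_8443` before and after the `lxc copy ... --refresh` command and compare the two numbers.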
Information to attach
- [ ] Any relevant kernel output (dmesg)
- [ ] Instance log (lxc info NAME --show-log)
- [x] Instance configuration (lxc config show NAME --expanded)
- [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
- [ ] Output of the client with --debug
- [x] Output of the daemon with --debug (or use lxc monitor while reproducing the issue)
Would it be possible for you to reproduce this on a non-production system using the latest/edge channel, as that will become LXD 6.6 soon? Thanks!
The issue is reproducible on a fresh installation of both latest/stable and latest/edge.
vm-test-1:~# snap list --all lxd
Name Version Rev Tracking Publisher Notes
lxd 6.5-ccdfb39 36020 latest/edge canonical✓ disabled
lxd git-7c2b109 36693 latest/edge canonical✓ -
vm-test-1:~# ss -tanp | grep 8443 | grep -c ESTAB
11
vm-test-1:~# lxc copy test-primus test-secundus --verbose --stateless --target vm-test-2 --refresh
vm-test-1:~# ss -tanp | grep 8443 | grep -c ESTAB
13
vm-test-1:~# lxc copy test-primus test-secundus --verbose --stateless --target vm-test-2 --refresh
vm-test-1:~# ss -tanp | grep 8443 | grep -c ESTAB
15
# after 10 mins
vm-test-1:~# ss -tanp | grep 8443 | grep -c ESTAB
14
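To see which peer the leftover connections belong to, the ESTAB sockets on port 8443 can be grouped by peer address. This is a minimal sketch under the same assumption about the `ss -tan` column layout. The helper name `estab_8443_by_peer` is hypothetical.

```shell
# Group ESTAB TCP sockets involving port 8443 by peer address.
# Reads `ss -tan`-style output on stdin and prints "<count> <peer>"
# lines, highest count first.
estab_8443_by_peer() {
    awk '$1 == "ESTAB" && ($4 ~ /:8443$/ || $5 ~ /:8443$/) {
             peer = $5
             sub(/:[^:]*$/, "", peer)   # strip the trailing :port
             count[peer]++
         }
         END { for (p in count) print count[p], p }' |
        sort -k1,1nr -k2,2
}
```

Running `ss -tan | estab_8443_by_peer` before and after a copy should show whether the growth is limited to the target cluster member.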