arm64: /proc/cpuinfo doesn't honour personality inside LXD container
Required information
- Distribution: Ubuntu
- Distribution version: 22.04
- The output of "lxc info":
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses: []
architectures:
- aarch64
- armv7l
certificate: |
-----BEGIN CERTIFICATE-----
MIICHjCCAaSgAwIBAgIQP1pkHkT3fQQjhBX9XIa6rzAKBggqhkjOPQQDAzA9MRww
GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMR0wGwYDVQQDDBRyb290QGFybTY0
LWFybWhmLWx4YzAeFw0yMjA4MTExMDMyMDRaFw0zMjA4MDgxMDMyMDRaMD0xHDAa
BgNVBAoTE2xpbnV4Y29udGFpbmVycy5vcmcxHTAbBgNVBAMMFHJvb3RAYXJtNjQt
YXJtaGYtbHhjMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAESG9PtAYjDY/wMPc6bOdv
9ZEMkiJLwPqmm7kmhDnXYzYChK5BIX98HjQgVc70NCxlcg6HkNK86naWuPAW4WTq
NuZJOu4XEmt1+OF53GfeUVw61K5KWwjG/m2EWq5zXTIMo2kwZzAOBgNVHQ8BAf8E
BAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAyBgNVHREE
KzApgg9hcm02NC1hcm1oZi1seGOHBH8AAAGHEAAAAAAAAAAAAAAAAAAAAAEwCgYI
KoZIzj0EAwMDaAAwZQIxAIh5o3xZ+OO/uNfAuhQZQSsd40PWrLmr33XGo1q0l/1q
Y3LvlqbCBWm0+dwevhQc6AIwZ/BpvLKHGKEAL3Wr0DwljDbt+DrP9xtS/HjI2fhv
iqW/P9/C2w374/Y60VkFJAWE
-----END CERTIFICATE-----
certificate_fingerprint: 3b96071484ef9ffabacee84629347107fe4aec5753d1f1e0ebf31d02343a55b6
driver: lxc
driver_version: 4.0.12
firewall: nftables
kernel: Linux
kernel_architecture: aarch64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.15.0-46-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "22.04"
project: default
server: lxd
server_clustered: false
server_event_mode: full-mesh
server_name: arm64-armhf-lxc
server_pid: 1487
server_version: 5.0.0
storage: dir
storage_version: "1"
storage_supported_drivers:
- name: ceph
version: 15.2.14
remote: true
- name: btrfs
version: 5.4.1
remote: false
- name: cephfs
version: 15.2.14
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
remote: false
- name: zfs
version: 2.1.4-0ubuntu0.1
remote: false
Issue description
We've been trying to work out why Rust-based snap builds for armhf hang on Launchpad's build farm, where they run in armhf containers via LXD on arm64 machines, with linux32 used to set the personality (although this may not be necessary when running in a 32-bit LXD container - I think LXD already handles that?).
We seem to be running into something like https://github.com/rust-lang/rust/issues/60605, but it's a little weirder than that. rustup is only picking arm (i.e. ARMv6) because it gets confused about the processor's capabilities. rustup-init.sh has this code:
# Detect armv7 but without the CPU features Rust needs in that build,
# and fall back to arm.
# See https://github.com/rust-lang/rustup.rs/issues/587.
if [ "$_ostype" = "unknown-linux-gnueabihf" ] && [ "$_cputype" = armv7 ]; then
if ensure grep '^Features' /proc/cpuinfo | grep -q -v neon; then
# At least one processor does not have NEON.
_cputype=arm
fi
fi
And we're seeing:
+ [ unknown-linux-gnueabihf = unknown-linux-gnueabihf ]
+ [ armv7 = armv7 ]
+ ensure grep ^Features /proc/cpuinfo
+ grep ^Features /proc/cpuinfo
+ grep -q -v neon
+ _cputype=arm
I tried to track this down in a less weird environment than a builder, launching an Ubuntu 22.04 arm64 machine as described in the lxc info output above. I got as far as this:
$ grep -m1 ^Features /proc/cpuinfo
Features : fp asimd evtstrm cpuid
$ linux32 grep -m1 ^Features /proc/cpuinfo
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt lpae evtstrm
$ lxc launch ubuntu:bionic/armhf
Creating the instance
Instance name is: positive-hyena
Starting positive-hyena
$ lxc exec positive-hyena -- linux32 grep -m1 ^Features /proc/cpuinfo
Features : fp asimd evtstrm cpuid
This seems pretty odd, but at this point I don't know where to look next. Is this an LXD bug, for somehow failing to set up the environment correctly, or a kernel bug, for getting confused by containerization and somehow not noticing the personality change?
Steps to reproduce
lxc launch an armhf container on arm64, and run linux32 grep -m1 ^Features /proc/cpuinfo inside it.
Does your host have compat_uts_machine=armv7l set on the kernel command line? We do this for our LXD armhf instances on arm64 hosts on focal, because otherwise containers end up declaring that they are the armv8-32 machine type, which nobody uses.
It would be interesting to see the output of uname -a from inside your container.
@cjwatson can you try umount /proc/cpuinfo in the container?
My current guess is that the kernel makes the /proc/cpuinfo content personality aware (urgh), but in our case lxcfs provides /proc/cpuinfo as a FUSE overlay inside of the container (to filter the CPUs based on cgroups). LXCFS itself is an arm64 piece of code running on the host, so regardless of the personality of the caller, /proc/cpuinfo from the kernel will be accessed by an arm64 binary.
If that's indeed the issue, we can move the bug over to lxcfs and see if there's some kind of way to:
- Determine the personality of the caller process (whatever opens /proc/cpuinfo in the container)
- Somehow trick the kernel into providing us the cpuinfo content for that personality rather than our own
I suspect that 1) should be easy enough to figure out through some proc file, 2) may be a bit more challenging though.
@xnox My Canonistack test didn't have compat_uts_machine=armv7l on the command line, but Launchpad's arm64 builder VMs do. In a container on a builder, uname -a prints Linux flexible-bluejay 5.4.0-124-generic #140-Ubuntu SMP Thu Aug 4 02:27:01 UTC 2022 armv7l armv7l armv7l GNU/Linux.
@stgraber You're quite right: /proc/cpuinfo is mounted, and if I unmount it then I see the correct features.
- Determine the personality of the caller process (whatever opens /proc/cpuinfo in the container)
cat /proc/$PID/personality should give the value of the calling process.
- Somehow trick the kernel into providing us the cpuinfo content for that personality rather than our own
I suspect that 1) should be easy enough to figure out through some proc file, 2) may be a bit more challenging though.
I think one can use the syscall previous = personality(PER_LINUX32); to switch to 32-bit, or use whatever value one got from the procfs personality file.
Check that the return value is not negative, and restore the personality after one is done.
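To make that concrete, here is a minimal sketch in C of what xnox describes, not lxcfs's actual implementation: the helper names are mine, and it assumes the daemon may read the caller's /proc/<pid>/personality (which needs ptrace-level access) and temporarily switch its own personality around the read.

#include <stdio.h>
#include <sys/personality.h>
#include <sys/types.h>

/* Hypothetical helper: read the caller's personality from procfs.
 * The file contains the value as a hex string (PER_LINUX32 is 0x0008). */
static int get_caller_personality(pid_t pid, unsigned long *persona)
{
	char path[64];
	FILE *f;
	int ret;

	snprintf(path, sizeof(path), "/proc/%d/personality", (int)pid);
	f = fopen(path, "re");
	if (!f)
		return -1;
	ret = (fscanf(f, "%lx", persona) == 1) ? 0 : -1;
	fclose(f);
	return ret;
}

/* Hypothetical helper: print /proc/cpuinfo as the caller would see it,
 * by temporarily adopting the caller's personality. */
static int dump_cpuinfo_as(pid_t caller)
{
	unsigned long persona;
	int previous;
	char line[512];
	FILE *f;

	if (get_caller_personality(caller, &persona) < 0)
		return -1;

	/* personality(2) returns the previous value, or -1 on error. */
	previous = personality(persona);
	if (previous < 0)
		return -1;

	f = fopen("/proc/cpuinfo", "re");
	if (f) {
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
	}

	/* Always restore our own personality once done. */
	personality((unsigned long)previous);
	return f ? 0 : -1;
}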
Moving over to LXCFS. It may take us a little while before we have manpower to put on this (we'll have a new hire on it, just not sure about start date yet).
Until then, I'd recommend unmounting /proc/cpuinfo in such environments. It will have the downside of possibly over-reporting the number of CPU cores available to some tools, but that's likely less problematic than the incorrect CPU flags.
@stgraber Thanks for the suggestion. I've proposed https://code.launchpad.net/~cjwatson/launchpad-buildd/+git/launchpad-buildd/+merge/428923 for that.
This is worked around on Launchpad production now.
We can try to detect the calling process's PID on the FUSE daemon side, because we have the PID in the struct fuse_in_header structure, and then use it to obtain the personality of the caller.
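To sketch that idea (this is only an illustration, not the change that eventually landed in lxcfs; the handler name and body are hypothetical), a high-level libfuse handler can take the caller's PID from the request context, which libfuse fills from fuse_in_header, and then look up that process's personality:

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <stdio.h>

/* Hypothetical read handler for a /proc/cpuinfo overlay. */
static int cpuinfo_read(const char *path, char *buf, size_t size,
			off_t offset, struct fuse_file_info *fi)
{
	/* libfuse fills the request context, including the caller's PID,
	 * from the fuse_in_header of each incoming request. */
	pid_t caller = fuse_get_context()->pid;
	char persona_path[64];
	unsigned long persona = 0;
	FILE *f;

	snprintf(persona_path, sizeof(persona_path),
		 "/proc/%d/personality", (int)caller);
	f = fopen(persona_path, "re");
	if (f) {
		if (fscanf(f, "%lx", &persona) != 1)
			persona = 0;
		fclose(f);
	}

	/* With the caller's personality in hand, the daemon could render
	 * cpuinfo accordingly (e.g. via the personality(2) sketch above). */
	(void)path; (void)buf; (void)size; (void)offset; (void)fi; (void)persona;
	return 0;
}

As far as I know, lxcfs already uses the request context PID for its cgroup-based filtering, so the same mechanism should fit here.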
I might have something related. On Raspbian I have a similar issue after switching to the 64-bit kernel; all containers are still 32-bit. After the change, multiple entries in /proc inside the containers were not updated. I also tested a 64-bit container with the same result. What the affected entries had in common was a size of 4096 bytes instead of the usual 0 bytes. I added the following one-liner to the startup process of every container, which is a workaround for me:
/usr/bin/find /proc/ -maxdepth 1 -size 4096c -exec /bin/umount {} \;
I think it might be related to this cpuinfo issue, but it is not limited to only that entry. But then the Raspbian kernel is a bit special anyway. I hope someone finds this workaround useful.
@lanmarc77 this issue was already fixed in https://github.com/lxc/lxcfs/pull/567. You just need to update lxcfs on your machines.