err="strconv.ParseInt: parsing \"\": invalid syntax" when scraping metrics
The following error messages were noticed on different machines, all running Ubuntu 20.04 with the LXD snap. The last one occurred with snap version 4.23 rev 22652 and the following lxc info:
$ lxc info
config:
core.https_address: 0.0.0.0:8443
core.metrics_address: 0.0.0.0:9101
storage.backups_volume: default/backups
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
addresses:
- 192.168.1.8:8443
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
[...]
-----END CERTIFICATE-----
certificate_fingerprint: 7be47923cf301ead3a3f0938530aade2486d5cecfd1052959faceb7addb74db2
driver: lxc
driver_version: 4.0.12
firewall: xtables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.13.0-35-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "20.04"
project: default
server: lxd
server_clustered: false
server_name: mars.enclume.ca
server_pid: 3684702
server_version: "4.23"
storage: zfs
storage_version: 2.0.6-1ubuntu2
storage_supported_drivers:
- name: ceph
version: 15.2.14
remote: true
- name: btrfs
version: 5.4.1
remote: false
- name: cephfs
version: 15.2.14
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.45.0
remote: false
- name: zfs
version: 2.0.6-1ubuntu2
remote: false
The error messages:
Feb 14 15:11:38 c2d lxd.daemon[1356]: t=2022-02-14T15:11:38+0000 lvl=warn msg="Failed to get total number of processes" err="strconv.ParseInt: parsing \"\": invalid syntax"
Feb 14 21:14:12 ocelot lxd.daemon[1588]: t=2022-02-14T21:14:12+0000 lvl=warn msg="Failed to get swap usage" err="strconv.ParseInt: parsing \"\": invalid syntax"
Feb 22 22:04:52 xeon lxd.daemon[1818]: t=2022-02-22T22:04:52+0000 lvl=warn msg="Failed to get swap usage" err="strconv.ParseInt: parsing \"\": invalid syntax"
Mar 14 09:03:34 mars lxd.daemon[3684702]: t=2022-03-14T09:03:34-0400 lvl=warn msg="Failed to get memory usage" err="strconv.ParseInt: parsing \"\": invalid syntax" instance=vpn instanceType=container project=default
Looking at the log around the last one, there doesn't seem to be anything interesting near the time of the error:
root@mars:~# grep -5 -F 'strconv.ParseInt:' /var/snap/lxd/common/lxd/logs/lxd.log
t=2022-03-14T07:34:43-0400 lvl=info msg="Done pruning expired instance backups"
t=2022-03-14T08:34:43-0400 lvl=info msg="Updating images"
t=2022-03-14T08:34:43-0400 lvl=info msg="Pruning expired instance backups"
t=2022-03-14T08:34:43-0400 lvl=info msg="Done updating images"
t=2022-03-14T08:34:43-0400 lvl=info msg="Done pruning expired instance backups"
t=2022-03-14T09:03:34-0400 lvl=warn msg="Failed to get memory usage" err="strconv.ParseInt: parsing \"\": invalid syntax" instance=vpn instanceType=container project=default
t=2022-03-14T09:34:43-0400 lvl=info msg="Pruning expired instance backups"
t=2022-03-14T09:34:43-0400 lvl=info msg="Updating images"
t=2022-03-14T09:34:43-0400 lvl=info msg="Done pruning expired instance backups"
t=2022-03-14T09:34:43-0400 lvl=info msg="Done updating images"
t=2022-03-14T09:44:45-0400 lvl=info msg="Creating scheduled container snapshots"
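For reference, that error string is exactly what Go's strconv.ParseInt returns when it is handed an empty string, so presumably the cgroup value read at that moment was empty. A minimal sketch reproducing the message, with a hypothetical cgroup2 path (LXD reads the equivalent values through its own cgroup abstraction):
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readInt64 reads a single integer value from a cgroup file, the way a
// metrics scraper might. If the file content is empty (e.g. the read raced
// with cgroup teardown), strconv.ParseInt("") fails with exactly the error
// seen in the warnings above.
func readInt64(path string) (int64, error) {
	buf, err := os.ReadFile(path)
	if err != nil {
		return -1, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(buf)), 10, 64)
}

func main() {
	// Hypothetical path for a container payload's memory usage.
	v, err := readInt64("/sys/fs/cgroup/lxc.payload.vpn/memory.current")
	if err != nil {
		fmt.Println("Failed to get memory usage:", err)
		return
	}
	fmt.Println("memory usage:", v)
}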
@simondeziel once this is in the snap and you start seeing the new errors, can you please update here? Thanks
@tomponline, I'm running logcheck so I get to see all those weird and infrequent errors. Yes, I'll report back when I see them.
@simondeziel do you see any more of these errors now? Thanks
@tomponline no new occurrence since then. I'll close the issue and will reopen it if/when needed.
@tomponline I just got this one:
Apr 26 22:05:04 mars lxd.daemon[339520]: time="2022-04-26T22:05:04-04:00" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=vpn instanceType=container project=default
The host in question runs with:
$ snap list lxd
Name Version Rev Tracking Publisher Notes
lxd 5.0.0-b0287c1 22923 5.0/stable canonical✓ -
Is this occurring when the instance is just starting/stopping/restarting?
No:
$ lxc exec mars:vpn -- uptime
12:45:33 up 7 days, 8:12, 0 users, load average: 0.13, 0.03, 0.01
It just seems random :/
I just got another occurrence, but this time the container was being stopped:
May 4 17:04:36 jupiter lxd.daemon[192550]: time="2022-05-04T17:04:36-04:00" level=warning msg="Failed to get total number of processes" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=ganymede instanceType=container project=default
May 4 17:04:39 jupiter kernel: [1185396.579522] audit: type=1400 audit(1651698279.469:119): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="lxd-ganymede_</var/snap/lxd/common/lxd>" pid=2304952 comm="apparmor_parser"
The above was with:
$ snap list lxd
Name Version Rev Tracking Publisher Notes
lxd 5.0.0-b0287c1 22923 5.0/stable canonical✓ -
I rebooted the hosts c2d and xeon yesterday (once each) and got those:
May 5 05:03:56 c2d lxd.daemon[1888]: time="2022-05-05T05:03:56Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=weechat instanceType=container project=default
May 5 09:04:22 xeon lxd.daemon[1867]: time="2022-05-05T09:04:22Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=log instanceType=container project=default
May 5 10:04:37 xeon lxd.daemon[1867]: time="2022-05-05T10:04:37Z" level=warning msg="Failed to get memory usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=log instanceType=container project=default
Considering the ~1h difference between the 2 errors for the instance=log, I'm not sure there is a direct connection with the reboot.
Do you see the issue only when the instance isn't running?
No, those errors also happen in "steady state", distant from any lifecycle event. Here's another batch since last time:
May 6 09:00:37 xeon lxd.daemon[1867]: time="2022-05-06T09:00:37Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=log instanceType=container project=default
May 23 00:48:43 ocelot lxd.daemon[1487]: time="2022-05-23T00:48:43Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=pm instanceType=container project=default
May 25 03:00:07 xeon lxd.daemon[2034]: time="2022-05-25T03:00:07Z" level=warning msg="Failed to get memory usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=log instanceType=container project=default
May 31 14:19:40 c2d lxd.daemon[1491]: time="2022-05-31T14:19:40Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=rproxy instanceType=container project=default
Jun 19 15:02:07 xeon lxd.daemon[1978]: time="2022-06-19T15:02:07Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=log instanceType=container project=default
Jul 3 23:56:40 c2d lxd.daemon[1271]: time="2022-07-03T23:56:40Z" level=warning msg="Failed to get total number of processes" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=gw-home instanceType=container project=default
Since the beginning, it's always been a problem of getting the swap usage, memory usage or total processes value.
And to be clear, the instances are in stopped or started state?
Do you see the issue only when the instance isn't running?
A stopped instance stops being reported in the metrics. And to be clear, those errors did not occur when the instances were stopping.
They are always running, sometimes recently so (shortly after a host reboot) but often for a long while.
I got a bunch of weirder errors:
Jan 9 13:04:23 xeon lxd.daemon[1307]: time="2023-01-09T13:04:23Z" level=warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16 8:0\"): input does not match format" instance=puppet instanceType=container project=default
Jan 9 13:04:23 xeon lxd.daemon[1307]: time="2023-01-09T13:04:23Z" level=warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0 8:16 rbytes=258048 wbytes=0 rios=9 wios=0 dbytes=0 dios=0\"): input does not match format" instance=squid instanceType=container project=default
Jan 9 13:04:23 xeon lxd.daemon[1307]: time="2023-01-09T13:04:23Z" level=warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16 7:2 rbytes=3072 wbytes=0 rios=1 wios=0 dbytes=0 dios=0\"): input does not match format" instance=apt instanceType=container project=default
root@xeon:~# grep . /sys/fs/cgroup/lxc.payload.{apt,puppet,squid}/io.stat
/sys/fs/cgroup/lxc.payload.apt/io.stat:8:16 7:2 rbytes=3072 wbytes=0 rios=1 wios=0 dbytes=0 dios=0
/sys/fs/cgroup/lxc.payload.puppet/io.stat:8:16 8:0
/sys/fs/cgroup/lxc.payload.squid/io.stat:8:0 8:16 rbytes=258048 wbytes=0 rios=9 wios=0 dbytes=0 dios=0
/sys/fs/cgroup/lxc.payload.squid/io.stat:7:1 rbytes=60416 wbytes=0 rios=2 wios=0 dbytes=0 dios=0
It feels like the cgroup data is plain broken and there's nothing LXD can do about it. Sounds like a kernel bug.
Stopping and starting those 3 containers makes their io.stat file empty, same as with other containers not showing any issue.
Yeah, this looks weird. LXD expects a single MAJ:MIN. In theory, we could handle this by just using the last MAJ:MIN in the line. However, I don't know how reliable this would be.
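To sketch that "use the last MAJ:MIN" idea (illustrative only, not LXD's actual parser): treat every field without an = as a device ID and keep the last one seen, and skip lines that carry no counters at all:
package main

import (
	"fmt"
	"strings"
)

// parseIOStatLine is a tolerant sketch, not LXD's implementation: it returns
// the last MAJ:MIN device ID found on the line together with its counters,
// and reports ok=false for lines with no counters at all (e.g. "8:16 8:0").
func parseIOStatLine(line string) (device string, counters map[string]string, ok bool) {
	counters = map[string]string{}

	for _, field := range strings.Fields(line) {
		if key, value, found := strings.Cut(field, "="); found {
			// A counter such as rbytes=3072.
			counters[key] = value
		} else {
			// A device ID; later ones overwrite earlier ones, so the
			// last MAJ:MIN on the line wins.
			device = field
		}
	}

	return device, counters, device != "" && len(counters) > 0
}

func main() {
	for _, line := range []string{
		"8:16 7:2 rbytes=3072 wbytes=0 rios=1 wios=0 dbytes=0 dios=0", // two device IDs
		"8:16 8:0", // device IDs but no counters
	} {
		if device, counters, ok := parseIOStatLine(line); ok {
			fmt.Printf("%s -> %v\n", device, counters)
		} else {
			fmt.Printf("skipping %q\n", line)
		}
	}
}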
I have the same on my machine:
$ cat /sys/fs/cgroup/io.stat
...
8:0 259:0 rbytes=11228777984 wbytes=59497067520 rios=357821 wios=2186721 dbytes=0 dios=0
...
$ lsblk
...
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 1 0B 0 disk
nvme0n1 259:0 0 238.5G 0 disk
...
I believe they should just omit the 8:0 entirely. Or perhaps they just forgot to add a newline.
Yeah, I also have weird stuff like multiple detached loop devices showing up on a single line in the host's io.stat:
sdeziel@xeon:~$ cat /sys/fs/cgroup/io.stat
8:32 rbytes=999678391296 wbytes=198034948096 rios=3100795 wios=1051435 dbytes=0 dios=0
8:16 rbytes=9083418112 wbytes=72376709632 rios=388623 wios=11650553 dbytes=53315555840 dios=301674
8:0 rbytes=7480011776 wbytes=72376660480 rios=371580 wios=11547682 dbytes=53315555840 dios=301672
7:7 7:6 7:5 7:4 rbytes=28672 wbytes=0 rios=22 wios=0 dbytes=0 dios=0
7:3 rbytes=518539264 wbytes=0 rios=13405 wios=0 dbytes=0 dios=0
7:2 rbytes=1642633216 wbytes=0 rios=36241 wios=0 dbytes=0 dios=0
7:1 rbytes=158687232 wbytes=0 rios=4728 wios=0 dbytes=0 dios=0
7:0 rbytes=25744384 wbytes=0 rios=1091 wios=0 dbytes=0 dios=0
sdeziel@xeon:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 49.8M 1 loop /snap/snapd/17950
loop1 7:1 0 63.3M 1 loop /snap/core20/1778
loop2 7:2 0 103M 1 loop /snap/lxd/23541
loop3 7:3 0 49.6M 1 loop
sda 8:0 0 232.9G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 24G 0 part
├─sda3 8:3 0 2G 0 part [SWAP]
└─sda4 8:4 0 128G 0 part
sdb 8:16 0 232.9G 0 disk
├─sdb1 8:17 0 1M 0 part
├─sdb2 8:18 0 24G 0 part /
├─sdb3 8:19 0 2G 0 part [SWAP]
└─sdb4 8:20 0 128G 0 part
sdc 8:32 0 2.7T 0 disk
├─sdc1 8:33 0 2.7T 0 part
└─sdc9 8:41 0 8M 0 part
sdeziel@xeon:~$ losetup -a
/dev/loop1: []: (/var/lib/snapd/snaps/core20_1778.snap)
/dev/loop2: []: (/var/lib/snapd/snaps/lxd_23541.snap)
/dev/loop0: []: (/var/lib/snapd/snaps/snapd_17950.snap)
/dev/loop3: []: (/var/lib/snapd/snaps/snapd_17883.snap (deleted))
Another weird thing is that inside most of my containers, that io.stat file is completely empty, but not always. It even changes upon container restarts.
@mihalicyn is this an area of the kernel you know, by any chance?
Hm, yep, it looks a little bit broken since this commit https://lore.kernel.org/all/[email protected]/
And users noticed this: https://lore.kernel.org/all/[email protected]/
From the kernel code, it follows that it's fully safe to just take the last device MAJ:MIN from the line.
It's already fixed by another patch https://github.com/torvalds/linux/commit/3607849df47822151b05df440759e2dc70160755
which allows output like this:
253:10
253:5 rbytes=0 wbytes=0 rios=0 wios=1 dbytes=0 dios=0
instead of
253:10 253:5 rbytes=0 wbytes=0 rios=0 wios=1 dbytes=0 dios=0
I think we can try to handle all these options :-)
@mihalicyn it really pleases me that you've found it to be fixed upstream, many thanks! I'll check if Canonical kernels that are currently in -proposed have the patch but if not, I'll ask for a backport/inclusion.
Thanks for looking into this!!
Always glad to help! ;-)
@mihalicyn, https://github.com/torvalds/linux/commit/3607849df47822151b05df440759e2dc70160755 wasn't CC'ed to [email protected] and I couldn't find it in upstream's 5.15 changelogs so apparently nobody picked it up. I think it'd be best to send it to stable@ for upstream inclusion rather than fixing it in Canonical kernels only. What do you think?
cc'ing @Blub author of https://github.com/torvalds/linux/commit/3607849df47822151b05df440759e2dc70160755
Yep, I think it's worth adding to -stable kernels. But I'm afraid we'll need some workaround anyway, because the process of taking the patch to stable and then waiting for it to be picked up downstream is not fast. BTW, my patch for shifts still hasn't landed in Ubuntu kernels, and I did it almost 2 months ago :D
Also see here https://discuss.linuxcontainers.org/t/lxc-query-not-showing-disk-stats-for-all-containers/16440
@gabrielmougard I just checked my logs for the year and here's what I got:
root@log:~# grep -hF "lxd" /var/log/archives/2023/2023-*-syslog | grep -vF data/ | sed 's/.*level=//' | grep -F ' msg="Failed to get ' | sort | uniq -c | sort -nr
5655 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0\"): unexpected EOF" instance=apt instanceType=container project=default
4186 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16 7:1 rbytes=55296 wbytes=0 rios=1 wios=0 dbytes=0 dios=0\"): input does not match format" instance=log instanceType=container project=default
3236 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0 8:16 rbytes=258048 wbytes=0 rios=9 wios=0 dbytes=0 dios=0\"): input does not match format" instance=squid instanceType=container project=default
3212 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16\"): unexpected EOF" instance=metrics instanceType=container project=default
1996 warning msg="Failed to get disk stats" err="Failed extracting io.stat \"\" (from \"8:0\")" instance=metrics instanceType=container project=default
1889 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16 8:0\"): input does not match format" instance=puppet instanceType=container project=default
1759 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16 7:2 rbytes=3072 wbytes=0 rios=1 wios=0 dbytes=0 dios=0\"): input does not match format" instance=apt instanceType=container project=default
1467 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:16\"): unexpected EOF" instance=log instanceType=container project=default
1439 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0\"): unexpected EOF" instance=puppet instanceType=container project=default
1431 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0 8:16\"): input does not match format" instance=metrics instanceType=container project=default
553 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0 8:16 rbytes=8192 wbytes=0 rios=2 wios=0 dbytes=0 dios=0\"): input does not match format" instance=apt instanceType=container project=default
482 warning msg="Failed to get disk stats" err="Failed extracting io.stat \"8:16\" (from \"8:0 8:16 rbytes=258048 wbytes=0 rios=9 wios=0 dbytes=0 dios=0\")" instance=metrics instanceType=container project=default
269 warning msg="Failed to get disk stats" err="Failed parsing io.stat (\"8:0\"): unexpected EOF" instance=log instanceType=container project=default
2 warning msg="Failed to get total number of processes" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=gw-home instanceType=container project=default
1 warning msg="Failed to get total number of processes" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=git instanceType=container project=default
1 warning msg="Failed to get memory usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=squid instanceType=container project=default
1 warning msg="Failed to get memory usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=gw-home instanceType=container project=default
The good news is that those newer messages include the content of the file that couldn't be parsed successfully :)
So the bulk of it relates to the io.stat kernel issue you are trying to work around, but there are some other failures around process count and memory usage too.
My environment is Ubuntu 22.04 with the HWE kernel.