Page dump in abd_free_chunks "page still charged to cgroup"
System information
| Type | Version/Name |
|---|---|
| Distribution Name | ubuntu |
| Distribution Version | 24.04 |
| Kernel Version | 6.8.0-41-generic |
| Architecture | intel x86-64 |
| OpenZFS Version | zfs-2.2.2-0ubuntu9, zfs-kmod-2.2.2-0ubuntu9 |
zpool status:
  pool: mediavol
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 12.2G in 00:05:34 with 0 errors on Tue Jul 2 20:50:05 2024
config:

        NAME             STATE     READ WRITE CKSUM
        mediavol         ONLINE       0     0     0
          raidz2-0       ONLINE       0     0     0
            sdq3         ONLINE       0     0     0
            sde3         ONLINE       0     0     0
            sda2         ONLINE       0     0     0
            sdh2         ONLINE       0     0     0
            sdn2         ONLINE       0     0     0
            sdb1         ONLINE       0     0     0
            sdc2         ONLINE       0     0     0
            sdj10        ONLINE       0     0     0
            sdg1         ONLINE       0     0     0
            sdo2         ONLINE       0     0     0
          raidz2-1       ONLINE       0     0     0
            sda4         ONLINE       0     0     0
            sdh3         ONLINE       0     0     0
            sdi2         ONLINE       0     0     0
            sdl3         ONLINE       0     0     0
            sdk1         ONLINE       0     0     0
            sde1         ONLINE       0     0     0
            sdq1         ONLINE       0     0     0
            sdn1         ONLINE       0     0     0
            sdb5         ONLINE       0     0     0
            sdc3         ONLINE       0     0     0
          raidz2-2       ONLINE       0     0     0
            sdh1         ONLINE       0     0     0
            sda1         ONLINE       0     0     0
            sdc1         ONLINE       0     0     0
            sda3         ONLINE       0     0     0
            sdb2         ONLINE       0     0     0
            sdi1         ONLINE       0     0     0
            sdl1         ONLINE       0     0     0
            sde2         ONLINE       0     0     0
            sdq2         ONLINE       0     0     0
            sdn3         ONLINE       0     0     0
          raidz2-3       ONLINE       0     0     0
            sdn4         ONLINE       0     0     0
            sdc4         ONLINE       0     0     0
            sdo1         ONLINE       0     0     0
            sdb3         ONLINE       0     0     0
            sdm1         ONLINE       0     0     0
            sdi3         ONLINE       0     0     0
            sdq4         ONLINE       0     0     0
            sdl6         ONLINE       0     0     0
            sdk2         ONLINE       0     0     0
            sdf1         ONLINE       0     0     0
          raidz2-4       ONLINE       0     0     0
            sdn5         ONLINE       0     0     0
            sdb4         ONLINE       0     0     0
            sdh4         ONLINE       0     0     0
            sdf2         ONLINE       0     0     0
            sdc5         ONLINE       0     0     0
            sdo3         ONLINE       0     0     0
            sde4         ONLINE       0     0     0
            sdl7         ONLINE       0     0     0
            sdm2         ONLINE       0     0     0
            sdj2         ONLINE       0     0     0
          raidz2-5       ONLINE       0     0     0
            sdn6         ONLINE       0     0     0
            sdd4         ONLINE       0     0     0
            sdc6         ONLINE       0     0     0
            sdf3         ONLINE       0     0     0
            sdb6         ONLINE       0     0     0
            sdp4         ONLINE       0     0     0
            sdo4         ONLINE       0     0     0
            sdm3         ONLINE       0     0     0
            sdl8         ONLINE       0     0     0
            sdi4         ONLINE       0     0     0
        special
          mirror-6       ONLINE       0     0     0
            nvme1n1p1    ONLINE       0     0     0
            nvme2n1p1    ONLINE       0     0     0
            nvme3n1p1    ONLINE       0     0     0
        cache
          sds            ONLINE       0     0     0
          sdr            ONLINE       0     0     0

errors: No known data errors

  pool: metavol
 state: ONLINE
config:

        NAME             STATE     READ WRITE CKSUM
        metavol          ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            nvme2n1p2    ONLINE       0     0     0
            nvme3n1p2    ONLINE       0     0     0
            nvme1n1p2    ONLINE       0     0     0

errors: No known data errors
The pool setup is a bit unusual. I have 17 drives ranging from 3 to 10TB, and a script that builds 8TB raidz2 vdevs by taking ten 1TB partitions from ten different disks. The cache SSDs are in the process of being replaced by the NVMe special vdev, and metavol is mounted on /home as an overlay.
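The allocation rule described above can be sketched roughly as follows. This is not the actual script, which is not shown here; the function name `allocate_vdev`, the free-slot bookkeeping, and the "prefer the emptiest disks" heuristic are all assumptions made for illustration. The only part taken from the description is the constraint that each vdev gets one 1TB partition from each of ten different disks.

```python
from typing import Dict, List


def allocate_vdev(free_slots: Dict[str, int], width: int = 10) -> List[str]:
    """Pick `width` distinct disks and take one free 1TB slot from each.

    `free_slots` maps a disk name to the number of unallocated 1TB
    partitions it still has (hypothetical bookkeeping, not from the post).
    The dictionary is updated in place.
    """
    candidates = [disk for disk, n in free_slots.items() if n > 0]
    if len(candidates) < width:
        raise ValueError("not enough disks with free space")
    # Assumed heuristic: prefer disks with the most free slots, breaking
    # ties by name so the result is deterministic.
    chosen = sorted(candidates, key=lambda d: (-free_slots[d], d))[:width]
    for disk in chosen:
        free_slots[disk] -= 1
    return chosen


# Example with made-up disk names and free-slot counts.
example = {f"disk{i:02d}": 3 for i in range(17)}
members = allocate_vdev(example)
print(members)
```

Because no disk contributes more than one partition to a vdev, losing a single physical disk costs each raidz2 vdev at most one member.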
Describe the problem you're observing
A ZFS crash causes Docker containers to shut down.
Describe how to reproduce the problem
I'm not sure what triggers this; it happened after the machine had been running for about 12 hours. The Docker containers can read from and write to a ZFS volume containing media and config files.
Include any warning/errors/backtraces from the system logs
dmesg output:
[32455.071711] BUG: Bad page state in process dp_sync_taskq pfn:850d3b
[32455.071719] page:0000000023b7b127 refcount:117 mapcount:0 mapping:0000000000000000 index:0x640000009e pfn:0x850d3b
[32455.071723] memcg:ab000000ff
[32455.071724] anon flags: 0x17ff44c00000a5(locked|waiters|referenced|lru|node=0|zone=2|lastcpupid=0x1ffd13)
[32455.071727] page_type: 0xffffff1b(buddy|0x64)
[32455.071730] raw: 0017ff44c00000a5 ffffeea2614348d4 dead008e00000124 dead00070000045d
[32455.071731] raw: 000000640000009e 000000b600000003 00000075ffffff1b 000000ab000000ff
[32455.071732] page dumped because: corrupted mapping in tail page
[32455.071733] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer overlay qrtr cmac algif_hash algif_skcipher af_alg bnep binfmt_misc mei_gsc xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel kvm_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp kvm snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match irqbypass snd_soc_acpi crct10dif_pclmul soundwire_generic_allocation polyval_clmulni polyval_generic soundwire_bus ghash_clmulni_intel rtw89_8852ce sha256_ssse3 rtw89_8852c sha1_ssse3 snd_soc_core
[32455.071772] aesni_intel rtw89_pci crypto_simd cryptd snd_compress ac97_bus snd_pcm_dmaengine rtw89_core cmdlinepart snd_hda_intel spi_nor snd_intel_dspcfg rapl snd_intel_sdw_acpi mtd intel_cstate intel_rapl_msr mei_hdcp mei_pxp mac80211 snd_hda_codec snd_seq_midi snd_seq_midi_event i915 snd_hda_core snd_rawmidi snd_hwdep gigabyte_wmi wmi_bmof i2c_i801 btusb spi_intel_pci btrtl snd_pcm cfg80211 spi_intel i2c_smbus btintel snd_seq drm_buddy btbcm btmtk processor_thermal_device_pci ttm libarc4 snd_seq_device processor_thermal_device bluetooth processor_thermal_wt_hint drm_display_helper snd_timer processor_thermal_rfim cec processor_thermal_rapl snd ecdh_generic intel_rapl_common ecc processor_thermal_wt_req rc_core processor_thermal_power_floor soundcore i2c_algo_bit processor_thermal_mbox int340x_thermal_zone input_leds joydev intel_pmc_core intel_vsec int3400_thermal intel_hid pmt_telemetry sparse_keymap acpi_thermal_rel pmt_class acpi_tad acpi_pad mei_me mei mac_hid nls_iso8859_1 zfs(PO) spl(O) msr parport_pc ppdev
[32455.071804] lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_logitech_hidpp hid_generic usbhid hid nvme mpt3sas crc32_pclmul raid_class igc nvme_core ahci intel_lpss_pci scsi_transport_sas intel_lpss libahci xhci_pci nvme_auth idma64 xhci_pci_renesas video wmi pinctrl_alderlake
[32455.071818] CPU: 11 PID: 6215 Comm: dp_sync_taskq Kdump: loaded Tainted: P O 6.8.0-41-generic #41-Ubuntu
[32455.071820] Hardware name: Gigabyte Technology Co., Ltd. Z790 AORUS ELITE X AX/Z790 AORUS ELITE X AX, BIOS F5 05/08/2024
[32455.071821] Call Trace:
[32455.071822] <TASK>
[32455.071824] dump_stack_lvl+0x76/0xa0
[32455.071828] dump_stack+0x10/0x20
[32455.071829] bad_page+0x76/0x120
[32455.071832] free_tail_page_prepare+0x68/0x160
[32455.071834] __free_pages_ok+0x39e/0x440
[32455.071835] __free_pages+0x100/0x140
[32455.071837] ? put_page+0x21/0x30 [zfs]
[32455.071997] abd_free_chunks+0x68/0x120 [zfs]
[32455.072130] abd_free_linear_page+0x23/0x40 [zfs]
[32455.072263] abd_free_linear+0x53/0x70 [zfs]
[32455.072410] abd_free+0x99/0xb0 [zfs]
[32455.072555] arc_free_data_abd+0x20/0x40 [zfs]
[32455.072700] arc_hdr_free_abd+0xd3/0x150 [zfs]
[32455.072846] arc_hdr_destroy+0x106/0x190 [zfs]
[32455.072990] arc_freed+0x6b/0xc0 [zfs]
[32455.073137] zio_free_sync+0x53/0x130 [zfs]
[32455.073284] zio_free+0xd3/0x110 [zfs]
[32455.073429] dsl_free+0x11/0x20 [zfs]
[32455.073594] dsl_dataset_block_kill+0x350/0x540 [zfs]
[32455.073757] free_blocks+0xe5/0x1d0 [zfs]
[32455.073917] free_children+0x3fe/0x470 [zfs]
[32455.074079] ? dbuf_rele_and_unlock+0x1ae/0x3c0 [zfs]
[32455.074232] free_children+0x247/0x470 [zfs]
[32455.074391] dnode_sync_free_range_impl+0x106/0x260 [zfs]
[32455.074546] dnode_sync_free_range+0x67/0xa0 [zfs]
[32455.074700] ? __pfx_dnode_sync_free_range+0x10/0x10 [zfs]
[32455.074854] range_tree_walk+0x82/0xd0 [zfs]
[32455.075026] dnode_sync+0x2e8/0x640 [zfs]
[32455.075183] dmu_objset_sync_dnodes+0x65/0x90 [zfs]
[32455.075338] sync_dnodes_task+0x29/0x50 [zfs]
[32455.075489] taskq_thread+0x1f3/0x3c0 [spl]
[32455.075501] ? __pfx_default_wake_function+0x10/0x10
[32455.075505] ? __pfx_taskq_thread+0x10/0x10 [spl]
[32455.075514] kthread+0xef/0x120
[32455.075516] ? __pfx_kthread+0x10/0x10
[32455.075518] ret_from_fork+0x44/0x70
[32455.075520] ? __pfx_kthread+0x10/0x10
[32455.075521] ret_from_fork_asm+0x1b/0x30
[32455.075523] </TASK>
[32455.075529] BUG: Bad page state in process dp_sync_taskq pfn:850d3b
[32455.075530] page:0000000023b7b127 refcount:117 mapcount:0 mapping:0000000000000000 index:0x640000009e pfn:0x850d3b
[32455.075532] memcg:ab000000ff
[32455.075533] flags: 0x17ff44c00000a5(locked|waiters|referenced|lru|node=0|zone=2|lastcpupid=0x1ffd13)
[32455.075534] page_type: 0xffffffff()
[32455.075535] raw: 0017ff44c00000a5 0000000000000000 dead008e00000124 0000000000000000
[32455.075536] raw: 000000640000009e 000000b600000003 00000075ffffffff 000000ab000000ff
[32455.075537] page dumped because: page still charged to cgroup
[32455.075537] Modules linked in: xt_nat xt_tcpudp veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables br_netfilter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer overlay qrtr cmac algif_hash algif_skcipher af_alg bnep binfmt_misc mei_gsc xe drm_gpuvm drm_exec gpu_sched drm_suballoc_helper drm_ttm_helper intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel kvm_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp kvm snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match irqbypass snd_soc_acpi crct10dif_pclmul soundwire_generic_allocation polyval_clmulni polyval_generic soundwire_bus ghash_clmulni_intel rtw89_8852ce sha256_ssse3 rtw89_8852c sha1_ssse3 snd_soc_core
[32455.075564] aesni_intel rtw89_pci crypto_simd cryptd snd_compress ac97_bus snd_pcm_dmaengine rtw89_core cmdlinepart snd_hda_intel spi_nor snd_intel_dspcfg rapl snd_intel_sdw_acpi mtd intel_cstate intel_rapl_msr mei_hdcp mei_pxp mac80211 snd_hda_codec snd_seq_midi snd_seq_midi_event i915 snd_hda_core snd_rawmidi snd_hwdep gigabyte_wmi wmi_bmof i2c_i801 btusb spi_intel_pci btrtl snd_pcm cfg80211 spi_intel i2c_smbus btintel snd_seq drm_buddy btbcm btmtk processor_thermal_device_pci ttm libarc4 snd_seq_device processor_thermal_device bluetooth processor_thermal_wt_hint drm_display_helper snd_timer processor_thermal_rfim cec processor_thermal_rapl snd ecdh_generic intel_rapl_common ecc processor_thermal_wt_req rc_core processor_thermal_power_floor soundcore i2c_algo_bit processor_thermal_mbox int340x_thermal_zone input_leds joydev intel_pmc_core intel_vsec int3400_thermal intel_hid pmt_telemetry sparse_keymap acpi_thermal_rel pmt_class acpi_tad acpi_pad mei_me mei mac_hid nls_iso8859_1 zfs(PO) spl(O) msr parport_pc ppdev
[32455.075595] lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_logitech_hidpp hid_generic usbhid hid nvme mpt3sas crc32_pclmul raid_class igc nvme_core ahci intel_lpss_pci scsi_transport_sas intel_lpss libahci xhci_pci nvme_auth idma64 xhci_pci_renesas video wmi pinctrl_alderlake
[32455.075607] CPU: 11 PID: 6215 Comm: dp_sync_taskq Kdump: loaded Tainted: P B O 6.8.0-41-generic #41-Ubuntu
[32455.075609] Hardware name: Gigabyte Technology Co., Ltd. Z790 AORUS ELITE X AX/Z790 AORUS ELITE X AX, BIOS F5 05/08/2024
[32455.075609] Call Trace:
[32455.075610] <TASK>
[32455.075610] dump_stack_lvl+0x76/0xa0
[32455.075613] dump_stack+0x10/0x20
[32455.075614] bad_page+0x76/0x120
[32455.075616] free_page_is_bad_report+0x86/0xa0
[32455.075618] __free_pages_ok+0x3fe/0x440
[32455.075620] __free_pages+0x100/0x140
[32455.075621] ? put_page+0x21/0x30 [zfs]
[32455.075769] abd_free_chunks+0x68/0x120 [zfs]
[32455.075906] abd_free_linear_page+0x23/0x40 [zfs]
[32455.076037] abd_free_linear+0x53/0x70 [zfs]
[32455.076156] abd_free+0x99/0xb0 [zfs]
[32455.076290] arc_free_data_abd+0x20/0x40 [zfs]
[32455.076437] arc_hdr_free_abd+0xd3/0x150 [zfs]
[32455.076584] arc_hdr_destroy+0x106/0x190 [zfs]
[32455.076729] arc_freed+0x6b/0xc0 [zfs]
[32455.076874] zio_free_sync+0x53/0x130 [zfs]
[32455.077026] zio_free+0xd3/0x110 [zfs]
[32455.077216] dsl_free+0x11/0x20 [zfs]
[32455.077406] dsl_dataset_block_kill+0x350/0x540 [zfs]
[32455.077570] free_blocks+0xe5/0x1d0 [zfs]
[32455.077730] free_children+0x3fe/0x470 [zfs]
[32455.077886] ? dbuf_rele_and_unlock+0x1ae/0x3c0 [zfs]
[32455.078039] free_children+0x247/0x470 [zfs]
[32455.078196] dnode_sync_free_range_impl+0x106/0x260 [zfs]
[32455.078350] dnode_sync_free_range+0x67/0xa0 [zfs]
[32455.078496] ? __pfx_dnode_sync_free_range+0x10/0x10 [zfs]
[32455.078643] range_tree_walk+0x82/0xd0 [zfs]
[32455.078800] dnode_sync+0x2e8/0x640 [zfs]
[32455.078949] dmu_objset_sync_dnodes+0x65/0x90 [zfs]
[32455.079097] sync_dnodes_task+0x29/0x50 [zfs]
[32455.079252] taskq_thread+0x1f3/0x3c0 [spl]
[32455.079262] ? __pfx_default_wake_function+0x10/0x10
[32455.079266] ? __pfx_taskq_thread+0x10/0x10 [spl]
[32455.079275] kthread+0xef/0x120
[32455.079277] ? __pfx_kthread+0x10/0x10
[32455.079279] ret_from_fork+0x44/0x70
[32455.079280] ? __pfx_kthread+0x10/0x10
[32455.079281] ret_from_fork_asm+0x1b/0x30
[32455.079284] </TASK>
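The two reports above concern the same page (pfn 850d3b) and the same call path; they differ only in the stated reason. When a log holds many such reports, a small parser makes that easier to see. The following is a minimal sketch that assumes the log text keeps the `BUG: Bad page state in process ... pfn:...` and `page dumped because: ...` lines in the form shown above; the function name `summarize_bad_pages` is made up for this example.

```python
import re
from typing import Dict, List

# Patterns match the two line shapes that appear in the dmesg output above.
BUG_RE = re.compile(r"BUG: Bad page state in process (\S+)\s+pfn:([0-9a-f]+)")
REASON_RE = re.compile(r"page dumped because: (.+)")


def summarize_bad_pages(log: str) -> List[Dict[str, str]]:
    """Return one entry per 'Bad page state' report found in `log`."""
    reports: List[Dict[str, str]] = []
    for line in log.splitlines():
        match = BUG_RE.search(line)
        if match:
            # A new report starts; its reason line follows a few lines later.
            reports.append(
                {"process": match.group(1), "pfn": match.group(2), "reason": ""}
            )
            continue
        match = REASON_RE.search(line)
        if match and reports and not reports[-1]["reason"]:
            reports[-1]["reason"] = match.group(1).strip()
    return reports
```

Feeding it the saved output of `dmesg` yields one dictionary per report, which can then be grouped by `pfn` or `reason`.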
You should probably report this to Ubuntu or try it on vanilla 2.2.5; 2.2.2 doesn't claim to work on kernel 6.8.
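A quick way to check this kind of mismatch is to compare the running kernel's major.minor version against the newest kernel the OpenZFS release says it supports. The sketch below assumes you look that maximum up yourself (for example in the release notes); the function names are invented for this example, and the `"6.6"` in the usage line is only a placeholder value, not a figure taken from this thread.

```python
import re
from typing import Tuple


def major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '6.8.0-41-generic' or '6.8'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized version: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_newer_than_supported(kernel_release: str, linux_maximum: str) -> bool:
    """True if the running kernel is newer than the stated supported maximum."""
    # Tuples compare numerically, so 6.10 correctly sorts after 6.8.
    return major_minor(kernel_release) > major_minor(linux_maximum)


# Placeholder maximum; substitute the value stated for your OpenZFS release.
print(kernel_newer_than_supported("6.8.0-41-generic", "6.6"))
```

On a live system the first argument would come from `uname -r`.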
CAUTION: DO NOT DOWNLOAD OR RUN ANYTHING from @amir1387aht's comment. It is not related to this issue (of course, Windows executables) and is most probably malware.