System crash/hangup on suspend while scrub is running

Open omgold opened this issue 4 years ago • 1 comments

System information

Type | Version/Name
--- | ---
Distribution Name | Arch
Kernel Version | 5.12.15-arch1
Architecture | x86_64
OpenZFS Version | 2.1.0

Describe the problem you're observing

When a scrub is running in the background on a zpool and I put the PC to sleep (suspend via systemd), the shutdown does not complete and it is impossible to wake the machine up afterwards.

The PC is unresponsive in this state but still powered on. The display is powered off and remote access does not work, so I have no idea where exactly it hangs.

Describe how to reproduce the problem

  • start scrub
  • do systemctl suspend
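The two steps above can be sketched as a small shell script. This is only an illustration, not something from the original report: the pool name `tank`, the `POOL` and `DRY_RUN` variables, and the `run`/`reproduce` helpers are all hypothetical. By default the script only prints the commands, because actually running them may hang the machine as described.

```shell
#!/bin/sh
# Hypothetical reproduction helper. POOL is a placeholder pool name,
# not taken from the report; substitute your own pool.
POOL="${POOL:-tank}"
# DRY_RUN=1 (default) only prints the commands.
# Set DRY_RUN=0 to really execute them (needs root and may hang the system).
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reproduce() {
    run zpool scrub "$POOL"     # start a scrub; it continues in the background
    run zpool status "$POOL"    # check that the scrub is in progress
    run systemctl suspend       # suspend while the scrub is still running
}

reproduce
```

Running it with `DRY_RUN=0 POOL=<your pool>` as root would perform the actual steps; saving any open work first seems advisable.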

Include any warning/errors/backtraces from the system logs

The only possible hint I have is that after boot (and also after wakeup from sleep when scrub is not running) I get messages like this:

[ 1513.933139] CPU: 11 PID: 90 Comm: cpuhp/11 Tainted: P        W  OE     5.12.15-arch1-1 #1
[ 1513.933140] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z370 Killer SLI, BIOS P1.60 03/21/2018
[ 1513.933141] Call Trace:
[ 1513.933144]  dump_stack+0x76/0x94
[ 1513.933147]  __schedule_bug.cold+0x4c/0x58
[ 1513.933149]  __schedule+0x6bf/0x8b0
[ 1513.933152]  ? taskq_thread_spawn+0x50/0x50 [spl]
[ 1513.933158]  schedule+0x5b/0xc0
[ 1513.933159]  schedule_timeout+0x11c/0x160
[ 1513.933161]  wait_for_completion_killable+0xc7/0x160
[ 1513.933163]  __kthread_create_on_node+0xf8/0x1b0
[ 1513.933166]  ? taskq_thread_spawn+0x50/0x50 [spl]
[ 1513.933170]  kthread_create_on_node+0x51/0x70
[ 1513.933172]  ? taskq_thread_spawn+0x50/0x50 [spl]
[ 1513.933176]  spl_kthread_create+0xa2/0x100 [spl]
[ 1513.933181]  taskq_thread_create+0x66/0xf0 [spl]
[ 1513.933185]  spl_taskq_expand+0xb1/0xc0 [spl]
[ 1513.933189]  cpuhp_invoke_callback+0x1c6/0x480
[ 1513.933191]  ? taskq_thread_create+0xf0/0xf0 [spl]
[ 1513.933195]  cpuhp_thread_fun+0xb0/0x110
[ 1513.933196]  smpboot_thread_fn+0xee/0x1e0
[ 1513.933198]  ? smpboot_register_percpu_thread+0xf0/0xf0
[ 1513.933199]  kthread+0x133/0x150
[ 1513.933201]  ? kthread_associate_blkcg+0xc0/0xc0
[ 1513.933202]  ret_from_fork+0x22/0x30
[ 1513.933310] BUG: scheduling while atomic: cpuhp/11/90/0x00000000
[ 1513.933311] CPU11 is up
[ 1513.933317] Modules linked in: nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cmac algif_hash algif_skcipher af_alg nf_tables bnep zram nfnetlink nct6775 hwmon_vid lm92 zfs(POE) intel_rapl_msr intel_rapl_common ext4 mbcache jbd2 f2fs zunicode(POE) nls_iso8859_1 zzstd(OE) vfat fat nvidia_drm(POE) uas usb_storage nvidia_modeset(POE) raid1 loop zlua(OE) nvidia_uvm(POE) zavl(POE) iTCO_wdt intel_pmc_bxt iTCO_vendor_support ee1004 icp(POE) mousedev mei_hdcp uvcvideo intel_wmi_thunderbolt mxm_wmi videobuf2_vmalloc videobuf2_memops snd_usb_audio videobuf2_v4l2 btusb videobuf2_common btrtl snd_usbmidi_lib videodev btbcm btintel snd_rawmidi snd_seq_device nvidia(POE) mc zcommon(POE) bluetooth md_mod znvpair(POE) ecdh_generic rfkill spl(OE) x86_pkg_temp_thermal intel_powerclamp ecc coretemp crc16 kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_intel_dspcfg kvm snd_intel_sdw_acpi
[ 1513.933344]  irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel crypto_simd snd_hda_core cryptd rapl intel_cstate i2c_i801 snd_hwdep intel_uncore e1000e pcspkr drm_kms_helper i2c_smbus snd_pcm cec snd_timer snd syscopyarea mei_me sysfillrect sysimgblt joydev mei usblp soundcore fb_sys_fops wmi video mac_hid acpi_pad nfsd dm_crypt cbc encrypted_keys trusted tpm auth_rpcgss rng_core udf nfs_acl lockd drm crc_itu_t isofs grace sg crypto_user fuse sunrpc agpgart nfs_ssc ip_tables x_tables usbhid xfs dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic dm_mod crc32c_intel sr_mod cdrom xhci_pci xhci_pci_renesas

omgold, Jul 14 '21 05:07

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot], Aug 10 '22 04:08