
ZFS hangs on kernel error: VERIFY3(0 == dmu_bonus_hold_by_dnode

vicsn opened this issue 4 years ago · 13 comments

System information

Type Version/Name
Distribution Name Debian
Distribution Version Buster
Linux Kernel 5.10.0-0.bpo.4-amd64 / 4.19.0-16
Architecture amd64
ZFS Version 2.0.3-1~bpo10+1
SPL Version 2.0.3-1~bpo10+1

Zpool properties:

              -o ashift=12
              -O checksum=sha256
              -O compression=lz4
              -O atime=off
              -O normalization=formD
              -O xattr=sa
              -O relatime=on

Example test machine properties:

  • CPU model name: AMD Ryzen Threadripper 2990WX 32-Core Processor
  • 6 HDDs; motherboard: Gigabyte Technology Co., Ltd. X399 AORUS PRO/X399 AORUS PRO-CF, BIOS F2g 05/08/2019

Describe the problem you're observing

During an incremental send using zfs send -I | zfs receive -F, ZFS hangs. Any subsequent ZFS command, and even opening any file on the ZFS filesystem, hangs indefinitely.
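For context, the replication pipeline in question looks roughly like this (the pool, dataset, and snapshot names below are placeholders, not taken from this report):

```shell
# Illustrative sketch only; names are hypothetical.
# -I sends all intermediate snapshots between the two named snapshots;
# -F forces the receiving side to roll back to its most recent snapshot
# before the receive starts.
zfs send -I tank/data@snap-old tank/data@snap-new | \
    zfs receive -F backup/data
```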

Note that we also experienced https://github.com/openzfs/zfs/issues/12014 - perhaps it is related.

Describe how to reproduce the problem

This happens frequently for us, on multiple machines, all using HDDs. The machines act as backup targets, so most activity is receiving snapshots all day. Other ZFS-related activity on the machines: snapshots are taken and pruned multiple times per day, and a scrub runs once a week. We use encrypted filesystems and a single zpool. Historically the error has triggered within 1 to 48 hours.

Total data usage is 2.12T, with zpool iostat output showing:

              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data        4.37T  10.2T     17    172   967K  5.46M

Current mitigation

Downgrading to ZFS version 0.8.4-1~bpo10+1 seems to have resolved the issue - at the time of writing we have had about a ~~week~~ month without ZFS hanging.

Include any warning/errors/backtraces from the system logs

The first error

May  4 17:11:31 <host> kernel: [96617.381743] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
May  4 17:11:31 <host> kernel: [96617.381882] PANIC at dmu_recv.c:1811:receive_object()
May  4 17:11:31 <host> kernel: [96617.381990] Showing stack for process 97766
May  4 17:11:31 <host> kernel: [96617.382002] CPU: 55 PID: 97766 Comm: receive_writer Tainted: P           OE     4.19.0-16-amd64 #1 Debian 4.19.181-1
May  4 17:11:31 <host> kernel: [96617.382006] Hardware name: Gigabyte Technology Co., Ltd. X399 AORUS PRO/X399 AORUS PRO-CF, BIOS F2g 05/08/2019
May  4 17:11:31 <host> kernel: [96617.382008] Call Trace:
May  4 17:11:31 <host> kernel: [96617.382058]  dump_stack+0x66/0x81
May  4 17:11:31 <host> kernel: [96617.382098]  spl_panic+0xd3/0xfb [spl]
May  4 17:11:31 <host> kernel: [96617.382454]  ? dbuf_rele_and_unlock+0x30f/0x660 [zfs]
May  4 17:11:31 <host> kernel: [96617.382562]  ? dbuf_read+0x1ca/0x520 [zfs]
May  4 17:11:31 <host> kernel: [96617.382642]  ? dnode_hold_impl+0x350/0xc20 [zfs]
May  4 17:11:31 <host> kernel: [96617.382652]  ? spl_kmem_cache_free+0xec/0x1c0 [spl]
May  4 17:11:31 <host> kernel: [96617.382728]  receive_object+0x923/0xc70 [zfs]
May  4 17:11:31 <host> kernel: [96617.382803]  receive_writer_thread+0x279/0xa00 [zfs]
May  4 17:11:31 <host> kernel: [96617.382817]  ? set_curr_task_fair+0x26/0x50
May  4 17:11:31 <host> kernel: [96617.382894]  ? receive_process_write_record+0x190/0x190 [zfs]
May  4 17:11:31 <host> kernel: [96617.382901]  ? __thread_exit+0x20/0x20 [spl]
May  4 17:11:31 <host> kernel: [96617.382905]  ? thread_generic_wrapper+0x6f/0x80 [spl]
May  4 17:11:31 <host> kernel: [96617.382910]  thread_generic_wrapper+0x6f/0x80 [spl]
May  4 17:11:31 <host> kernel: [96617.382920]  kthread+0x112/0x130
May  4 17:11:31 <host> kernel: [96617.382926]  ? kthread_bind+0x30/0x30
May  4 17:11:31 <host> kernel: [96617.382935]  ret_from_fork+0x22/0x40

Subsequent errors

May 4 17:16:05 <host> kernel: [96890.895937] hrtimer: interrupt took 74060 ns
May 4 17:16:20 <host> kernel: [96906.140617] INFO: task txg_quiesce:1694 blocked for more than 120 seconds.
May 4 17:16:20 <host> kernel: [96906.140714] Tainted: P OE 4.19.0-16-amd64 #1 Debian 4.19.181-1
May 4 17:16:20 <host> kernel: [96906.140799] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 4 17:16:20 <host> kernel: [96906.140884] txg_quiesce D 0 1694 2 0x80000000
May 4 17:16:20 <host> kernel: [96906.140889] Call Trace:
May 4 17:16:20 <host> kernel: [96906.140903] __schedule+0x29f/0x840
May 4 17:16:20 <host> kernel: [96906.140908] schedule+0x28/0x80
May 4 17:16:20 <host> kernel: [96906.140924] cv_wait_common+0xfb/0x130 [spl]
May 4 17:16:20 <host> kernel: [96906.140935] ? finish_wait+0x80/0x80
May 4 17:16:20 <host> kernel: [96906.141077] txg_quiesce_thread+0x2a9/0x3a0 [zfs]
May 4 17:16:20 <host> kernel: [96906.141169] ? txg_sync_thread+0x480/0x480 [zfs]
May 4 17:16:20 <host> kernel: [96906.141176] ? __thread_exit+0x20/0x20 [spl]
May 4 17:16:20 <host> kernel: [96906.141182] thread_generic_wrapper+0x6f/0x80 [spl]
May 4 17:16:20 <host> kernel: [96906.141187] kthread+0x112/0x130
May 4 17:16:20 <host> kernel: [96906.141190] ? kthread_bind+0x30/0x30
May 4 17:16:20 <host> kernel: [96906.141193] ret_from_fork+0x22/0x40
May 4 17:16:20 <host> kernel: [96906.141783] INFO: task receive_writer:97766 blocked for more than 120 seconds.
May 4 17:16:20 <host> kernel: [96906.141865] Tainted: P OE 4.19.0-16-amd64 #1 Debian 4.19.181-1
May 4 17:16:20 <host> kernel: [96906.141963] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 4 17:16:20 <host> kernel: [96906.142065] receive_writer D 0 97766 2 0x80000000
May 4 17:16:20 <host> kernel: [96906.142068] Call Trace:
May 4 17:16:20 <host> kernel: [96906.142074] __schedule+0x29f/0x840
May 4 17:16:20 <host> kernel: [96906.142078] ? ret_from_fork+0x21/0x40
May 4 17:16:20 <host> kernel: [96906.142083] schedule+0x28/0x80
May 4 17:16:20 <host> kernel: [96906.142091] spl_panic+0xf9/0xfb [spl]
May 4 17:16:20 <host> kernel: [96906.142192] ? dbuf_rele_and_unlock+0x30f/0x660 [zfs]
May 4 17:16:20 <host> kernel: [96906.142239] ? dbuf_read+0x1ca/0x520 [zfs]
May 4 17:16:20 <host> kernel: [96906.142289] ? dnode_hold_impl+0x350/0xc20 [zfs]
May 4 17:16:20 <host> kernel: [96906.142295] ? spl_kmem_cache_free+0xec/0x1c0 [spl]
May 4 17:16:20 <host> kernel: [96906.142343] receive_object+0x923/0xc70 [zfs]
May 4 17:16:20 <host> kernel: [96906.142394] receive_writer_thread+0x279/0xa00 [zfs]
May 4 17:16:20 <host> kernel: [96906.142399] ? set_curr_task_fair+0x26/0x50
May 4 17:16:20 <host> kernel: [96906.142448] ? receive_process_write_record+0x190/0x190 [zfs]
May 4 17:16:20 <host> kernel: [96906.142455] ? __thread_exit+0x20/0x20 [spl]
May 4 17:16:20 <host> kernel: [96906.142460] ? thread_generic_wrapper+0x6f/0x80 [spl]
May 4 17:16:20 <host> kernel: [96906.142465] thread_generic_wrapper+0x6f/0x80 [spl]
May 4 17:16:20 <host> kernel: [96906.142469] kthread+0x112/0x130
May 4 17:16:20 <host> kernel: [96906.142471] ? kthread_bind+0x30/0x30
May 4 17:16:20 <host> kernel: [96906.142473] ret_from_fork+0x22/0x40

EDIT May 11th 2021: downgrading ZFS to 0.8.4-1~bpo10+1 resolved the issue.
EDIT June 14th 2021: the issue has still not reappeared on the downgraded ZFS version.
EDIT December 20th 2021: the issue is still there, also with zfs-2.0.3-9 on 5.10.0-8-amd64.

vicsn avatar May 05 '21 08:05 vicsn

Same issue here, ran into this while copying many snapshots over from one pool to another. Stack trace is similar, but a little different on ZFS 2.1.1:

Nov 17 23:53:57 localhost kernel: VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
Nov 17 23:53:57 localhost kernel: PANIC at dmu_recv.c:1798:receive_object()
Nov 17 23:53:57 localhost kernel: Showing stack for process 1363685
Nov 17 23:53:57 localhost kernel: CPU: 2 PID: 1363685 Comm: receive_writer Tainted: P           OE     5.13.12-200.fc34.x86_64 #1
Nov 17 23:53:57 localhost kernel: Hardware name: Supermicro X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F, BIOS 3.2a 05/31/2019
Nov 17 23:53:57 localhost kernel: Call Trace:
Nov 17 23:53:57 localhost kernel: dump_stack+0x76/0x94
Nov 17 23:53:57 localhost kernel: spl_panic+0xd4/0xfc [spl]
Nov 17 23:53:57 localhost kernel: ? dbuf_rele_and_unlock+0x487/0x640 [zfs]
Nov 17 23:53:57 localhost kernel: ? dbuf_create+0x5c8/0x5f0 [zfs]
Nov 17 23:53:57 localhost kernel: ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
Nov 17 23:53:57 localhost kernel: ? dnode_rele_and_unlock+0x64/0xc0 [zfs]
Nov 17 23:53:57 localhost kernel: receive_object+0xa59/0xc50 [zfs]
Nov 17 23:53:57 localhost kernel: receive_writer_thread+0x1ba/0xac0 [zfs]
Nov 17 23:53:57 localhost kernel: ? kfree+0x403/0x430
Nov 17 23:53:57 localhost kernel: ? thread_generic_wrapper+0x62/0x80 [spl]
Nov 17 23:53:57 localhost kernel: ? receive_read_payload_and_next_header+0x2f0/0x2f0 [zfs]
Nov 17 23:53:57 localhost kernel: ? thread_generic_wrapper+0x6f/0x80 [spl]
Nov 17 23:53:57 localhost kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Nov 17 23:53:57 localhost kernel: ? __thread_exit+0x20/0x20 [spl]
Nov 17 23:53:57 localhost kernel: kthread+0x127/0x150
Nov 17 23:53:57 localhost kernel: ? set_kthread_struct+0x40/0x40
Nov 17 23:53:57 localhost kernel: ret_from_fork+0x22/0x30

It does seem random: after a reboot I re-ran the exact same zfs send | zfs recv (with resume token) and it completed with no issues.

stewartadam avatar Nov 21 '21 00:11 stewartadam

Any updates for this issue or #12785?

whyamiroot avatar Dec 20 '21 13:12 whyamiroot

Some debugging output after a recent crash

The issue occurred again on January 6 at 14:33:34 - I base this on the default log output, which I think is UTC+1 by default. This translates to timestamp 1641476014.

zfs_get_all.txt zfs_dbgmsg.txt zpool_history.txt

EDIT: logs were set to UTC + 1
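As a sanity check, the local-time-to-epoch conversion above can be reproduced with GNU date:

```shell
# Convert the syslog time (UTC+1) to the Unix timestamp used to match
# entries in /proc/spl/kstat/zfs/dbgmsg.
date -d '2022-01-06 14:33:34 +0100' +%s   # prints 1641476014
```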

vicsn avatar Jan 07 '22 08:01 vicsn

One more set of logs of the same event, happening a day later on Jan 7 at 20:20:22 (timestamp 1641583222).

dbgmsg_after_reboot.txt zpool_history.txt dbgmsg.txt

With the following zpool status -v output:

  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 12:50:05 with 0 errors on Tue Jan  4 00:12:53 2022
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    sda     ONLINE       0     0     0
	    sdb     ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	    sde     ONLINE       0     0     0
	spares
	  sdc       AVAIL   

errors: Permanent errors have been detected in the following files:

        data/private/private/private/private/private/%recv:<0x0>

vicsn avatar Jan 11 '22 08:01 vicsn

I am pretty sure the panic happens because dbuf_read() returns an EIO error in the codepath: receive_object() -> dmu_bonus_hold_by_dnode() -> dbuf_read().

Anyone who can replicate the bug:

  1. Run echo 512 | sudo tee /sys/module/zfs/parameters/zfs_flags
  2. Trigger the bug
  3. Post the output of cat /proc/spl/kstat/zfs/dbgmsg
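The debugging steps above can be sketched as a small shell snippet. As far as I can tell, the value 512 corresponds to the ZFS_DEBUG_SET_ERROR flag (1 << 9), which logs the call sites where ZFS records an error code:

```shell
#!/bin/sh
# Step 1: enable SET_ERROR logging (512 = 1 << 9, ZFS_DEBUG_SET_ERROR),
# so the origin of the EIO shows up in the internal debug log.
echo 512 | sudo tee /sys/module/zfs/parameters/zfs_flags

# Step 2: trigger the bug (run the usual zfs send | zfs receive workload).

# Step 3: capture the debug log right after the panic occurs.
sudo cat /proc/spl/kstat/zfs/dbgmsg > dbgmsg.txt
```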

gamanakis avatar Feb 03 '22 10:02 gamanakis

System information

Type Version/Name
Distribution Name Proxmox
Distribution Version 7.1.12
Linux Kernel 5.13.19-6-pve
Architecture amd64
ZFS Version 2.1.4-pve1

malfret avatar Apr 04 '22 11:04 malfret

@malfret thank you for the logs. Based on the time the panic occurred I went to look for events in dbgmsg.txt around 1648983726. However the events logged into dbgmsg.txt are about 10 minutes later than the panic (starting at 1648984381).

If you can reproduce this, would you mind posting a dbgmsg (with echo 512 > /sys/module/zfs/parameters/zfs_flags set beforehand) right after the panic occurs?

gamanakis avatar Apr 04 '22 12:04 gamanakis

At the time of the panic, the filesystem was not working properly and our shell script was unable to save the log; the log was saved manually.

How can I increase the log size so that the data is retained?

malfret avatar Apr 04 '22 12:04 malfret

If I change zfs_dbgmsg_maxsize, will it increase the size of the dbgmsg log?

echo $((128*1024*1024)) >> /sys/module/zfs/parameters/zfs_dbgmsg_maxsize

malfret avatar Apr 04 '22 12:04 malfret

If I change zfs_dbgmsg_maxsize, will it increase the size of the dbgmsg log?

echo $((128*1024*1024)) >> /sys/module/zfs/parameters/zfs_dbgmsg_maxsize

Right, that may help.

gamanakis avatar Apr 04 '22 21:04 gamanakis

2022-04-06_dbgmsg.log.zip syslog 2022-04-06.txt

Times in the syslog are UTC+4.

malfret avatar Apr 06 '22 02:04 malfret

The panic seems to happen because arc_untransform() returns an EIO error in the codepath: receive_object() -> dmu_bonus_hold_by_dnode() -> dbuf_read() -> arc_untransform().

And that happens because arc_buf_fill() returns an ECKSUM error (EBADE, 52 on Linux).

gamanakis avatar May 04 '22 19:05 gamanakis

@gamanakis We are still experiencing this bug on our production systems, any chance that this will be solved any time soon? We are running the latest version from Debian stable, 2.0.3-9. Could this be escalated maybe?

Any help is appreciated!

rockhouse avatar Nov 29 '22 18:11 rockhouse

Also happened with 2.1.5-1~bpo11+1. Nothing logged to syslog.

kernel:[9838382.870025] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
kernel:[9838382.870083] PANIC at dmu_recv.c:1811:receive_object()

siilike avatar Dec 18 '22 19:12 siilike

Also occurred with 2.1.7 (with https://github.com/openzfs/zfs/commit/c8d2ab05e1e6bf21185723fabdebbe5bb3374381 reverted, otherwise I would run into #14252).

Dec 23 12:30:18 jonathan-desktop kernel: VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
Dec 23 12:30:18 jonathan-desktop kernel: PANIC at dmu_recv.c:1806:receive_object()
Dec 23 12:30:18 jonathan-desktop kernel: Showing stack for process 3273208
Dec 23 12:30:18 jonathan-desktop kernel: CPU: 5 PID: 3273208 Comm: receive_writer Tainted: P           OE      6.1.1-x64v3-xanmod1 #0~20221222.0006b63
Dec 23 12:30:18 jonathan-desktop kernel: Hardware name: To Be Filled By O.E.M. X570 Taichi/X570 Taichi, BIOS P5.00 10/19/2022
Dec 23 12:30:18 jonathan-desktop kernel: Call Trace:
Dec 23 12:30:18 jonathan-desktop kernel:  <TASK>
Dec 23 12:30:18 jonathan-desktop kernel:  dump_stack_lvl+0x44/0x5c
Dec 23 12:30:18 jonathan-desktop kernel:  spl_panic+0xcb/0xe3 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dbuf_rele_and_unlock+0x5ad/0x730 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dbuf_cons+0xa2/0xc0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? spl_kmem_cache_alloc+0x9f/0x740 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dbuf_read+0x10a/0x5e0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? aggsum_add+0x16a/0x180 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dnode_rele_and_unlock+0x54/0xe0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  receive_object+0xb06/0xd30 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dnode_rele_and_unlock+0x54/0xe0 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? dmu_object_info+0x4f/0x80 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  receive_writer_thread+0x1b2/0xa90 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? set_next_task_fair+0x66/0x90
Dec 23 12:30:18 jonathan-desktop kernel:  ? start_cfs_bandwidth.part.0+0x50/0x50
Dec 23 12:30:18 jonathan-desktop kernel:  ? set_user_nice.part.0+0x11b/0x330
Dec 23 12:30:18 jonathan-desktop kernel:  ? receive_process_write_record+0x160/0x160 [zfs]
Dec 23 12:30:18 jonathan-desktop kernel:  ? __thread_exit+0x10/0x10 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  thread_generic_wrapper+0x55/0x60 [spl]
Dec 23 12:30:18 jonathan-desktop kernel:  kthread+0xd5/0x100
Dec 23 12:30:18 jonathan-desktop kernel:  ? kthread_complete_and_exit+0x20/0x20
Dec 23 12:30:18 jonathan-desktop kernel:  ret_from_fork+0x22/0x30
Dec 23 12:30:18 jonathan-desktop kernel:  </TASK>

singingtelegram avatar Dec 24 '22 03:12 singingtelegram

[1687794.507453] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
[1687794.507479] PANIC at dmu_recv.c:1806:receive_object()
[1687794.507493] Showing stack for process 2882666
[1687794.507505] CPU: 5 PID: 2882666 Comm: receive_writer Tainted: P      D    OE     5.19.0-38-generic #39~22.04.1-Ubuntu
[1687794.507529] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0090.2022.0916.1942 09/16/2022
[1687794.507547] Call Trace:
[1687794.507557]  <TASK>
[1687794.507571]  show_stack+0x52/0x69
[1687794.507598]  dump_stack_lvl+0x49/0x6d
[1687794.507629]  dump_stack+0x10/0x18
[1687794.507647]  spl_dumpstack+0x29/0x35 [spl]
[1687794.507691]  spl_panic+0xd1/0xe9 [spl]
[1687794.507732]  ? arc_space_consume+0x54/0x130 [zfs]
[1687794.508027]  ? dbuf_create+0x5c1/0x5f0 [zfs]
[1687794.508306]  ? dbuf_read+0x11b/0x630 [zfs]
[1687794.508587]  ? dmu_bonus_hold_by_dnode+0x15b/0x1b0 [zfs]
[1687794.508860]  receive_object+0xae1/0xd40 [zfs]
[1687794.509186]  ? __slab_free+0x31/0x340
[1687794.509211]  ? spl_kmem_free+0x32/0x40 [spl]
[1687794.509244]  ? kfree+0x30f/0x330
[1687794.509259]  ? mutex_lock+0x13/0x50
[1687794.509277]  receive_writer_thread+0x1ce/0xb50 [zfs]
[1687794.509560]  ? set_next_task_fair+0x70/0xb0
[1687794.509578]  ? receive_process_write_record+0x1a0/0x1a0 [zfs]
[1687794.509861]  ? spl_kmem_free+0x32/0x40 [spl]
[1687794.509894]  ? kfree+0x30f/0x330
[1687794.509909]  ? receive_process_write_record+0x1a0/0x1a0 [zfs]
[1687794.510191]  ? __thread_exit+0x20/0x20 [spl]
[1687794.510225]  thread_generic_wrapper+0x61/0x80 [spl]
[1687794.510256]  ? thread_generic_wrapper+0x61/0x80 [spl]
[1687794.510288]  kthread+0xeb/0x120
[1687794.510303]  ? kthread_complete_and_exit+0x20/0x20
[1687794.510321]  ret_from_fork+0x1f/0x30
[1687794.510343]  </TASK>

lnagel avatar Apr 23 '23 04:04 lnagel

You might want to update to the latest ZFS release. jonathonf's PPA hasn't been updated in quite a while now.

What I did was that I grabbed the zfs-linux source tarball from the lunar repo and compiled it with dpkg-buildpackage.
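For reference, a rough sketch of that rebuild process (the steps are from memory and may need adjusting; it assumes a deb-src entry for the newer release is present in the apt sources):

```shell
#!/bin/sh
# Install the Debian packaging tools (mk-build-deps comes from equivs).
sudo apt-get install devscripts equivs

# Fetch the zfs-linux source package from the newer release's repository.
apt-get source zfs-linux
cd zfs-linux-*/

# Install the build dependencies declared in debian/control.
sudo mk-build-deps -i -r debian/control

# Build unsigned binary packages (.deb files land in the parent directory).
dpkg-buildpackage -b -uc -us
```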


singingtelegram avatar Apr 23 '23 09:04 singingtelegram

Using 2.2.0 now

[   46.799495] ZFS: Loaded module v2.2.0-0ubuntu1~23.10, ZFS pool version 5000, ZFS filesystem version 5

The error seems to be the same

[984462.595691] VERIFY0(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
[984462.595700] PANIC at dmu_recv.c:2093:receive_object()
[984462.595703] Showing stack for process 44582
[984462.595704] CPU: 4 PID: 44582 Comm: receive_writer Tainted: P           O       6.5.0-14-generic #14-Ubuntu
[984462.595706] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0090.2022.0916.1942 09/16/2022
[984462.595708] Call Trace:
[984462.595710]  <TASK>
[984462.595713]  dump_stack_lvl+0x48/0x70
[984462.595722]  dump_stack+0x10/0x20
[984462.595724]  spl_dumpstack+0x29/0x40 [spl]
[984462.595742]  spl_panic+0xfc/0x120 [spl]
[984462.595757]  receive_object+0x9ce/0xaa0 [zfs]
[984462.596131]  ? __list_del_entry+0x1d/0x30 [zfs]
[984462.596427]  ? list_del+0xd/0x30 [zfs]
[984462.596674]  receive_process_record+0x84/0x2b0 [zfs]
[984462.596963]  receive_writer_thread+0xb7/0x1a0 [zfs]
[984462.597246]  ? __pfx_receive_writer_thread+0x10/0x10 [zfs]
[984462.597543]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[984462.597563]  thread_generic_wrapper+0x5c/0x70 [spl]
[984462.597583]  kthread+0xef/0x120
[984462.597587]  ? __pfx_kthread+0x10/0x10
[984462.597590]  ret_from_fork+0x44/0x70
[984462.597594]  ? __pfx_kthread+0x10/0x10
[984462.597597]  ret_from_fork_asm+0x1b/0x30
[984462.597601]  </TASK>

lnagel avatar Dec 24 '23 14:12 lnagel