                        glusterfs client may crash/hang when the same file is written using different fds simultaneously when a race is hit
Gluster client crashed. Thread-1 backtrace:
#0  0x00007fa9a54ec84c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fa9a549f6a6 in raise () from /lib64/libc.so.6
#2  0x00007fa9a54897d3 in abort () from /lib64/libc.so.6
#3  0x00007fa9a54896fb in __assert_fail_base.cold () from /lib64/libc.so.6
#4  0x00007fa9a5498396 in __assert_fail () from /lib64/libc.so.6
#5  0x00007fa993636561 in __afr_eager_lock_handle (local=0x7fa9180117b0, take_lock=0x7fa9a4b663d7, do_pre_op=0x7fa9a4b663d6, timer_local=0x7fa9a4b663c8) at afr-transaction.c:2856
#6  0x00007fa993636644 in afr_transaction_start (local=0x7fa9180117b0, this=0x7fa98ff74ff0) at afr-transaction.c:2881
#7  0x00007fa993636c95 in afr_transaction (frame=0x7fa918022e70, this=0x7fa98ff74ff0, type=AFR_DATA_TRANSACTION) at afr-transaction.c:3029
#8  0x00007fa99361634b in afr_do_writev (frame=0x7fa918053990, this=0x7fa98ff74ff0) at afr-inode-write.c:477
#9  0x00007fa993616ada in afr_writev (frame=0x7fa918053990, this=0x7fa98ff74ff0, fd=0x7fa98a2de540, vector=0x7fa918078a50, count=1, offset=131072, flags=32769, iobref=0x7fa976375c90, 
    xdata=0x0) at afr-inode-write.c:555
#10 0x00007fa993593dc1 in dht_writev (frame=0x7fa9180a0e60, this=0x7fa98ff5fdb0, fd=0x7fa98a2de540, vector=0x7fa9a4b66700, count=1, off=131072, flags=32769, iobref=0x7fa976375c90, xdata=0x0)
    at dht-inode-write.c:223
...
On inspecting the core:
#5  0x00007fa993636561 in __afr_eager_lock_handle (local=0x7fa9180117b0, take_lock=0x7fa9a4b663d7, do_pre_op=0x7fa9a4b663d6, timer_local=0x7fa9a4b663c8) at afr-transaction.c:2856
2856	        GF_ASSERT(!(*take_lock));
(gdb) p *lock
$29 = {num_inodelks = 1, event_generation = 3, release = false, acquired = true, delay_timer = 0x0, owners = {next = 0x7fa93833fd18, prev = 0x7fa93833fd18}, post_op = {
    next = 0x7fa9140c69a8, prev = 0x7fa9140c69a8}, waiting = {next = 0x7fa93833fd38, prev = 0x7fa93833fd38}, frozen = {next = 0x7fa93833fd48, prev = 0x7fa93833fd48}}
(gdb) p *lock->owners
Structure has no component named operator*.
(gdb) p &lock->owners
$30 = (struct list_head *) 0x7fa93833fd18
(gdb) p lock->owners
$31 = {next = 0x7fa93833fd18, prev = 0x7fa93833fd18}
(gdb) p lock->post_op 
$32 = {next = 0x7fa9140c69a8, prev = 0x7fa9140c69a8}
(gdb) p &lock->post_op 
$33 = (struct list_head *) 0x7fa93833fd28
(gdb) p lock->waiting
$34 = {next = 0x7fa93833fd38, prev = 0x7fa93833fd38}
(gdb) p &lock->waiting
$35 = (struct list_head *) 0x7fa93833fd38
(gdb) p &lock->frozen
$36 = (struct list_head *) 0x7fa93833fd48
(gdb) p lock->frozen
$37 = {next = 0x7fa93833fd48, prev = 0x7fa93833fd48}
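For reference, these are kernel-style circular list heads: a head whose next and prev point back at itself is empty. So owners (head at 0x7fa93833fd18), waiting (0x7fa93833fd38) and frozen (0x7fa93833fd48) are all empty, while post_op (head at 0x7fa93833fd28, next/prev = 0x7fa9140c69a8) has an entry linked on it. A minimal sketch of that convention (not the gluster sources, just the list idiom):

/* Minimal sketch of the circular list convention above (assumed, not
 * taken from the gluster sources): an empty head points to itself, so
 * next != &head means something is linked on the list. */
#include <stdbool.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static bool list_empty(const struct list_head *head)
{
    return head->next == head;   /* self-pointing head == empty list */
}

int main(void)
{
    struct list_head owners  = { &owners, &owners };   /* like lock->owners  */
    struct list_head post_op;                          /* like lock->post_op */
    struct list_head frame_link;                       /* the pending frame  */

    post_op.next = post_op.prev = &frame_link;
    frame_link.next = frame_link.prev = &post_op;

    printf("owners empty: %d, post_op empty: %d\n",
           list_empty(&owners), list_empty(&post_op));  /* prints 1, 0 */
    return 0;
}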
The frame is stuck on the lock->post_op list. At the same time, Thread-26 is in the following state:
#0  0x00007fa9a54e7890 in __lll_lock_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fa9a54e7890 in __lll_lock_wait () from /lib64/libc.so.6
#1  0x00007fa9a54edef2 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007fa9936353d6 in afr_delayed_changelog_wake_up_cbk (data=0x7fa9140c3bb0) at afr-transaction.c:2483
#3  0x00007fa99366f34a in afr_delayed_changelog_wake_resume (this=0x7fa98ff74ff0, inode=0x7fa9408f2f70, stub=0x7fa93baeffb0)
    at /home/pranith.karampuri/workspace/glusterfs/xlators/cluster/afr/src/afr-common.c:3641
#4  0x00007fa99366f4b0 in afr_flush (frame=0x7fa987045fc0, this=0x7fa98ff74ff0, fd=0x7fa988d06a30, xdata=0x0)
    at /home/pranith.karampuri/workspace/glusterfs/xlators/cluster/afr/src/afr-common.c:3669
#5  0x00007fa9935ac8d8 in dht_flush (frame=0x7fa9856dbe00, this=0x7fa98ff5fdb0, fd=0x7fa988d06a30, xdata=0x0) at dht-inode-read.c:795
...
The old fd was in the process of being closed (FLUSH fop), and at around the same time a write fop was issued on the new fd.
The flush fop in this case leaves the lock structure in an invalid state: lock->acquired is true, lock->delay_timer is NULL and lock->release is false. Ideally either lock->release should be true or lock->delay_timer should be non-NULL. This happens because afr_wakeup_same_fd_delayed_op() sets lock->delay_timer to NULL but does not set lock->release to true, which makes the write on the new fd go for a fresh lock instead of being put on the wait list. Since I was using a DEBUG build this led to a crash; on a release build it would have led to hangs because of stale inodelks.
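The state described above can be modelled with a small sketch. This is a simplified model, not the actual afr-transaction.c code; the struct fields mirror the ones printed from the core, and wakeup_same_fd_delayed_op() below is only a stand-in for afr_wakeup_same_fd_delayed_op():

/* Hedged model of the lock state described above, not gluster code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_lock {
    bool acquired;       /* inodelk currently held           */
    bool release;        /* lock is scheduled to be released */
    void *delay_timer;   /* delayed post-op timer, if armed  */
};

/* Invariant implied by the report for an idle acquired lock: either the
 * delayed post-op timer is armed or the lock is marked for release. */
static void check_idle_lock(const struct model_lock *lock)
{
    if (lock->acquired)
        assert(lock->release || lock->delay_timer != NULL);
}

/* Behaviour described in the report: the same-fd wake-up cancels the
 * timer but does not mark the lock for release, so a write on a new fd
 * decides to take a fresh lock instead of queueing on the wait list. */
static void wakeup_same_fd_delayed_op(struct model_lock *lock)
{
    lock->delay_timer = NULL;
    /* lock->release = true;   <-- the assignment the report says is missing */
}

int main(void)
{
    int timer;
    struct model_lock lock = { .acquired = true, .release = false,
                               .delay_timer = &timer };
    wakeup_same_fd_delayed_op(&lock);
    check_idle_lock(&lock);   /* fires, like GF_ASSERT() in the DEBUG build */
    return 0;
}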
(gdb) fr 2
#2  0x00007fa9936353d6 in afr_delayed_changelog_wake_up_cbk (data=0x7fa9140c3bb0) at afr-transaction.c:2483
2483	    LOCK(&local->inode->lock);
(gdb) p lock
$38 = (afr_lock_t *) 0x7fa93833fd00
(gdb) t 1
[Switching to thread 1 (Thread 0x7fa9a4b68640 (LWP 393654))]
#0  0x00007fa9a54ec84c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) fr 5
#5  0x00007fa993636561 in __afr_eager_lock_handle (local=0x7fa9180117b0, take_lock=0x7fa9a4b663d7, do_pre_op=0x7fa9a4b663d6, timer_local=0x7fa9a4b663c8) at afr-transaction.c:2856
2856	        GF_ASSERT(!(*take_lock));
(gdb) p lock
$39 = (afr_lock_t *) 0x7fa93833fd00
Initially I thought setting lock->release to true would be the fix, but after thinking about it a bit more, it looks like flush is not handled completely when pending post-op operations are present. Flush currently resumes as soon as the first operation on the same fd finishes; instead, it should complete only after all the operations that were pending at the time of the flush have finished.
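A sketch of those semantics (a toy model, not gluster code): record how many operations are outstanding when the flush arrives and resume the flush only after that many completions, rather than on the first same-fd completion:

/* Toy model of the suggested flush behaviour, not gluster code. */
#include <stdio.h>

struct flush_barrier {
    int pending_at_flush;   /* ops outstanding when the FLUSH was issued */
    int completed_since;    /* ops completed since the FLUSH arrived     */
};

static void flush_issued(struct flush_barrier *b, int pending_now)
{
    b->pending_at_flush = pending_now;
    b->completed_since = 0;
}

/* Called on every op completion; returns 1 once the flush may resume. */
static int op_completed(struct flush_barrier *b)
{
    ++b->completed_since;
    return b->completed_since >= b->pending_at_flush;
}

int main(void)
{
    struct flush_barrier b;
    flush_issued(&b, 3);   /* 3 writes were pending when the FLUSH arrived */
    for (int i = 0; i < 3; i++)
        printf("%d ", op_completed(&b));   /* prints 0 0 1 */
    printf("\n");                          /* flush resumes only after all 3 */
    return 0;
}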
FYI @itisravi @karthik-us @xhernandez, I remember you worked on some issues involving stale inode locks.
@pranithk have you made any progress here ?
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking it as stale; it will be closed in 2 weeks if no one responds with a comment here.
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.
This should be fixed
Thank you for your contributions. We noticed that this issue has not had any activity in the last ~6 months. We are marking it as stale; it will be closed in 2 weeks if no one responds with a comment here.
Hi @pranithk, is there any way to reproduce this issue?
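Not a confirmed reproducer, but a sketch of the workload the title describes: the same file written through different fds while one of them is being closed (FLUSH), on a FUSE mount of a replicated (AFR) volume. The mount path below is a placeholder and the race is timing dependent, so it may take many iterations, or never hit on a given setup:

/* Sketch of the racy workload, not an author-confirmed reproducer.
 * /mnt/gluster/testfile is a placeholder path on an AFR volume mount.
 * Build with: cc -pthread repro.c -o repro */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PATH "/mnt/gluster/testfile"

static void *writer(void *arg)
{
    char buf[4096];
    (void)arg;
    memset(buf, 'a', sizeof(buf));
    for (;;) {
        int fd = open(PATH, O_WRONLY | O_CREAT, 0644);  /* fresh fd each loop */
        if (fd < 0) { perror("open"); return NULL; }
        pwrite(fd, buf, sizeof(buf), 128 * 1024);       /* write on this fd  */
        close(fd);                                      /* close sends FLUSH */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    /* two threads: writes and closes on different fds of the same file race */
    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, writer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}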