nfs-ganesha
5.6: [fd_lru] fsal_start_fd_work :RW LOCK :CRIT :Error 22, acquiring mutex
Hello, we recently enabled the fd reaper in the config, but we occasionally get the following error:
ganesha.nfsd-3236423[fd_lru] fsal_start_fd_work :RW LOCK :CRIT :Error 22, acquiring mutex 0x7f6d74004910 (&fsal_fd->work_mutex) at /usr/src/debug/nfs-ganesha/5.6-r0/src/FSAL/commonlib.c:2796
and the nfs-ganesha service exited with status=6/ABRT.
So it looks like the memory for fsal_fd was corrupted or freed.
Do you have any ideas how to fix this?
Thanks, Alexey
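For readers decoding the log line: the "Error 22" is the return code from the pthread lock call, and on Linux 22 is EINVAL. For a mutex lock that usually points to a mutex object that is not in a valid state, which is consistent with the freed-or-corrupted guess above. A minimal sketch of a decoder follows; `pthread_rc_name` is a hypothetical helper, not a Ganesha function.

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of nfs-ganesha): map a pthread_* return
 * code to its symbolic errno name so that a log line such as
 * "Error 22, acquiring mutex" is easier to read.
 */
static const char *pthread_rc_name(int rc)
{
	switch (rc) {
	case 0:
		return "OK";
	case EINVAL:	/* mutex not valid: e.g. destroyed or overwritten */
		return "EINVAL";
	case EBUSY:	/* e.g. trylock on a held mutex */
		return "EBUSY";
	case EDEADLK:	/* error-checking mutex relocked by its owner */
		return "EDEADLK";
	case EPERM:	/* unlock by a thread that does not own the mutex */
		return "EPERM";
	case EAGAIN:	/* recursive lock count exceeded */
		return "EAGAIN";
	default:
		return "UNKNOWN";
	}
}
```

Calling `pthread_rc_name(rc)` next to the numeric code in a log message gives the symbolic name without changing any locking behavior.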
We have some patches fixing lifetime of fsal_fd:
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1176997?usp=search
There is at least one merged into V6-dev.5 or earlier (soon we will backport some patches and tag V5.8)
Thank you, I'll try these fixes.
Unfortunately, after applying the https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1176997?usp=search fix we still get the error, but much more rarely.
@aepotapov - Please check whether your Ganesha fork has picked up the patches below. They should resolve the lock ordering problem described above. Kindly confirm once validated.
https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1170151 https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/1174224
Also, there have been many fixes in the fd_lru code path, and some in the FSALs as well, so please make sure the community-approved patches are incorporated soon.
And if it recurs, could you please share more logs, with backtraces keyed to the mutex pointer that is logged in PTHREAD_MUTEX_(init/lock/unlock/destroy)? It seems someone has destroyed the mutex, or there is some ordering problem here.
cc: @ffilz
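One possible way to collect the backtraces requested above is sketched below. It assumes glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`; the `trace_mutex_op` and `traced_mutex_*` names are hypothetical wrappers written for illustration and are not the Ganesha PTHREAD_MUTEX_* macros. Each call logs the mutex pointer, the operation, and the return code, followed by the caller's stack, so a destroy that precedes a failing lock on the same pointer can be spotted in the log.

```c
#include <execinfo.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define TRACE_DEPTH 16

/* Hypothetical helper: log one mutex operation plus the current stack. */
static void trace_mutex_op(const char *op, pthread_mutex_t *m, int rc)
{
	void *frames[TRACE_DEPTH];
	int n = backtrace(frames, TRACE_DEPTH);

	fprintf(stderr, "mutex %p %s rc=%d\n", (void *)m, op, rc);
	fflush(stderr);
	/* Writes the symbolized frames straight to fd 2 without malloc. */
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
}

static int traced_mutex_init(pthread_mutex_t *m)
{
	int rc = pthread_mutex_init(m, NULL);

	trace_mutex_op("init", m, rc);
	return rc;
}

static int traced_mutex_lock(pthread_mutex_t *m)
{
	int rc = pthread_mutex_lock(m);

	trace_mutex_op("lock", m, rc);
	return rc;
}

static int traced_mutex_unlock(pthread_mutex_t *m)
{
	int rc = pthread_mutex_unlock(m);

	trace_mutex_op("unlock", m, rc);
	return rc;
}

static int traced_mutex_destroy(pthread_mutex_t *m)
{
	int rc = pthread_mutex_destroy(m);

	trace_mutex_op("destroy", m, rc);
	return rc;
}
```

Searching the resulting log for the pointer from the CRIT message (0x7f6d74004910 in the report above) would then show the last init/destroy on that address and who called it. Linking with `-rdynamic` generally makes the symbol names in the backtrace more useful.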
Thank you! I will check that patch. Unfortunately, the original problem no longer reproduces, even with the v5.6-based code.
Hello, I was able to reproduce this problem again. I have attached a log captured with FSAL = FULL_DEBUG; NFS3 = FULL_DEBUG; Please let me know if anything else is needed. ganesha_copy.log
Could you try V5.9?
I don't see anything in that log that helps.
It looks like it is fixed in V5.9. Thanks!