ntirpc
Segmentation fault at start of Ganesha in libntirpc
This happened twice during system testing.
Ganesha was stopped normally and then restarted. During start-up, a segmentation fault occurred in libntirpc. Many exports are defined in Ganesha; some have sec=sys, others have sec=krb5.
Unfortunately, no core dumps are available.
Available stack trace (Time: 2019-06-10 07:18:25 (1560151105)):
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f56506ae390]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x2c82b)[0x7f565189582b]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(_svcauth_gss+0x2bb)[0x7f565189840b]
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44d328]
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44cc67]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x212fd)[0x7f565188a2fd]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x218c2)[0x7f565188a8c2]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x2a50a)[0x7f565189350a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f56506a46ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f56501d241d]
ganesha.log (log level=info):
2019-06-11 17:17:47.5476 +0000 USER.NOTICE [fslib-ganesha] [main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
2019-06-11 17:17:47.5477 +0000 USER.NOTICE [fslib-ganesha] [main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
2019-06-11 17:17:47.5477 +0000 USER.NOTICE [fslib-ganesha] [main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
2019-06-11 17:18:22.5398 +0000 USER.INFO [fslib-ganesha] [svc_11] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
2019-06-11 17:18:23.5263 +0000 USER.INFO [fslib-ganesha] [svc_13] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
2019-06-11 17:18:25.7808 +0000 USER.INFO [fslib-ganesha] [svc_10] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
The segmentation fault seems to happen just after the first "Could not authenticate request..." message (according to the logged times).
Could this be the same problem as this old one? https://bugzilla.redhat.com/show_bug.cgi?id=1369674
This is Ganesha 2.7.1. libntirpc v1.7.
Hard to say, since the backtrace as given doesn't have enough info. Can you use addr2line to convert those addresses to line numbers? (Only the frames in ntirpc/ganesha matter.)
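For anyone repeating this step, here is a minimal sketch of turning the backtrace frames above into addr2line invocations. It assumes the frames use the glibc backtrace_symbols() layout shown in this report, that a frame of the form `module(+0xOFF)[0xADDR]` gives the module's load base as ADDR minus OFF, and that a frame with no parentheses (like the ganesha.nfsd ones) belongs to a non-PIE executable whose address can be used as-is. The helper names (`parse_frame`, `load_bases`, `addr2line_commands`) are made up for illustration and are not part of Ganesha or ntirpc.

```python
import re

# Assumed frame layout (glibc backtrace_symbols()):
#   module(symbol+0xOFF)[0xADDR]   or   module(+0xOFF)[0xADDR]   or   module[0xADDR]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Split one backtrace line into module, symbol, offset and address."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    offset = m.group("offset")
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol") or "",
        "offset": int(offset, 16) if offset else None,
        "addr": int(m.group("addr"), 16),
    }


def load_bases(frames):
    """Derive each module's load base from its symbol-less (+0xOFF) frames."""
    bases = {}
    for f in frames:
        if f["offset"] is not None and not f["symbol"]:
            bases.setdefault(f["module"], f["addr"] - f["offset"])
    return bases


def addr2line_commands(lines):
    """Return one addr2line command per frame that can be resolved."""
    frames = [f for f in map(parse_frame, lines) if f]
    bases = load_bases(frames)
    cmds = []
    for f in frames:
        base = bases.get(f["module"])
        if base is not None:
            # Shared object: convert the runtime address to a module offset.
            target = f["addr"] - base
        elif f["offset"] is None:
            # No parentheses: assume a non-PIE executable (absolute address).
            target = f["addr"]
        else:
            # symbol+offset frame with unknown load base: cannot resolve here.
            continue
        cmds.append("addr2line -f -C -e %s 0x%x" % (f["module"], target))
    return cmds
```

Feeding it the libntirpc and ganesha.nfsd lines from the trace yields commands that can be run against binaries built with debug info. Frames other than the topmost hold return addresses, so addr2line may report the line after the call; subtracting 1 from those addresses is a common workaround.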
For some reason, addr2line gave strange results. I used gdb to reconstruct the following:
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x2c82b)[0x7f565189582b] // nfs-ganesha/src/libntirpc/src/authgss_hash.c:144 + ???
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(_svcauth_gss+0x2bb)[0x7f565189840b] // nfs-ganesha/src/libntirpc/src/svc_auth_gss.c:450
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44d328] // nfs-ganesha/src/libntirpc/src/svc_auth.c:98
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44cc67] // nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:724
For the top line, I have no info on the line number, so I included part of the authgss_ctx_hash_get disassembly (the problem occurs at 0x2c82b):
Dump of assembler code for function authgss_ctx_hash_get:
0x000000000002c790 <+0>: push r15
0x000000000002c792 <+2>: push r14
0x000000000002c794 <+4>: push r13
0x000000000002c796 <+6>: push r12
0x000000000002c798 <+8>: push rbp
0x000000000002c799 <+9>: push rbx
0x000000000002c79a <+10>: sub rsp,0x128
0x000000000002c7a1 <+17>: mov rax,QWORD PTR fs:0x28
0x000000000002c7aa <+26>: mov QWORD PTR [rsp+0x118],rax
0x000000000002c7b2 <+34>: xor eax,eax
0x000000000002c7b4 <+36>: cmp BYTE PTR [rip+0x2126ed],0x0 # 0x23eea8 <authgss_hash_st+72>
0x000000000002c7bb <+43>: je 0x2c960 <authgss_ctx_hash_get+464>
0x000000000002c7c1 <+49>: mov rdx,QWORD PTR [rdi+0x18]
0x000000000002c7c5 <+53>: mov rax,QWORD PTR [rdx+0x8]
0x000000000002c7c9 <+57>: add rax,QWORD PTR [rdx]
0x000000000002c7cc <+60>: xor edx,edx
0x000000000002c7ce <+62>: mov DWORD PTR [rsp+0x84],eax
0x000000000002c7d5 <+69>: div DWORD PTR [rip+0x2126ad] # 0x23ee88 <authgss_hash_st+40>
0x000000000002c7db <+75>: mov edx,edx
0x000000000002c7dd <+77>: lea rbx,[rdx+rdx*8]
0x000000000002c7e1 <+81>: shl rbx,0x5
0x000000000002c7e5 <+85>: add rbx,QWORD PTR [rip+0x2126ac] # 0x23ee98 <authgss_hash_st+56>
0x000000000002c7ec <+92>: lea r12,[rbx+0x78]
0x000000000002c7f0 <+96>: mov r14,rbx
0x000000000002c7f3 <+99>: mov rdi,r12
0x000000000002c7f6 <+102>: call 0x6b80 <pthread_mutex_lock@plt>
0x000000000002c7fb <+107>: mov eax,DWORD PTR [rsp+0x84]
0x000000000002c802 <+114>: test rbx,rbx
0x000000000002c805 <+117>: mov QWORD PTR [rsp+0x8],rax
0x000000000002c80a <+122>: je 0x2c978 <authgss_ctx_hash_get+488>
0x000000000002c810 <+128>: movsxd rcx,DWORD PTR [rip+0x212679] # 0x23ee90 <authgss_hash_st+48>
0x000000000002c817 <+135>: xor edx,edx
0x000000000002c819 <+137>: div rcx
0x000000000002c81c <+140>: mov rax,QWORD PTR [r14+0xd8]
0x000000000002c823 <+147>: mov r15d,edx
0x000000000002c826 <+150>: mov QWORD PTR [rsp+0x18],rdx
0x000000000002c82b <+155>: mov r13,QWORD PTR [rax+r15*8]
0x000000000002c82f <+159>: test r13,r13
0x000000000002c832 <+162>: je 0x2c9a0 <authgss_ctx_hash_get+528>
0x000000000002c838 <+168>: lea rsi,[rsp+0x20]
0x000000000002c83d <+173>: mov rdi,r13
0x000000000002c840 <+176>: mov QWORD PTR [rsp+0x10],rsi
0x000000000002c845 <+181>: call QWORD PTR [r14+0xc0]
0x000000000002c84c <+188>: test eax,eax
0x000000000002c84e <+190>: je 0x2c950 <authgss_ctx_hash_get+448>
0x000000000002c854 <+196>: mov rsi,QWORD PTR [rsp+0x10]
0x000000000002c859 <+201>: lea rdi,[r14+0xb8]
0x000000000002c860 <+208>: call 0x65e0 <opr_rbtree_lookup@plt>
0x000000000002c865 <+213>: test rax,rax
0x000000000002c868 <+216>: mov rbp,rax
0x000000000002c86b <+219>: je 0x2c9c0 <authgss_ctx_hash_get+560>
Hope this helps!
Thx, Dirk.
It's hard to figure out exactly where the crash is from the assembly, but my best guess is that it's dereferencing gd. Can you try 1.7.4? There have been several fixes to auth gss in it, including some to gss_data refcounting.
We will try 1.7.4. Do we need to upgrade Ganesha to 2.7.4 for this? Or is upgrading Ganesha recommended anyway?
You need at least Ganesha 2.7.3, since the API changed a bit. I'd highly recommend 2.7.4.
I'm going to close this, as it is years old and V2.7 is long out of support (we are now supporting V5.x and working on V6).