Segmentation fault at start of Ganesha in libntirpc

Open dirk1730 opened this issue 5 years ago • 7 comments

This happened twice during system test.

Ganesha was stopped normally and then restarted. During start-up, there was a segmentation fault in libntirpc. There are many exports defined in Ganesha; some have sec=sys, others have sec=krb5.

Unfortunately, no core dumps are available.

Available stack trace (Time: 2019-06-10 07:18:25 (1560151105)):

/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f56506ae390]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x2c82b)[0x7f565189582b]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(_svcauth_gss+0x2bb)[0x7f565189840b]
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44d328]
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44cc67]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x212fd)[0x7f565188a2fd]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x218c2)[0x7f565188a8c2]
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x2a50a)[0x7f565189350a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f56506a46ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f56501d241d]

ganesha.log (log level=info):

2019-06-11 17:17:47.5476 +0000 USER.NOTICE [fslib-ganesha]  [main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
2019-06-11 17:17:47.5477 +0000 USER.NOTICE [fslib-ganesha]  [main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
2019-06-11 17:17:47.5477 +0000 USER.NOTICE [fslib-ganesha]  [main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
2019-06-11 17:18:22.5398 +0000 USER.INFO [fslib-ganesha]  [svc_11] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
2019-06-11 17:18:23.5263 +0000 USER.INFO [fslib-ganesha]  [svc_13] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM
2019-06-11 17:18:25.7808 +0000 USER.INFO [fslib-ganesha]  [svc_10] nfs_rpc_process_request :DISP :INFO :Could not authenticate request... rejecting with AUTH_STAT=RPCSEC_GSS_CREDPROBLEM

The segmentation fault seems to happen just after the first "Could not authenticate request..." message (according to the logged times).

Could this be the same as this older problem? https://bugzilla.redhat.com/show_bug.cgi?id=1369674

dirk1730 avatar Jun 26 '19 15:06 dirk1730

This is Ganesha 2.7.1. libntirpc v1.7.

dirk1730 avatar Jun 26 '19 15:06 dirk1730

Hard to say, since the backtrace doesn't have enough info as given. Can you use addr2line to convert those to line numbers? (Only the bits in ntirpc/ganesha matter)
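As a rough aid for that step, here is a minimal Python sketch that turns backtrace lines of the form shown above into `addr2line` invocations. It rests on assumptions: the helper names (`parse_frame`, `resolve_offsets`, `addr2line_argv`) are hypothetical, the load base of a shared library is inferred from any `(+0x...)` frame of the same module, and a bare `[0x...]` frame is treated as belonging to a non-PIE executable whose runtime address can be passed through unchanged.

```python
import re

# Matches lines such as:
#   /path/lib.so(+0x2c82b)[0x7f565189582b]
#   /path/lib.so(sym+0x2bb)[0x7f565189840b]
#   /path/exe[0x44d328]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Split one backtrace line into module, symbol, offset and address."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    offset = m.group("offset")
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol") or None,
        "offset": int(offset, 16) if offset else None,
        "addr": int(m.group("addr"), 16),
    }


def resolve_offsets(lines):
    """Return (module, lookup_address) pairs suitable for addr2line."""
    frames = [f for f in map(parse_frame, lines) if f]

    # A "(+0xOFF)[ADDR]" frame gives the module's load base: ADDR - OFF.
    bases = {}
    for f in frames:
        if f["symbol"] is None and f["offset"] is not None:
            bases[f["module"]] = f["addr"] - f["offset"]

    resolved = []
    for f in frames:
        base = bases.get(f["module"])
        if base is not None:
            lookup = f["addr"] - base      # module-relative offset
        elif f["symbol"] is None and f["offset"] is None:
            lookup = f["addr"]             # assumption: non-PIE executable
        else:
            continue                       # load base unknown, skip
        resolved.append((f["module"], lookup))
    return resolved


def addr2line_argv(module, lookup):
    """Build an addr2line command line (-f: function names, -C: demangle)."""
    return ["addr2line", "-f", "-C", "-e", module, hex(lookup)]
```

Note that a `symbol+0x...` frame carries an offset relative to the symbol, not to the start of the library, which is why the sketch derives the module-relative offset from the runtime address instead. The binaries passed to `-e` need to be the same builds that crashed and must carry debug information, otherwise the results will be unreliable.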

dang avatar Jun 26 '19 15:06 dang

For some reason, addr2line gave strange results. I used gdb to reconstruct the following:

/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(+0x2c82b)[0x7f565189582b]  // nfs-ganesha/src/libntirpc/src/authgss_hash.c:144 + ???
/opt/ampli/apps/nfsganesha/bin/../lib/libntirpc.so.1.7(_svcauth_gss+0x2bb)[0x7f565189840b]  // nfs-ganesha/src/libntirpc/src/svc_auth_gss.c:450
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44d328]  // nfs-ganesha/src/libntirpc/src/svc_auth.c:98
/opt/ampli/apps/nfsganesha/bin/ganesha.nfsd[0x44cc67]  // nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:724

For the top line, I have no info on the line number, so I included part of the authgss_ctx_hash_get disassembly (the problem occurs at 0x2c82b):

Dump of assembler code for function authgss_ctx_hash_get:
   0x000000000002c790 <+0>:	push   r15
   0x000000000002c792 <+2>:	push   r14
   0x000000000002c794 <+4>:	push   r13
   0x000000000002c796 <+6>:	push   r12
   0x000000000002c798 <+8>:	push   rbp
   0x000000000002c799 <+9>:	push   rbx
   0x000000000002c79a <+10>:	sub    rsp,0x128
   0x000000000002c7a1 <+17>:	mov    rax,QWORD PTR fs:0x28
   0x000000000002c7aa <+26>:	mov    QWORD PTR [rsp+0x118],rax
   0x000000000002c7b2 <+34>:	xor    eax,eax
   0x000000000002c7b4 <+36>:	cmp    BYTE PTR [rip+0x2126ed],0x0        # 0x23eea8 <authgss_hash_st+72>
   0x000000000002c7bb <+43>:	je     0x2c960 <authgss_ctx_hash_get+464>
   0x000000000002c7c1 <+49>:	mov    rdx,QWORD PTR [rdi+0x18]
   0x000000000002c7c5 <+53>:	mov    rax,QWORD PTR [rdx+0x8]
   0x000000000002c7c9 <+57>:	add    rax,QWORD PTR [rdx]
   0x000000000002c7cc <+60>:	xor    edx,edx
   0x000000000002c7ce <+62>:	mov    DWORD PTR [rsp+0x84],eax
   0x000000000002c7d5 <+69>:	div    DWORD PTR [rip+0x2126ad]        # 0x23ee88 <authgss_hash_st+40>
   0x000000000002c7db <+75>:	mov    edx,edx
   0x000000000002c7dd <+77>:	lea    rbx,[rdx+rdx*8]
   0x000000000002c7e1 <+81>:	shl    rbx,0x5
   0x000000000002c7e5 <+85>:	add    rbx,QWORD PTR [rip+0x2126ac]        # 0x23ee98 <authgss_hash_st+56>
   0x000000000002c7ec <+92>:	lea    r12,[rbx+0x78]
   0x000000000002c7f0 <+96>:	mov    r14,rbx
   0x000000000002c7f3 <+99>:	mov    rdi,r12
   0x000000000002c7f6 <+102>:	call   0x6b80 <pthread_mutex_lock@plt>
   0x000000000002c7fb <+107>:	mov    eax,DWORD PTR [rsp+0x84]
   0x000000000002c802 <+114>:	test   rbx,rbx
   0x000000000002c805 <+117>:	mov    QWORD PTR [rsp+0x8],rax
   0x000000000002c80a <+122>:	je     0x2c978 <authgss_ctx_hash_get+488>
   0x000000000002c810 <+128>:	movsxd rcx,DWORD PTR [rip+0x212679]        # 0x23ee90 <authgss_hash_st+48>
   0x000000000002c817 <+135>:	xor    edx,edx
   0x000000000002c819 <+137>:	div    rcx
   0x000000000002c81c <+140>:	mov    rax,QWORD PTR [r14+0xd8]
   0x000000000002c823 <+147>:	mov    r15d,edx
   0x000000000002c826 <+150>:	mov    QWORD PTR [rsp+0x18],rdx
   0x000000000002c82b <+155>:	mov    r13,QWORD PTR [rax+r15*8]
   0x000000000002c82f <+159>:	test   r13,r13
   0x000000000002c832 <+162>:	je     0x2c9a0 <authgss_ctx_hash_get+528>
   0x000000000002c838 <+168>:	lea    rsi,[rsp+0x20]
   0x000000000002c83d <+173>:	mov    rdi,r13
   0x000000000002c840 <+176>:	mov    QWORD PTR [rsp+0x10],rsi
   0x000000000002c845 <+181>:	call   QWORD PTR [r14+0xc0]
   0x000000000002c84c <+188>:	test   eax,eax
   0x000000000002c84e <+190>:	je     0x2c950 <authgss_ctx_hash_get+448>
   0x000000000002c854 <+196>:	mov    rsi,QWORD PTR [rsp+0x10]
   0x000000000002c859 <+201>:	lea    rdi,[r14+0xb8]
   0x000000000002c860 <+208>:	call   0x65e0 <opr_rbtree_lookup@plt>
   0x000000000002c865 <+213>:	test   rax,rax
   0x000000000002c868 <+216>:	mov    rbp,rax
   0x000000000002c86b <+219>:	je     0x2c9c0 <authgss_ctx_hash_get+560>
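
Reading the listing up to the faulting instruction at <+155>, the address arithmetic can be modelled roughly as below. This is only a sketch of what the instructions compute: the constant and function names are hypothetical, and which source-level fields live at offsets 0x78 and 0xd8 is an interpretation, not something the disassembly states.

```python
# Constants read directly off the disassembly above.
PARTITION_SIZE = 9 * 32    # <+77>/<+81>: lea rbx,[rdx+rdx*8]; shl rbx,0x5
MUTEX_OFFSET = 0x78        # <+92>: lea r12,[rbx+0x78], passed to pthread_mutex_lock
ARRAY_PTR_OFFSET = 0xD8    # <+140>: mov rax,[r14+0xd8]
SLOT_SIZE = 8              # <+155>: mov r13,[rax+r15*8]


def partition_address(hash32, n_partitions, partitions_base):
    """<+69>..<+85>: select a 288-byte element by hash modulo a count."""
    return partitions_base + (hash32 % n_partitions) * PARTITION_SIZE


def faulting_address(hash32, n_slots, array_base):
    """<+128>..<+155>: index an array of 8-byte entries by hash modulo a size.

    array_base stands for the pointer loaded from offset 0xd8 of the
    selected element.
    """
    return array_base + (hash32 % n_slots) * SLOT_SIZE
```

Since the load at <+140> completed, the structure in r14 was readable at that point; the access that faulted goes through the pointer read from its offset 0xd8. For example, `faulting_address(10, 4, 0)` evaluates to 16, so a NULL pointer there would produce a fault at a very small address. Without register values or a core dump, this cannot confirm whether that pointer was NULL, stale, or otherwise invalid.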

dirk1730 avatar Jun 27 '19 09:06 dirk1730

Hope this helps!

Thx, Dirk.

dirk1730 avatar Jun 27 '19 09:06 dirk1730

It's hard to figure out exactly where the crash is from the assembly, but my best guess is that it's dereferencing gd. Can you try 1.7.4? There have been several fixes to auth gss in it, including some to gss_data refcounting.

dang avatar Jun 27 '19 13:06 dang

We will try 1.7.4. Do we need to upgrade Ganesha to 2.7.4 for this? Maybe it is recommended to upgrade Ganesha too?

dirk1730 avatar Jun 28 '19 14:06 dirk1730

You need at least Ganesha 2.7.3, since the API changed a bit. I'd highly recommend 2.7.4.

dang avatar Jun 28 '19 15:06 dang

I'm going to close this as it is years old and V2.7 is long out of support (we are now supporting V5.x and working on V6).

ffilz avatar May 14 '24 20:05 ffilz