ntirpc
reused fd leads to rejected connections
(This is with ganesha 3.5-ibm071.07, which I think is roughly ntirpc v1.8.0, but I wanted to mention it here because it looks like the current version still mishandles reference counts in this state.)
Our ganesha servers repeatedly get into a state where new connections are immediately closed, leading to sequences like this:
22:38:15.075330 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [S], seq 2969462747, win 65535, options [mss 4052,sackOK,TS val 436389392 ecr 0,nop,wscale 10], length 0
22:38:15.075342 IP 10.2.12.17.nfs > 10.2.210.6.934: Flags [S.], seq 2311647831, ack 2969462748, win 28364, options [mss 4052,nop,wscale 9], length 0
22:38:15.075365 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [.], ack 1, win 64, length 0
22:38:15.075391 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [P.], seq 1:141, ack 1, win 64, length 140: NFS request xid 1566904168 136 getattr fh Unknown/430000013030000A00020028000A0B0C036096E8A502000002000000007072E8
22:38:15.075397 IP 10.2.12.17.nfs > 10.2.210.6.934: Flags [.], ack 141, win 58, length 0
22:38:15.075411 IP 10.2.12.17.nfs > 10.2.210.6.934: Flags [R.], seq 1, ack 141, win 58, length 0
22:38:15.075448 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [S], seq 2969464591, win 65535, options [mss 4052,sackOK,TS val 436389392 ecr 0,nop,wscale 10], length 0
22:38:15.075461 IP 10.2.12.17.nfs > 10.2.210.6.934: Flags [S.], seq 2311649674, ack 2969464592, win 28364, options [mss 4052,nop,wscale 9], length 0
22:38:15.075479 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [.], ack 1, win 64, length 0
22:38:15.075504 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [P.], seq 1:141, ack 1, win 64, length 140: NFS request xid 1566904168 136 getattr fh Unknown/430000013030000A00020028000A0B0C036096E8A502000002000000007072E8
22:38:15.075510 IP 10.2.12.17.nfs > 10.2.210.6.934: Flags [.], ack 141, win 58, length 0
22:38:15.075524 IP 10.2.12.17.nfs > 10.2.210.6.934: Flags [R.], seq 1, ack 141, win 58, length 0
22:38:15.075558 IP 10.2.210.6.934 > 10.2.12.17.nfs: Flags [S], seq 2969466312, win 65535, options [mss 4052,sackOK,TS val 436389392 ecr 0,nop,wscale 10], length 0
ganesha is accepting the connection but then immediately closing it. Based on logs and stracing, I believe the accept is returning a fd that is already in the tree, so:
- `svc_vc_rendezvous` calls `makefd_xprt` on the accepted fd
- `makefd_xprt` calls `svc_xprt_lookup`
- `svc_xprt_lookup` finds the fd via `opr_rbtree_lookup`, increments the reference count on it, and returns it (the flags must be INITIALIZED and not DESTROYED)
- `makefd_xprt` finds it is already INITIALIZED and returns it
- `svc_vc_rendezvous` finds it's not INITIAL, closes the fd, and returns without decrementing the reference count
When the client attempts to reconnect, accept often gives the same fd, guaranteeing the same sequence over again. Eventually when some other connection is closed, accept returns a different fd not in the tree and things work. The end result is long random client hangs.
You can see this sequence in the logs (note the trace at `makefd_xprt:293`, and that the xp_refcnt has increased the next time around):
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :svc_rqst_rearm_events_locked: 0x430a030 fd 20 xp_refcnt 486 sr_rec 0x42e9ae8 evchan 3 ev_refcnt 7 epoll_fd 18 control fd pair (16:17) rearm event 0x430a468
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :svc_rqst_rearm_events:167 unlocking @svc_rqst_rearm_events:163
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_epoll_loop: epoll_fd 18 before epoll_wait (29000)
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :svc_ref_it() 0x7fc3cc1addf0 fd 190 fd_send -1 xp_refcnt 9592948 af 0 port 4294967295 @svc_xprt_lookup:201
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :svc_xprt_lookup:209 locking
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :svc_xprt_lookup:212 unlocking @svc_xprt_lookup:209
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :makefd_xprt:292 unlocking @svc_xprt_lookup:0
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :makefd_xprt() 0x7fc3cc1addf0 fd 190 fd_send -1 xp_refcnt 9592948 af 0 port 4294967295 @makefd_xprt:293
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :svc_release_it() 0x430a030 fd 20 fd_send -1 xp_refcnt 485 af 0 port 4294967295 @svc_rqst_xprt_task_recv:1187
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_453] rpc :TIRPC :F_DBG :work_pool_thread() svc_453 waiting
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_epoll_loop: sr_rec 0x42e9ae8 evchan 3 ev_refcnt 6 epoll_fd 18 n_events 1
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_ref_it() 0x430a030 fd 20 fd_send -1 xp_refcnt 486 af 0 port 4294967295 @svc_xprt_lookup:201
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_xprt_lookup:209 locking
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_xprt_lookup:212 unlocking @svc_xprt_lookup:209
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_epoll_event: event 0x42fa660 00000001 RECV rpc_dplx_rec 0x430a030 (sr_rec 0x42e9ae8)
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_epoll_event: 0x430a030 fd 20 xp_refcnt 486 event 00000001 xp_flags ADDED_RECV clear flag ADDED_RECV
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_rearm_events:163 locking
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_18] rpc :TIRPC :F_DBG :work_pool_thread() svc_18 task 0x42e9ae8
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_rearm_events_locked: xprt 0x430a030 fd 20 ev_flags ADDED_RECV
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_18] rpc :TIRPC :F_DBG :svc_rqst_epoll_loop: epoll_fd 18 before epoll_wait (29000)
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_rearm_events_locked: 0x430a030 fd 20 xp_refcnt 486 sr_rec 0x42e9ae8 evchan 3 ev_refcnt 7 epoll_fd 18 control fd pair (16:17) rearm event 0x430a468
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_rqst_rearm_events:167 unlocking @svc_rqst_rearm_events:163
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_ref_it() 0x7fc3cc1addf0 fd 190 fd_send -1 xp_refcnt 9592949 af 0 port 4294967295 @svc_xprt_lookup:201
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_xprt_lookup:209 locking
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_xprt_lookup:212 unlocking @svc_xprt_lookup:209
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :makefd_xprt:292 unlocking @svc_xprt_lookup:0
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :makefd_xprt() 0x7fc3cc1addf0 fd 190 fd_send -1 xp_refcnt 9592949 af 0 port 4294967295 @makefd_xprt:293
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :svc_release_it() 0x430a030 fd 20 fd_send -1 xp_refcnt 485 af 0 port 4294967295 @svc_rqst_xprt_task_recv:1187
2022-10-26 08:35:44 : epoch 00060018 : nfs01 : gpfs.ganesha.nfsd-16611[svc_207] rpc :TIRPC :F_DBG :work_pool_thread() svc_207 waiting
What I don't know is how it got into this state in the first place. I see some reworking of SVC_DESTROY and SVC_RELEASE was done at some point that may avoid some race conditions, but somehow an fd was closed and reused before it was removed from the tree. (This is also TCP over IB, which may add its own weirdness.)
Is this a dupe of #265?
Technically #265 is a dup of this, but yes, it looks like the same thing to me! We'll try to get the fix in.
Can we close this? #265 has been verified and closed.