ntirpc
Seq write
@dang @ffilz @malahal
Can you please review the changesets?
This is inspired from the linux kernel client changeset: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/sunrpc/xprtsock.c?id=8d6f97d698b2b9e71124ba95688a6bf1adad5055
Also, for the current changeset, I haven't yet submitted the Ganesha change for setting the TIRPC conf. Though the default value for cont_recv_limit is 64, it gets reset to 0 by tirpc_control() once Ganesha starts.
If we continue to read from a single socket, this breaks the fairness guarantee.
That is correct. That is why I have added this as a conf param. The old behaviour can be restored by setting cont_recv_limit to 0. Also, I am looking for suggestion(s) on how to keep this as fair as possible while still retaining the performance improvement.
@dang @ashishsangwan2 nfs-ganesha actually preferred to continue reading on active sockets, but as @dang noted, fairness was a problem in older ganesha. I think the configurable approach here is fine, but possibly the default coefficient of 64 is a bit large?
One thing we can do is return the number of executing + pending requests to the caller of work_pool_submit(), i.e. svc_vc_recv(). Based on that, we can decide whether to continue the recv or yield.
Thanks @mattbenjamin for the reply. Would you please suggest what you think a fair default value would be? Also, I would like to know your thoughts about returning the pending-request count from work_pool_submit() and whether we could make it a factor in the decision to do a continuous recv(). For example: if (pending_request < x*ioq_thread_max), continue doing recv. x could be 1 or 2.
default value: maybe 4 or 8? It sounds helpful to consider pending_requests, but I'm not sure about the tuning. Maybe a fractional value would be safest.
Here's a quick summary of the performance numbers I got on my setup with different values of cont_recv_limit:

cont_recv_limit : Seq throughput
0    : 740 MB/sec
4    : 788 MB/sec
8    : 800 MB/sec
16   : 820 MB/sec
24   : 840 MB/sec
1000 : 900 MB/sec

As a next step I will make these changes:
a) Set the default to 8.
b) Also check the number of pending requests when deciding whether to continue receiving. The threshold will be 50% of ioq_thread_max.
Also of interest, though, is the effect on concurrent clients.
@mattbenjamin I think that once we take the number of pending requests into account, it will behave like the old code under load scenarios originating from concurrent clients. I will take perf numbers with multiple clients doing parallel writes. If you could suggest any other particular test cases, I can try those as well.
Could you have some clients doing writes, and others doing something else, such as listing large directories? That's a nicely gated round-trip workload that could be easily starved.
@dang Ok, in the next week or so I am going to measure the listing time for a directory with 100K entries while 0, 2, 4, and 6 clients do parallel sequential writes, with and without the patch.
I think we should abandon this one. Submitter has not gotten back to us.
Let's leave it for the moment, for future reference when we revisit queuing and whatnot.