server: Stats handler and InTapHandle called during critical read path
Moving the service handler goroutine forking into the transport (from server.go) would allow other reads to proceed without blocking on the tap/stats handlers.
Anything below line 385, including the stats handler, can easily be moved to another goroutine: https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_server.go#L385
The code between lines 358 and 385 performs some checks and then saves the created stream into the activeStreams map. It holds the transport's mutex while doing so, so moving it into another goroutine risks lock contention.
The InTapHandle code lives between lines 341 and 358 and doesn't use the mutex. If it were acceptable to run it after stream creation, it could also be moved into the new goroutine; however, its documentation says that InTapHandle must run before the stream is created.
This draft shows how the safe part can be moved into its own goroutine: https://github.com/grpc/grpc-go/compare/master...canguler:a?expand=1
I changed my draft code a bit more and ran benchmarks. The new implementation (which forks a goroutine) had ~5% worse QPS and latency. Code here.
Benchmark results:
go run benchmark/benchmain/main.go -benchtime=10s -workloads=unary -compression=off -maxConcurrentCalls=100 -reqSizeBytes=1 -respSizeBytes=1 -networkMode=Local
Without this change:
Unary-traceMode_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-Compressor_off-Preloader_false:
50_Latency: 708.0560 µs 90_Latency: 1323.9940 µs 99_Latency: 1657.3950 µs Avg latency: 849.0950 µs Count: 1177079 9450 Bytes/op 169 Allocs/op
Histogram (unit: µs)
Count: 1177079 Min: 121.9 Max: 28079.0 Avg: 849.10
------------------------------------------------------------
[ 121.905000, 121.906000) 1 0.0% 0.0%
[ 121.906000, 121.911720) 0 0.0% 0.0%
[ 121.911720, 121.950162) 0 0.0% 0.0%
[ 121.950162, 122.208504) 0 0.0% 0.0%
[ 122.208504, 123.944634) 0 0.0% 0.0%
[ 123.944634, 135.611937) 0 0.0% 0.0%
[ 135.611937, 214.019607) 6 0.0% 0.0%
[ 214.019607, 740.941973) 705278 59.9% 59.9% ######
[ 740.941973, 4282.013644) 468485 39.8% 99.7% ####
[ 4282.013644, 28079.046000) 3308 0.3% 100.0%
[ 28079.046000, inf) 1 0.0% 100.0%
Number of requests: 1177079 Request throughput: 941663.2 bit/s
Number of responses: 1177079 Response throughput: 941663.2 bit/s
With this change:
Unary-traceMode_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-Compressor_off-Preloader_false:
50_Latency: 730.0850 µs 90_Latency: 1383.5280 µs 99_Latency: 1969.0170 µs Avg latency: 896.3180 µs Count: 1115219 9460 Bytes/op 169 Allocs/op
Histogram (unit: µs)
Count: 1115219 Min: 272.7 Max: 28344.5 Avg: 896.32
------------------------------------------------------------
[ 272.665000, 272.666000) 1 0.0% 0.0%
[ 272.666000, 272.671723) 0 0.0% 0.0%
[ 272.671723, 272.710203) 0 0.0% 0.0%
[ 272.710203, 272.968918) 0 0.0% 0.0%
[ 272.968918, 274.708349) 0 0.0% 0.0%
[ 274.708349, 286.403149) 0 0.0% 0.0%
[ 286.403149, 365.031371) 21 0.0% 0.0%
[ 365.031371, 893.676332) 827263 74.2% 74.2% #######
[ 893.676332, 4447.940831) 283711 25.4% 99.6% ###
[ 4447.940831, 28344.501000) 4222 0.4% 100.0%
[ 28344.501000, inf) 1 0.0% 100.0%
Number of requests: 1115219 Request throughput: 892175.2 bit/s
Number of responses: 1115219 Response throughput: 892175.2 bit/s
ping @menghanl