grpc-go
Under 80% network utilization for LAN networks with large messages and high concurrency
go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=1000 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
It finishes for me; you just have to be patient:
$ go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=200 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
Stream-traceMode_false-latency_2ms-kbps_102400-MTU_1500-maxConcurrentCalls_200-reqSize_1048576B-respSize_1048576B-Compressor_false:
50_Latency: 40.3235 s 90_Latency: 41.0967 s 99_Latency: 41.1228 s Avg latency: 36.3483 s Count: 200 13201654 Bytes/op 16305 Allocs/op
Histogram (unit: s)
Count: 200 Min: 14.5 *****Max: 41.1***** Avg: 36.35
------------------------------------------------------------
[ 14.456700, 14.456700) 1 0.5% 0.5%
[ 14.456700, 14.456700) 0 0.0% 0.5%
[ 14.456700, 14.456700) 0 0.0% 0.5%
[ 14.456700, 14.456703) 0 0.0% 0.5%
[ 14.456703, 14.456743) 0 0.0% 0.5%
[ 14.456743, 14.457320) 0 0.0% 0.5%
[ 14.457320, 14.465627) 0 0.0% 0.5%
[ 14.465627, 14.585274) 0 0.0% 0.5%
[ 14.585274, 16.308545) 3 1.5% 2.0%
[ 16.308545, 41.128673) 195 97.5% 99.5% ##########
[ 41.128673, inf) 1 0.5% 100.0%
(Note the "Max" above, asterisked.)
1 MB = 8 Mb. 200 streams * 2 directions * 8 Mb per message = 3200 Mb. LAN mode allows 100 Mbps, so our best-case scenario* would be 32 s.
I'm not sure why we're 10 s worse than optimal at this point (78% of max), but I don't see anything broken with the -networkMode flag. It's just that -benchtime=<short> is not really compatible with high levels of concurrency and large messages relative to the effective throughput.
* This is if we have perfect fairness across the outgoing client streams and all 200 requests complete at the same moment, with all 200 responses starting afterwards. Interestingly, the overall benchmark would improve if we sent the streams serially so that we could maximize the bi-directional utilization of the network.
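For reference, here's a quick back-of-the-envelope sketch of that floor (assuming the LAN profile really caps each direction at ~100 Mbps, per the kbps_102400 in the output, and using the same 1 MB ≈ 8 Mb rounding as above). It also prints the ~16 s floor you'd get with perfect bi-directional overlap, which is the serial-streams idea from the footnote:

```go
package main

import "fmt"

func main() {
	const (
		streams      = 200   // -maxConcurrentCalls=200
		mbPerMessage = 8.0   // ~1 MiB request or response, rounded to 8 Mb as above
		linkMbps     = 100.0 // assumed LAN cap of ~100 Mbps per direction
	)
	// Seconds to push all requests (or all responses) through one direction.
	oneWay := streams * mbPerMessage / linkMbps
	fmt.Printf("requests then responses (no overlap): %.0fs\n", 2*oneWay) // 32s, the best case above
	fmt.Printf("perfect bi-directional overlap:       %.0fs\n", oneWay)   // 16s if both directions stay busy
}
```

Either way this ignores the 2 ms latency, HTTP/2 framing, and flow control, so 32 s is a floor rather than an expectation.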
Run on an M2 MacBook at master (1.59.0-dev), so the results below are not directly comparable to the numbers above.
Ran it again to see what the diff is since the last comment:
$ go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=1000 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
go1.19.1/grpc1.59.0-dev
streaming-networkMode_LAN-bufConn_false-keepalive_false-benchTime_10s-trace_false-latency_2ms-kbps_102400-MTU_1500-maxConcurrentCalls_1000-reqSize_1048576B-respSize_1048576B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false:
50_Latency: 159.9492s 90_Latency: 160.0854s 99_Latency: 160.2036s Avg_Latency: 158.2792s Bytes/op: 8.577991e+06 Allocs/op: 10296.902
Histogram (unit: s)
Count: 1000 Min: 73.3 Max: 160.2 Avg: 158.28
------------------------------------------------------------
[ 73.277045, 73.277045) 1 0.1% 0.1%
[ 73.277045, 73.277045) 0 0.0% 0.1%
[ 73.277045, 73.277045) 0 0.0% 0.1%
[ 73.277045, 73.277049) 0 0.0% 0.1%
[ 73.277049, 73.277118) 0 0.0% 0.1%
[ 73.277118, 73.278240) 0 0.0% 0.1%
[ 73.278240, 73.296671) 0 0.0% 0.1%
[ 73.296671, 73.599379) 0 0.0% 0.1%
[ 73.599379, 78.570988) 6 0.6% 0.7%
[ 78.570988, 160.223455) 992 99.2% 99.9% ##########
[ 160.223455, 1501.263413) 1 0.1% 100.0%
Number of requests: 1000 Request throughput: 8.388608e+08 bit/s
Number of responses: 1000 Response throughput: 8.388608e+08 bit/s