grpc-go
Under 80% network utilization for LAN networks with large messages and high concurrency
go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=1000 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
It finishes for me; you just have to be patient:
$ go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=200 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
Stream-traceMode_false-latency_2ms-kbps_102400-MTU_1500-maxConcurrentCalls_200-reqSize_1048576B-respSize_1048576B-Compressor_false:
50_Latency: 40.3235 s 90_Latency: 41.0967 s 99_Latency: 41.1228 s Avg latency: 36.3483 s Count: 200 13201654 Bytes/op 16305 Allocs/op
Histogram (unit: s)
Count: 200 Min: 14.5 *****Max: 41.1***** Avg: 36.35
------------------------------------------------------------
[ 14.456700, 14.456700) 1 0.5% 0.5%
[ 14.456700, 14.456700) 0 0.0% 0.5%
[ 14.456700, 14.456700) 0 0.0% 0.5%
[ 14.456700, 14.456703) 0 0.0% 0.5%
[ 14.456703, 14.456743) 0 0.0% 0.5%
[ 14.456743, 14.457320) 0 0.0% 0.5%
[ 14.457320, 14.465627) 0 0.0% 0.5%
[ 14.465627, 14.585274) 0 0.0% 0.5%
[ 14.585274, 16.308545) 3 1.5% 2.0%
[ 16.308545, 41.128673) 195 97.5% 99.5% ##########
[ 41.128673, inf) 1 0.5% 100.0%
(Note the "Max" above, asterisked.)
1 MB = 8 Mb. 200 streams * 2 directions * 8 Mb per message = 3200 Mb. LAN mode allows 100 Mbps, so our best-case scenario* would be 32 s.
I'm not sure why we're 10 s worse than optimal at this point (78% of max), but I don't see anything broken with the -networkMode flag. It's just that -benchtime=<short> is not really compatible with high levels of concurrency and large messages relative to the effective throughput.
* This is if we have perfect fairness across the outgoing client streams and all 200 requests complete at the same moment, with all 200 responses starting afterwards. Interestingly, the overall benchmark would improve if we sent the streams serially so that we could maximize the bi-directional utilization of the network.
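For reference, here's a quick back-of-the-envelope sketch of that floor (assuming the LAN profile really caps each direction at ~100 Mbps, per the kbps_102400 in the output, and using the same 1 MB ≈ 8 Mb rounding as above). It also prints the ~16 s floor you'd get with perfect bi-directional overlap, which is the serial-streams idea from the footnote:

```go
package main

import "fmt"

func main() {
	const (
		streams      = 200   // -maxConcurrentCalls=200
		mbPerMessage = 8.0   // ~1 MiB request or response, rounded to 8 Mb as above
		linkMbps     = 100.0 // assumed LAN cap of ~100 Mbps per direction
	)
	// Seconds to push all requests (or all responses) through one direction.
	oneWay := streams * mbPerMessage / linkMbps
	fmt.Printf("requests then responses (no overlap): %.0fs\n", 2*oneWay) // 32s, the best case above
	fmt.Printf("perfect bi-directional overlap:       %.0fs\n", oneWay)   // 16s if both directions stay busy
}
```

Either way this ignores the 2 ms latency, HTTP/2 framing, and flow control, so 32 s is a floor rather than an expectation.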
Run on an M2 MacBook at master (1.59.0-dev), so the results below are not directly comparable to the numbers above.
Ran it again to see what the diff is since the last comment:
$ go run benchmark/benchmain/main.go -benchtime=10s -workloads=streaming -compression=off -maxConcurrentCalls=1000 -trace=off -reqSizeBytes=1048576 -respSizeBytes=1048576 -networkMode=LAN -cpuProfile=speedup.cpu
go1.19.1/grpc1.59.0-dev
streaming-networkMode_LAN-bufConn_false-keepalive_false-benchTime_10s-trace_false-latency_2ms-kbps_102400-MTU_1500-maxConcurrentCalls_1000-reqSize_1048576B-respSize_1048576B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false:
50_Latency: 159.9492s 90_Latency: 160.0854s 99_Latency: 160.2036s Avg_Latency: 158.2792s Bytes/op: 8.577991e+06 Allocs/op: 10296.902
Histogram (unit: s)
Count: 1000 Min: 73.3 Max: 160.2 Avg: 158.28
------------------------------------------------------------
[ 73.277045, 73.277045) 1 0.1% 0.1%
[ 73.277045, 73.277045) 0 0.0% 0.1%
[ 73.277045, 73.277045) 0 0.0% 0.1%
[ 73.277045, 73.277049) 0 0.0% 0.1%
[ 73.277049, 73.277118) 0 0.0% 0.1%
[ 73.277118, 73.278240) 0 0.0% 0.1%
[ 73.278240, 73.296671) 0 0.0% 0.1%
[ 73.296671, 73.599379) 0 0.0% 0.1%
[ 73.599379, 78.570988) 6 0.6% 0.7%
[ 78.570988, 160.223455) 992 99.2% 99.9% ##########
[ 160.223455, 1501.263413) 1 0.1% 100.0%
Number of requests: 1000 Request throughput: 8.388608e+08 bit/s
Number of responses: 1000 Response throughput: 8.388608e+08 bit/s