etcd
MAX_CONCURRENT_STREAMS not set correctly under secure server
The server in embed has two modes, secure and insecure. In insecure mode it is a pure gRPC implementation that uses grpc-go's Serve method, while in secure mode it uses ServeHTTP.
According to the comments in grpc-go, ServeHTTP doesn't support some gRPC features and may be less efficient than Serve.
With GODEBUG=http2debug=2 enabled when running etcdctl, we can see that MAX_CONCURRENT_STREAMS differs between the two methods.
Here is the output when querying the insecure endpoint, where the server uses grpc.Serve:
GODEBUG=http2debug=2 etcdctl get --prefix "" --keys-only
2022/02/21 14:58:53 http2: Framer 0xc0000c80e0: wrote SETTINGS len=0
2022/02/21 14:58:53 http2: Framer 0xc0000c80e0: read SETTINGS len=12, settings: MAX_FRAME_SIZE=16384, MAX_CONCURRENT_STREAMS=4294967295
2022/02/21 14:58:53 http2: Framer 0xc0000c80e0: read SETTINGS flags=ACK len=0
2022/02/21 14:58:53 http2: Framer 0xc0000c80e0: wrote SETTINGS flags=ACK len=0
...
And here is the output for the secure endpoint, where the server uses grpc.ServeHTTP:
2022/02/21 15:26:16 http2: Framer 0xc000448540: wrote SETTINGS len=0
2022/02/21 15:26:16 http2: Framer 0xc000448540: read SETTINGS len=24, settings: MAX_FRAME_SIZE=1048576, MAX_CONCURRENT_STREAMS=250, MAX_HEADER_LIST_SIZE=1048896, INITIAL_WINDOW_SIZE=1048576
2022/02/21 15:26:16 http2: Framer 0xc000448540: read WINDOW_UPDATE len=4 (conn) incr=983041
2022/02/21 15:26:16 http2: Framer 0xc000448540: read SETTINGS flags=ACK len=0
2022/02/21 15:26:16 http2: Framer 0xc000448540: wrote SETTINGS flags=ACK len=0
...
The main difference is MAX_CONCURRENT_STREAMS: 250 is Go's default http2 setting, while 4294967295 is the math.MaxUint32 value we set. So the value is applied correctly only in insecure mode.
I also ran another server using grpc.Serve in secure mode and benchmarked it; its throughput was 20% higher than that of the current server implementation.
It seems that we wrapped a gRPC handler for grpc-gateway compatibility and security reasons, but it is less efficient than pure gRPC. Should we refactor the current implementation, or add another listening address that serves pure gRPC for high performance?
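For readers unfamiliar with the wrapping, the general pattern looks roughly like the sketch below. It is an illustration of the technique, not etcd's actual code: `grpcOrHTTP` and `route` are hypothetical names, and plain stub handlers stand in for the real `*grpc.Server` (whose ServeHTTP method satisfies `http.Handler`) and for the gateway. Because requests on this path are served by Go's net/http HTTP/2 server rather than grpc-go's own transport, the SETTINGS the client sees come from that server's configuration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// grpcOrHTTP routes HTTP/2 requests with a gRPC content type to grpcHandler
// and everything else to other. In a real server grpcHandler would be a
// *grpc.Server; here it is any http.Handler (assumption for this sketch).
func grpcOrHTTP(grpcHandler, other http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
			grpcHandler.ServeHTTP(w, r)
			return
		}
		other.ServeHTTP(w, r)
	})
}

// route reports which stub backend answers a request with the given
// protocol major version and content type.
func route(protoMajor int, contentType string) string {
	tag := func(name string) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
			fmt.Fprint(w, name)
		})
	}
	h := grpcOrHTTP(tag("grpc"), tag("gateway"))

	req := httptest.NewRequest(http.MethodPost, "/", nil)
	req.ProtoMajor = protoMajor
	req.Header.Set("Content-Type", contentType)

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(route(2, "application/grpc")) // prints "grpc"
	fmt.Println(route(1, "application/json")) // prints "gateway"
}
```

If the wrapped design is kept, one plausible place to raise the limit would be the `MaxConcurrentStreams` field of the `golang.org/x/net/http2` Server used for the secure listener; that is an assumption about a possible fix, not something confirmed in this issue.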
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.