Bidirectional streaming performance with 1MB payload
Describe the question
I'm running a quick performance test against the included routeguide service and finding that it takes around 30 seconds to send and receive 100 messages via the route_chat RPC when the msg field of each client message contains 1MB of random bytes, over the loopback interface on a single machine. The server returns an empty message for each note it receives.
Is this the expected performance and is there anything I can do (perhaps changing some cowboy configuration) to improve it?
Versions:
- OS: NixOS 19.09
- Erlang: erts-10.5.5/OTP 22
- Elixir: 1.10.1
- mix.lock (grpc, gun, cowboy, cowlib):
%{
  "cowboy": {:hex, :cowboy, "2.7.0", "91ed100138a764355f43316b1d23d7ff6bdb0de4ea618cb5d8677c93a7a2f115", [:rebar3], [{:cowlib, "~> 2.8.0", [hex: :cowlib, repo: "hexpm", optional: false]}, {:ranch, "~> 1.7.1", [hex: :ranch, repo: "hexpm", optional: false]}], "hexpm", "04fd8c6a39edc6aaa9c26123009200fc61f92a3a94f3178c527b70b767c6e605"},
  "cowlib": {:hex, :cowlib, "2.9.1", "61a6c7c50cf07fdd24b2f45b89500bb93b6686579b069a89f88cb211e1125c78", [:rebar3], [], "hexpm", "e4175dc240a70d996156160891e1c62238ede1729e45740bdd38064dad476170"},
  "grpc": {:hex, :grpc, "0.5.0-beta.1", "7d43f52e138fe261f5b4981f1ada515dfc2e1bfa9dc92c7022e8f41e7e49b571", [:mix], [{:cowboy, "~> 2.7.0", [hex: :cowboy, repo: "hexpm", optional: false]}, {:gun, "~> 2.0.0", [hex: :grpc_gun, repo: "hexpm", optional: false]}, {:protobuf, "~> 0.5", [hex: :protobuf, repo: "hexpm", optional: false]}], "hexpm", "fbbf8872935c295b7575435fe4128372c23c6ded89c2ef8058af3c6167bb3f65"},
  "gun": {:hex, :grpc_gun, "2.0.0", "f99678a2ab975e74372a756c86ec30a8384d3ac8a8b86c7ed6243ef4e61d2729", [:rebar3], [{:cowlib, "~> 2.8.0", [hex: :cowlib, repo: "hexpm", optional: false]}], "hexpm", "03dbbca1a9c604a0267a40ea1d69986225091acb822de0b2dbea21d5815e410b"}
}
Test script
count = 100
size = 1024 * 1024

# Build `count` notes, each carrying `size` bytes of random data in the message field.
data =
  Enum.map(1..count, fn _ ->
    msg = :crypto.strong_rand_bytes(size)
    %{lat: 0, long: 0, msg: msg}
  end)

notes =
  Enum.map(data, fn %{lat: lat, long: long, msg: msg} ->
    point = Routeguide.Point.new(latitude: lat, longitude: long)
    Routeguide.RouteNote.new(location: point, message: msg)
  end)

{:ok, channel} = GRPC.Stub.connect("localhost:50051")
stream = channel |> Routeguide.RouteGuide.Stub.route_chat()

start_time = :erlang.monotonic_time()

# Send the notes from a separate task, marking the last one with end_stream: true.
task =
  Task.async(fn ->
    Enum.reduce(notes, notes, fn _, [note | tail] ->
      opts = if length(tail) == 0, do: [end_stream: true], else: []
      GRPC.Stub.send_request(stream, note, opts)
      tail
    end)
  end)

# Get the response enumerable, then wait for the sender task to finish.
{:ok, result_enum} = GRPC.Stub.recv(stream, timeout: :infinity)
Task.await(task)

# Drain the responses and count them.
count =
  Enum.reduce(result_enum, 0, fn {:ok, _}, acc ->
    acc + 1
  end)

end_time = :erlang.monotonic_time()

IO.puts(
  "received #{count} responses in #{:erlang.convert_time_unit(end_time - start_time, :native, :millisecond)} ms"
)
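For reference, the server side is the bundled routeguide example with route_chat cut down so that it replies with an empty note for every note it receives. A minimal sketch of that handler (the shape of it, not the exact code) looks like this:
defmodule Routeguide.RouteGuide.Server do
  use GRPC.Server, service: Routeguide.RouteGuide.Service

  # Reply with an empty RouteNote for every note the client sends.
  def route_chat(req_enum, stream) do
    Enum.each(req_enum, fn _note ->
      GRPC.Server.send_reply(stream, Routeguide.RouteNote.new())
    end)
  end
end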
A couple of extra details:
- Decreasing the payload size to 64 KB brings the send-and-receive time down to 175 ms. That's a 16x reduction in payload size yielding roughly a 175x speedup, i.e. per-byte throughput drops sharply as the messages grow (see the throughput sketch after this list).
- I've tried increasing the flow-control parameters for cowboy, but it didn't seem to improve things. This is the opts proplist I'm using for the server:
grpc_opts = [
  initial_connection_window_size: 2_147_483_647,
  initial_stream_window_size: 2_147_483_647,
  max_connection_window_size: 2_147_483_647,
  max_connection_buffer_size: 1024 * 1024 * 1024,
  max_stream_window_size: 2_147_483_647,
  max_stream_buffer_size: 1024 * 1024 * 1024,
  max_frame_size_received: 16_777_215,
  max_frame_size_sent: :infinity,
  max_received_frame_rate: {999_000_000, 100},
  max_reset_stream_rate: {999_000_000, 100}
]
- Using observer_cli, it appears that most of the time is spent in a single cowboy process started via cowboy_clear:connection_process/4. Inspecting that process's execution stack shows cowboy_http2:loop and cowboy_http:parse taking a large portion of the server-side processing time.
- I ran a similar test against the standard gRPC implementation in Go: it takes about 180 ms, which is roughly 160x faster.
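To make the payload-size comparison easier to read, here is a quick throughput calculation using only the times measured above (30 s for 100 x 1 MB, 175 ms for 100 x 64 KB):
# Throughput for the two measured cases, using the numbers reported above.
for {label, count, size_bytes, ms} <- [
      {"1 MB payloads", 100, 1024 * 1024, 30_000},
      {"64 KB payloads", 100, 64 * 1024, 175}
    ] do
  mib = count * size_bytes / (1024 * 1024)
  IO.puts("#{label}: #{Float.round(mib / (ms / 1000), 1)} MiB/s")
end
# => 1 MB payloads: 3.3 MiB/s
# => 64 KB payloads: 35.7 MiB/s
So the larger messages move data roughly 10x slower per byte, not just slower per message.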
This needs some more investigation, especially after the updates in #224.