Bidirectional streaming performance with 1MB payload
Describe the question
I'm running a quick performance test against the included routeguide service and finding that it takes around 30 seconds to send and receive 100 messages via the route_chat RPC when the msg field of each client message contains 1MB of random bytes, over the loopback interface on a single machine. The server returns an empty message for each note it receives.
Is this the expected performance and is there anything I can do (perhaps changing some cowboy configuration) to improve it?
Versions:
- OS: NixOS 19.09
- Erlang: erts-10.5.5/OTP 22
- Elixir: 1.10.1
- mix.lock (grpc, gun, cowboy, cowlib):
%{
  "cowboy": {:hex, :cowboy, "2.7.0", "91ed100138a764355f43316b1d23d7ff6bdb0de4ea618cb5d8677c93a7a2f115", [:rebar3], [{:cowlib, "~> 2.8.0", [hex: :cowlib, repo: "hexpm", optional: false]}, {:ranch, "~> 1.7.1", [hex: :ranch, repo: "hexpm", optional: false]}], "hexpm", "04fd8c6a39edc6aaa9c26123009200fc61f92a3a94f3178c527b70b767c6e605"},
  "cowlib": {:hex, :cowlib, "2.9.1", "61a6c7c50cf07fdd24b2f45b89500bb93b6686579b069a89f88cb211e1125c78", [:rebar3], [], "hexpm", "e4175dc240a70d996156160891e1c62238ede1729e45740bdd38064dad476170"},
  "grpc": {:hex, :grpc, "0.5.0-beta.1", "7d43f52e138fe261f5b4981f1ada515dfc2e1bfa9dc92c7022e8f41e7e49b571", [:mix], [{:cowboy, "~> 2.7.0", [hex: :cowboy, repo: "hexpm", optional: false]}, {:gun, "~> 2.0.0", [hex: :grpc_gun, repo: "hexpm", optional: false]}, {:protobuf, "~> 0.5", [hex: :protobuf, repo: "hexpm", optional: false]}], "hexpm", "fbbf8872935c295b7575435fe4128372c23c6ded89c2ef8058af3c6167bb3f65"},
  "gun": {:hex, :grpc_gun, "2.0.0", "f99678a2ab975e74372a756c86ec30a8384d3ac8a8b86c7ed6243ef4e61d2729", [:rebar3], [{:cowlib, "~> 2.8.0", [hex: :cowlib, repo: "hexpm", optional: false]}], "hexpm", "03dbbca1a9c604a0267a40ea1d69986225091acb822de0b2dbea21d5815e410b"}
}
Test script
count = 100
size = 1024 * 1024

# Build `count` notes, each carrying `size` bytes of random data in the message field.
data =
  Enum.map(1..count, fn _ ->
    msg = :crypto.strong_rand_bytes(size)
    %{lat: 0, long: 0, msg: msg}
  end)

notes =
  Enum.map(data, fn %{lat: lat, long: long, msg: msg} ->
    point = Routeguide.Point.new(latitude: lat, longitude: long)
    Routeguide.RouteNote.new(location: point, message: msg)
  end)

{:ok, channel} = GRPC.Stub.connect("localhost:50051")
stream = channel |> Routeguide.RouteGuide.Stub.route_chat()

start_time = :erlang.monotonic_time()

# Send the notes from a separate task, marking the last one with end_stream: true.
task =
  Task.async(fn ->
    Enum.reduce(notes, notes, fn _, [note | tail] ->
      opts = if length(tail) == 0, do: [end_stream: true], else: []
      GRPC.Stub.send_request(stream, note, opts)
      tail
    end)
  end)

# Get the response enumerable, then wait for the sender task to finish.
{:ok, result_enum} = GRPC.Stub.recv(stream, timeout: :infinity)
Task.await(task)

# Drain the responses and count them.
count =
  Enum.reduce(result_enum, 0, fn {:ok, _}, acc ->
    acc + 1
  end)

end_time = :erlang.monotonic_time()

IO.puts(
  "received #{count} responses in #{:erlang.convert_time_unit(end_time - start_time, :native, :millisecond)} ms"
)
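For reference, the server side is the bundled routeguide example with route_chat cut down so that it replies with an empty note for every note it receives. A minimal sketch of that handler (the shape of it, not the exact code) looks like this:
defmodule Routeguide.RouteGuide.Server do
  use GRPC.Server, service: Routeguide.RouteGuide.Service

  # Reply with an empty RouteNote for every note the client sends.
  def route_chat(req_enum, stream) do
    Enum.each(req_enum, fn _note ->
      GRPC.Server.send_reply(stream, Routeguide.RouteNote.new())
    end)
  end
end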
A couple of extra details:
- Decreasing the payload size to 64 KB brings the send-and-receive time down to 175 ms. That's a 16x reduction in payload size yielding roughly a 175x speedup, i.e. per-byte throughput drops sharply as the messages grow (see the throughput sketch after this list).
- I've tried increasing the flow-control parameters for cowboy, but it didn't seem to improve things. This is the opts proplist I'm using for the server:
grpc_opts = [
  initial_connection_window_size: 2_147_483_647,
  initial_stream_window_size: 2_147_483_647,
  max_connection_window_size: 2_147_483_647,
  max_connection_buffer_size: 1024 * 1024 * 1024,
  max_stream_window_size: 2_147_483_647,
  max_stream_buffer_size: 1024 * 1024 * 1024,
  max_frame_size_received: 16_777_215,
  max_frame_size_sent: :infinity,
  max_received_frame_rate: {999_000_000, 100},
  max_reset_stream_rate: {999_000_000, 100}
]
- Using observer_cli, it appears that most of the time is spent in a single cowboy process started via cowboy_clear:connection_process/4. Inspecting that process's execution stack shows cowboy_http2:loop and cowboy_http:parse taking a large portion of the server-side processing time.
- I ran a similar test against the standard gRPC implementation in Go: it takes about 180 ms, which is roughly 160x faster.
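To make the payload-size comparison easier to read, here is a quick throughput calculation using only the times measured above (30 s for 100 x 1 MB, 175 ms for 100 x 64 KB):
# Throughput for the two measured cases, using the numbers reported above.
for {label, count, size_bytes, ms} <- [
      {"1 MB payloads", 100, 1024 * 1024, 30_000},
      {"64 KB payloads", 100, 64 * 1024, 175}
    ] do
  mib = count * size_bytes / (1024 * 1024)
  IO.puts("#{label}: #{Float.round(mib / (ms / 1000), 1)} MiB/s")
end
# => 1 MB payloads: 3.3 MiB/s
# => 64 KB payloads: 35.7 MiB/s
So the larger messages move data roughly 10x slower per byte, not just slower per message.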
This needs some more investigation, especially after the updates in #224.