Bidirectional streaming performance with 1MB payload

Describe the question

I'm running a quick performance test using the included routeguide service and finding that it takes around 30 seconds to send and receive 100 messages over the route_chat RPC when the msg field of each message the client sends contains 1 MB of random bytes, over the loopback interface on a single machine. The server is returning an empty message.

Is this the expected performance and is there anything I can do (perhaps changing some cowboy configuration) to improve it?

Versions:

  • OS: NixOS 19.09
  • Erlang: erts-10.5.5/OTP 22
  • Elixir: 1.10.1
  • mix.lock(grpc, gun, cowboy, cowlib):
%{
"cowboy": {:hex, :cowboy, "2.7.0", "91ed100138a764355f43316b1d23d7ff6bdb0de4ea618cb5d8677c93a7a2f115", [:rebar3], [{:cowlib, "~> 2.8.0", [hex: :cowlib, repo: "hexpm", optional: false]}, {:ranch, "~> 1.7.1", [hex: :ranch, repo: "hexpm", optional: false]}], "hexpm", "04fd8c6a39edc6aaa9c26123009200fc61f92a3a94f3178c527b70b767c6e605"},
  "cowlib": {:hex, :cowlib, "2.9.1", "61a6c7c50cf07fdd24b2f45b89500bb93b6686579b069a89f88cb211e1125c78", [:rebar3], [], "hexpm", "e4175dc240a70d996156160891e1c62238ede1729e45740bdd38064dad476170"},
  "grpc": {:hex, :grpc, "0.5.0-beta.1", "7d43f52e138fe261f5b4981f1ada515dfc2e1bfa9dc92c7022e8f41e7e49b571", [:mix], [{:cowboy, "~> 2.7.0", [hex: :cowboy, repo: "hexpm", optional: false]}, {:gun, "~> 2.0.0", [hex: :grpc_gun, repo: "hexpm", optional: false]}, {:protobuf, "~> 0.5", [hex: :protobuf, repo: "hexpm", optional: false]}], "hexpm", "fbbf8872935c295b7575435fe4128372c23c6ded89c2ef8058af3c6167bb3f65"},
  "gun": {:hex, :grpc_gun, "2.0.0", "f99678a2ab975e74372a756c86ec30a8384d3ac8a8b86c7ed6243ef4e61d2729", [:rebar3], [{:cowlib, "~> 2.8.0", [hex: :cowlib, repo: "hexpm", optional: false]}], "hexpm", "03dbbca1a9c604a0267a40ea1d69986225091acb822de0b2dbea21d5815e410b"}
}

Test script

count = 100
size = 1024 * 1024

# Build `count` payloads, each carrying `size` bytes of random data.
data =
  Enum.map(1..count, fn _ ->
    msg = :crypto.strong_rand_bytes(size)
    %{lat: 0, long: 0, msg: msg}
  end)

notes =
  Enum.map(data, fn %{lat: lat, long: long, msg: msg} ->
    point = Routeguide.Point.new(latitude: lat, longitude: long)
    Routeguide.RouteNote.new(location: point, message: msg)
  end)

{:ok, channel} = GRPC.Stub.connect("localhost:50051")
stream = channel |> Routeguide.RouteGuide.Stub.route_chat()

start_time = :erlang.monotonic_time()

# Send the notes from a separate task so the main process can start
# receiving responses concurrently; the last note closes the stream.
task =
  Task.async(fn ->
    Enum.reduce(notes, notes, fn _, [note | tail] ->
      opts = if tail == [], do: [end_stream: true], else: []
      GRPC.Stub.send_request(stream, note, opts)
      tail
    end)
  end)

{:ok, result_enum} = GRPC.Stub.recv(stream, timeout: :infinity)
Task.await(task, :infinity)

# Drain the response stream, counting successful replies.
received =
  Enum.reduce(result_enum, 0, fn {:ok, _}, acc ->
    acc + 1
  end)

end_time = :erlang.monotonic_time()

IO.puts(
  "received #{received} responses in #{:erlang.convert_time_unit(end_time - start_time, :native, :millisecond)} ms"
)

ljmarks avatar Aug 03 '20 01:08 ljmarks

A couple of extra details:

  • Decreasing the payload size to 64 KB brings the send-and-receive time down to 175 ms. That's a 16x reduction in payload size yielding roughly a 170x speedup (see the throughput arithmetic after this list).
  • I've tried increasing Cowboy's flow-control parameters, but it didn't seem to improve things. This is the opts keyword list I'm using for the server:
     grpc_opts = [
       initial_connection_window_size: 2_147_483_647,
       initial_stream_window_size: 2_147_483_647,
       max_connection_window_size: 2_147_483_647,
       max_connection_buffer_size: 1024 * 1024 * 1024,
       max_stream_window_size: 2_147_483_647,
       max_stream_buffer_size: 1024 * 1024 * 1024,
       max_frame_size_received: 16777215,
       max_frame_size_sent: :infinity,
       max_received_frame_rate: {999_000_000, 100},
       max_reset_stream_rate: {999_000_000, 100}
     ]
  • Using observer_cli, it appears that most of the time is spent in a single Cowboy process started via cowboy_clear:connection_process/4. Inspecting that process's execution stack shows cowboy_http2:loop and cowboy_http:parse accounting for a large share of the server-side message-processing time.
  • I ran a similar test against the standard gRPC implementation in Go; it takes about 180 ms, which is roughly 165x faster.
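For context on throughput: 100 x 1 MiB = 100 MiB in ~30 s is about 3.3 MiB/s, while 100 x 64 KiB = 6.25 MiB in 175 ms is about 36 MiB/s, so the cost per byte grows sharply with message size rather than staying flat. A rough, untested sketch of one experiment (an assumption of mine, not something the library documents): keep the total bytes the same but split each 1 MiB note into 16 x 64 KiB notes, reusing the fields of the notes built in the test script, and check whether throughput recovers:

# Untested sketch: same total bytes, but each 1 MiB payload split into
# 16 x 64 KiB notes, to see whether the per-byte cost tracks message size.
chunk_size = 64 * 1024

small_notes =
  Enum.flat_map(notes, fn note ->
    # 1 MiB divides evenly into 64 KiB chunks, so nothing is dropped here.
    for <<chunk::binary-size(chunk_size) <- note.message>> do
      Routeguide.RouteNote.new(location: note.location, message: chunk)
    end
  end)

The resulting 1600 smaller notes could then be pushed through the same send loop as above (with end_stream set on the last one).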

ljmarks avatar Aug 03 '20 02:08 ljmarks

This needs some more investigation, especially after the updates in #224.

polvalente avatar Jul 18 '22 08:07 polvalente