
RPC Rust benchmarks

mattiekat opened this issue on Mar 29, 2022 · 0 comments

All of the numbers below were measured with a 1-thread Tokio multi-thread runtime in a VM thread on a 3950x. Each number is its own round trip: "send" means sending only and getting back an ack, while "receive" means making a request and getting the object back to deserialize.
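For reference, roughly how that runtime was configured (a minimal sketch, not the actual harness; the benchmark body is elided):

```rust
fn main() {
    // A Tokio multi-thread runtime restricted to a single worker thread,
    // matching the "1-thread Tokio multi-thread runtime" described above.
    let rt = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(1)
        .enable_all()
        .build()
        .expect("failed to build runtime");

    rt.block_on(async {
        // spawn the service and client tasks and drive the RTT loops here
    });
}
```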

Performing a simple RTT with a `void ping()` function (which, despite being void, still gets back an empty OK response datagram):

- monodirectional, 2 at once: 26.7 Kelem/s & 78us latency
- bidirectional, 2 at once: 25.7 Kelem/s & 77us latency
- monodirectional, 16 at once: 39.3 Kelem/s & 407us latency
- bidirectional, 16 at once: 56.3 Kelem/s & 284us latency

- A is a simple fixed-size struct
- B is a Message object
- C is a large-ish object (not fixed size, and contains A)

- A send/receive: 55us/54us
- B send/receive: 59us/56us
- C send/receive: 55us/60us

- Serializing and deserializing A over a tokio channel (1x/1000x): 325ns/223ns
- Serializing and deserializing B over a tokio channel (1x/1000x): 449ns/319ns
- Serializing and deserializing C over a tokio channel (1x/1000x): 484ns/359ns
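The channel test has roughly this shape (a sketch only; the encode/decode placeholders stand in for the bebop-generated records' serialize/deserialize, which aren't shown here):

```rust
use std::time::Instant;
use tokio::sync::mpsc;

// Placeholder for serializing one of the generated bebop records (A, B, or C);
// the real benchmark calls the generated type's serialization instead.
fn encode_placeholder() -> Vec<u8> {
    vec![0u8; 64]
}

// Placeholder for deserializing the record from the received bytes.
fn decode_placeholder(_bytes: &[u8]) {}

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    const N: u32 = 1000;
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(64);

    // Producer task: "serialize" and send N items as fast as possible.
    let producer = tokio::spawn(async move {
        for _ in 0..N {
            tx.send(encode_placeholder()).await.unwrap();
        }
    });

    // Consumer: receive and "deserialize" every item, timing the whole batch.
    let start = Instant::now();
    while let Some(bytes) = rx.recv().await {
        decode_placeholder(&bytes);
    }
    producer.await.unwrap();
    println!("avg per item: {:?}", start.elapsed() / N);
}
```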

So it's safe to say that we are seeing a fair amount of overhead from the service wrapping, since the transport and serialization account for only about 0.6% of the time (roughly 325ns out of a ~55us round trip). Some of that overhead is the call table, though, which we are able to bypass in the direct transport test. If we think 0.05ms is too slow, we can generate a flamegraph, which I haven't done yet because I am not set up on Linux at the moment.

The transport test also has one less allocation because it does not make an owned object again, but there is no real way to avoid that in the real world. So the transport test is a lower bound in more ways than one.

Another caveat here is that the tests are all done as a single RTT and don't account for possible performance gains from having multiple requests in flight at once (we did see throughput improvements both from the 1000x channel sending and from parallel pings). The exception is the 1000x channel tests, which use a producer and a consumer to keep both sides going at max rate. A sketch of the in-flight variant is below.
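Keeping multiple requests in flight would look something like this (the `ping` here is a hypothetical stand-in for the generated RPC client call, not the actual bebop API):

```rust
use std::time::Instant;

// Hypothetical stand-in for the generated RPC client's ping call; the real
// benchmark goes through the bebop-generated service client and transport.
async fn ping() {
    tokio::task::yield_now().await;
}

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    // Keep several requests in flight at once (cf. the "16 at once" numbers
    // above) instead of awaiting each round trip before issuing the next.
    const IN_FLIGHT: usize = 16;
    let start = Instant::now();
    let handles: Vec<_> = (0..IN_FLIGHT).map(|_| tokio::spawn(ping())).collect();
    for h in handles {
        h.await.unwrap();
    }
    println!("{} pings in {:?}", IN_FLIGHT, start.elapsed());
}
```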
