iroh icon indicating copy to clipboard operation
iroh copied to clipboard

perf: simplify relay handshake

Open dignifiedquire opened this issue 10 months ago • 2 comments

The relay handshake is quite expensive currently

  • TLS 1.3 + HTTP 1 UPGRADE
  • Server sends FrameType::ServerKey
  • Client sends FrameType::ClientInfo
  • Server sends FrameType::ServerInfo

This simplifies the protocol to

  • TLS 1.3 + HTTP 1 UPGRADE
  • Client sends FrameType::ClientInfo

information changes

  • using a signature, instead of a shared secret to ensure the client identity.
  • server key is not sent (use certificate pinning if fixed relays are important)
  • remove unused configuration in the info frames
  • change magic from derp to relay
  • increase protocol version to 3
  • drop serverinfo in favor of fixed config on the client
  • enable tcp nodelay
  • switch to AES for encryption in favor over chacha
  • the server simply aborts when the client version doesn't match (matching what we do in other protocols)

Benchmarks:

https://gist.github.com/dignifiedquire/30131dbe8b87fb799a971068899656ef

dignifiedquire avatar Apr 09 '24 10:04 dignifiedquire

Haven't done a read-through yet, but just looking at the protocol changes: we haven't been using rate limiters for the client yet, but we should talk about what we might do to replace it in the future if we no longer have a ServerInfo. Or is this something we can get away with never doing?

ramfox avatar Apr 09 '24 14:04 ramfox

we haven't been using rate limiters for the client yet, but we should talk about what we might do to replace it in the future if we no longer have a ServerInfo. Or is this something we can get away with never doing?

I think we should communicate those expectations out of band. Eg have a default limiter in the client that we expect clients to use. If more details are needed we can add config options to specificy this per der relay on the client side.

dignifiedquire avatar Apr 09 '24 14:04 dignifiedquire

I just looked at the benchmarks and did not see any breathtaking improvements. Things seemed pretty much unchanged or even a bit worse. Is that expected?

Anyway, eliminating an entire roundtrip seems like a big win in any case, even if it does not show up in the benches.

rklaehn avatar Apr 11 '24 05:04 rklaehn

in US case we drop around 200ms by the end for connection, do not sure what you mean

dignifiedquire avatar Apr 11 '24 06:04 dignifiedquire

in US case we drop around 200ms by the end for connection, do not sure what you mean

Probably should not comment without first ☕ ... I looked at the latency numbers.

rklaehn avatar Apr 11 '24 06:04 rklaehn

In changelog it is marked as performance, should probably be somewhere at the top of changelog saying it is a breaking change. We were running 0.13 relay, but 0.14 clients failed to connect until we upgraded the relay to 0.14 as well.

link2xt avatar Apr 17 '24 19:04 link2xt

BREAKING: This breaks the relay handshake, and so requires updated relay nodes to work with.

from the description above 😅

dignifiedquire avatar Apr 17 '24 20:04 dignifiedquire

BREAKING: This breaks the relay handshake, and so requires updated relay nodes to work with.

from the description above 😅

Yes, I have seen it, but this probably should should be on top of the https://github.com/n0-computer/iroh/blob/main/CHANGELOG.md entry

link2xt avatar Apr 17 '24 20:04 link2xt