iroh
iroh copied to clipboard
perf: simplify relay handshake
The relay handshake is quite expensive currently
- TLS 1.3 + HTTP 1 UPGRADE
- Server sends FrameType::ServerKey
- Client sends FrameType::ClientInfo
- Server sends FrameType::ServerInfo
This simplifies the protocol to
- TLS 1.3 + HTTP 1 UPGRADE
- Client sends FrameType::ClientInfo
information changes
- using a signature, instead of a shared secret to ensure the client identity.
- server key is not sent (use certificate pinning if fixed relays are important)
- remove unused configuration in the info frames
- change magic from
derp
torelay
- increase protocol version to
3
- drop
serverinfo
in favor of fixed config on the client - enable tcp nodelay
- switch to AES for encryption in favor over chacha
- the server simply aborts when the client version doesn't match (matching what we do in other protocols)
Benchmarks:
https://gist.github.com/dignifiedquire/30131dbe8b87fb799a971068899656ef
Haven't done a read-through yet, but just looking at the protocol changes: we haven't been using rate limiters for the client yet, but we should talk about what we might do to replace it in the future if we no longer have a ServerInfo
. Or is this something we can get away with never doing?
we haven't been using rate limiters for the client yet, but we should talk about what we might do to replace it in the future if we no longer have a
ServerInfo
. Or is this something we can get away with never doing?
I think we should communicate those expectations out of band. Eg have a default limiter in the client that we expect clients to use. If more details are needed we can add config options to specificy this per der relay on the client side.
I just looked at the benchmarks and did not see any breathtaking improvements. Things seemed pretty much unchanged or even a bit worse. Is that expected?
Anyway, eliminating an entire roundtrip seems like a big win in any case, even if it does not show up in the benches.
in US case we drop around 200ms by the end for connection, do not sure what you mean
in US case we drop around 200ms by the end for connection, do not sure what you mean
Probably should not comment without first ☕ ... I looked at the latency numbers.
In changelog it is marked as performance, should probably be somewhere at the top of changelog saying it is a breaking change. We were running 0.13 relay, but 0.14 clients failed to connect until we upgraded the relay to 0.14 as well.
BREAKING: This breaks the relay handshake, and so requires updated relay nodes to work with.
from the description above 😅
BREAKING: This breaks the relay handshake, and so requires updated relay nodes to work with.
from the description above 😅
Yes, I have seen it, but this probably should should be on top of the https://github.com/n0-computer/iroh/blob/main/CHANGELOG.md entry