quinn

Certificate rotation for long running clients

Open · jean-airoldie opened this issue 6 years ago · 7 comments

Currently, the server's certificate can only be set once, when the server is created (AFAIK). This means that certificate rotation is basically impossible for long-running servers without restarting the socket. That might be unacceptable in use cases where certificates are rotated frequently (every few hours).

An alternative solution would be hitless rotation. The strategy is to periodically update the server's certificate so that new connections use the new certificate while existing connections keep using the old one.
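Here is a minimal sketch of the hitless-rotation idea in plain Rust. The `Endpoint`, `Connection` and `ServerConfig` types are hypothetical stand-ins for the quinn/rustls types and do not reproduce their real APIs. The endpoint keeps only the *current* config behind a lock. Each accepted connection clones the `Arc` it was accepted with, so swapping the config affects new connections only:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a server config (in practice it would hold the
/// TLS config and certificate chain); only a serial number is modelled here.
#[derive(Debug)]
pub struct ServerConfig {
    pub cert_serial: u64,
}

/// A connection keeps its own `Arc` to the config it was accepted with.
pub struct Connection {
    pub config: Arc<ServerConfig>,
}

/// Hypothetical endpoint: only the current config lives behind the lock.
pub struct Endpoint {
    server_config: Mutex<Arc<ServerConfig>>,
}

impl Endpoint {
    pub fn new(config: ServerConfig) -> Self {
        Endpoint {
            server_config: Mutex::new(Arc::new(config)),
        }
    }

    /// Swap in a new config. Existing connections are unaffected because
    /// they hold their own clone of the previous `Arc`.
    pub fn set_server_config(&self, config: ServerConfig) {
        *self.server_config.lock().unwrap() = Arc::new(config);
    }

    /// New connections snapshot whatever config is current at accept time.
    pub fn accept(&self) -> Connection {
        let config = Arc::clone(&self.server_config.lock().unwrap());
        Connection { config }
    }
}
```

Because each connection holds its own `Arc`, the old certificate stays alive as long as some connection still uses it. It is freed when the last such connection is dropped.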

Do you think that updating an Endpoint's rustls::ServerSession over time would achieve this effect?

jean-airoldie, Oct 14 '19 13:10

It would be the ServerConfig rather than the ServerSession, but basically, yes (this thing here: https://github.com/djc/quinn/blob/master/quinn-proto/src/shared.rs#L260). This would be a cool feature to have!

djc, Oct 14 '19 13:10

Cool. Do you think exposing the proto::Endpoint::server_config via quinn::Endpoint::set_server_config would make sense?

https://github.com/djc/quinn/blob/89cd5d06e5b59b2b8774dfa5a1508e572b149f78/quinn-proto/src/endpoint.rs#L55

edit: Never mind, the Endpoint is already behind a mutex, so the RwLock would not be needed.

jean-airoldie, Oct 14 '19 13:10

I would investigate a bit what can be done with Arc itself (around the wider ServerConfig, not sure if that's a problem?); I think it has some facilities in this direction. But if that doesn't work out, a RwLock could definitely make sense. Exposing Endpoint::set_server_config() seems like the right approach, though beware the naming around ServerConfig vs CryptoSession::ServerConfig.

(In general it would probably be cool to update the wider ServerConfig and not just the crypto aspects of it, though.)

djc, Oct 14 '19 13:10

Alright, I'll take a more in-depth look once I get the time.

jean-airoldie, Oct 14 '19 13:10

Arc::make_mut is relevant.
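Here is a hedged illustration of why `Arc::make_mut` is relevant. Suppose the endpoint owns an `Arc<ServerConfig>` and live connections hold clones of it. In that case `make_mut` clones the config before mutating it (copy-on-write), so existing connections keep the old value. `ServerConfig` and `rotate_cert` are hypothetical names used only for this sketch:

```rust
use std::sync::Arc;

/// Hypothetical config; `Arc::make_mut` requires the inner type to be `Clone`.
#[derive(Clone, Debug, PartialEq)]
pub struct ServerConfig {
    pub cert_serial: u64,
    pub max_idle_timeout_ms: u64,
}

/// Update the endpoint's copy of the config. If other `Arc`s (e.g. held by
/// connections) point at the same value, `make_mut` first clones it into a
/// new allocation, so those holders keep seeing the old config. If this is
/// the only reference, the value is mutated in place.
pub fn rotate_cert(endpoint_config: &mut Arc<ServerConfig>, new_serial: u64) {
    Arc::make_mut(endpoint_config).cert_serial = new_serial;
}
```

`make_mut` takes `&mut Arc<_>`, so the endpoint still needs exclusive access to its own copy. The mutex it already sits behind would provide that.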

A more general solution is to place a routing service in front of your application that can be used to direct new connections to a new instance of your application. Highly-available services will need something like this regardless to support graceful upgrades. QUIC is designed to support this case gracefully by allowing data (e.g. a phase bit) to be encrypted into the local connection ID to coordinate with external routing systems. Quinn does not currently provide any way to do this, but it's something we'll want to explore eventually. I don't believe anybody's working on standardizing how information is encoded into connection IDs, so deploying this in practice may take substantial effort (e.g. a custom-written load balancer).

Ralith, Oct 14 '19 20:10

> Arc::make_mut is relevant.

Cool, wasn't aware of this.

> A more general solution is to place a routing service in front of your application that can be used to direct new connections to a new instance of your application. Highly-available services will need something like this regardless to support graceful upgrades.

I think this makes sense for updates that require an application restart (OS or application updates), but for certificates I'm not sold. If you run your own internal certificate authority and issue short-lived certificates that you rotate often, this would be a headache. For web-facing certificates, which are usually long-lived, it wouldn't be an issue.

Moreover, for graceful updates of long-running applications you still need to migrate your connected peers to the new server. For instance, you would have to send a notification telling each peer to connect to the new server, perhaps finish processing pending requests, and then shut down gracefully.
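The hand-off steps above (notify peers, finish pending requests, then shut down) can be sketched as a small state machine. All of it is hypothetical: `Instance`, `Admit::Redirect`, and the assumption that a redirect notice exists at all. How a peer is actually told where to reconnect is application-specific:

```rust
/// Hypothetical per-instance state for a graceful hand-off.
pub struct Instance {
    draining: bool,
    in_flight: usize,
}

#[derive(Debug, PartialEq)]
pub enum Admit {
    /// Request accepted and counted as in flight.
    Accepted,
    /// Instance is draining: tell the peer to reconnect to the new server.
    Redirect,
}

impl Instance {
    pub fn new() -> Self {
        Instance { draining: false, in_flight: 0 }
    }

    /// Stop accepting new work; existing requests may still complete.
    pub fn begin_drain(&mut self) {
        self.draining = true;
    }

    pub fn start_request(&mut self) -> Admit {
        if self.draining {
            Admit::Redirect
        } else {
            self.in_flight += 1;
            Admit::Accepted
        }
    }

    pub fn finish_request(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    /// Safe to shut down once draining and no requests remain.
    pub fn can_shut_down(&self) -> bool {
        self.draining && self.in_flight == 0
    }
}
```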

> QUIC is designed to support this case gracefully by allowing data (e.g. a phase bit) to be encrypted into the local connection ID to coordinate with external routing systems.

I'm afraid I'm not following. How would this phase bit be of use?

jean-airoldie, Oct 15 '19 00:10

> for certificates I'm not sold

Yeah, it's a big hammer. I'm not opposed to being able to update things live. Be aware that there are subtle 0-RTT correctness implications to changing some configuration parameters live; replacing the crypto configuration is probably fine (though we should make sure), but allowing modification of the TransportConfig will require taking care to reject 0-RTT data associated with an incompatible configuration.
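Here is a sketch of the kind of check meant here. It assumes the server can recover the transport limits it advertised when the resumption ticket was issued. RFC 9000 (section 7.4.1) requires that a server accepting 0-RTT not set these parameters below the values the client remembered. The `RememberedLimits` struct and the `may_accept_0rtt` function are hypothetical and are not quinn APIs:

```rust
/// Hypothetical subset of transport parameters that a client remembers from
/// an earlier connection and applies to its 0-RTT data. Field names follow
/// RFC 9000 section 18.2; the struct itself is not a quinn type.
#[derive(Clone, Copy, Debug)]
pub struct RememberedLimits {
    pub initial_max_data: u64,
    pub initial_max_stream_data_bidi_local: u64,
    pub initial_max_stream_data_bidi_remote: u64,
    pub initial_max_stream_data_uni: u64,
    pub initial_max_streams_bidi: u64,
    pub initial_max_streams_uni: u64,
    pub active_connection_id_limit: u64,
}

/// True if 0-RTT may be accepted under the live configuration, i.e. no limit
/// the client may already rely on has been lowered. Otherwise the server
/// should reject 0-RTT and complete a normal 1-RTT handshake.
pub fn may_accept_0rtt(remembered: &RememberedLimits, current: &RememberedLimits) -> bool {
    current.initial_max_data >= remembered.initial_max_data
        && current.initial_max_stream_data_bidi_local
            >= remembered.initial_max_stream_data_bidi_local
        && current.initial_max_stream_data_bidi_remote
            >= remembered.initial_max_stream_data_bidi_remote
        && current.initial_max_stream_data_uni >= remembered.initial_max_stream_data_uni
        && current.initial_max_streams_bidi >= remembered.initial_max_streams_bidi
        && current.initial_max_streams_uni >= remembered.initial_max_streams_uni
        && current.active_connection_id_limit >= remembered.active_connection_id_limit
}
```

Rejecting 0-RTT is the conservative fallback. The connection still completes at 1-RTT, and the client has to resend that early data.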

> I'm afraid I'm not following. How would this phase bit be of use?

To allow a stateless packet router to direct packets to the instance associated with their connection. Simple stateful routing based on storing connection IDs does not work because connections may change ID unpredictably.
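Here is a minimal sketch of stateless routing on a connection-ID bit. It makes two simplifying assumptions. First, connection IDs have a fixed length of 8 bytes; short-header packets do not carry the connection ID length, so the router has to know it. Second, the phase bit is kept in the clear for readability, whereas the comment above describes encrypting it. `make_cid` and `route` are hypothetical helpers:

```rust
/// Hypothetical fixed connection ID length shared by instances and router.
pub const CID_LEN: usize = 8;

/// Top bit of the first byte carries the "phase" (which instance owns the
/// connection). Kept in the clear here; a deployment would encrypt it.
const PHASE_MASK: u8 = 0x80;

/// Build a connection ID from random bytes, forcing the phase bit.
pub fn make_cid(phase: bool, random: [u8; CID_LEN]) -> [u8; CID_LEN] {
    let mut cid = random;
    if phase {
        cid[0] |= PHASE_MASK;
    } else {
        cid[0] &= !PHASE_MASK;
    }
    cid
}

/// Stateless routing: choose instance 0 or 1 from the destination connection
/// ID alone, with no per-connection table. Returns None for an empty ID.
pub fn route(dcid: &[u8]) -> Option<usize> {
    dcid.first()
        .map(|&b| if (b & PHASE_MASK) != 0 { 1 } else { 0 })
}
```

Every connection ID an instance issues must carry that instance's phase bit. This includes the fresh IDs it hands out when a connection changes ID. That requirement is what lets the router stay stateless.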

Ralith, Oct 15 '19 01:10