Proposal: Implement TS2021 (Tailscale control protocol v2)

Open juanfont opened this issue 2 years ago • 4 comments

Tailscale clients communicate with the control server using Tailscale's control protocol. This is basically what Headscale implements.

It is based on an HTTP API with a bit of long polling, and a grain of NaCl (https://nacl.cr.yp.to/) to encrypt the JSON payloads. Since 2019 the protocol has remained mostly stable, with just some extra fields added to support new functionality like MagicDNS or Taildrop.

On our side, the core of the implementation is located in api.go (where the registration methods live) and poll.go (which holds the method that clients use to receive updates).


A couple of weeks ago the Tailscale team let us know (!) that they are working on version 2 of the control protocol, codenamed TS2021.

They have also been kind enough to a) share some internal documentation on the implementation (!!), and b) release code in https://github.com/tailscale/tailscale that helps us A LOT with the implementation (!!!!!).

They did this in order not to break Headscale. Very very very big kudos to them!

About TS2021

TS2021 is a Noise-based protocol (https://noiseprotocol.org/noise.html), using the IK pattern (https://noiseprotocol.org/noise.html#interactive-handshake-patterns-fundamental). It is the same cryptographic framework as the one used for Signal or WhatsApp.

We will not have to deal with Noise too much. For us, the very first step is a POST call to /ts2021 and an upgrade + hijack of the TCP connection. Then the code I mentioned above kicks in to create the Noise session. Once this is established, the API is reachable to the clients through what looks like an H2C server (essentially just the good old v1 API, but without NaCl encryption for the payloads).

From what we can see, as of late March 2022 they have not yet fully migrated all the API methods to use TS2021. So we will have to follow them gradually.

Our steps

  1. Prepare our API machinery (always wanted to use this word) to be able to deal with clients using TS2019 (no idea if that is what they call it) and TS2021. This includes a minor change in the /key method, and removing NaCl for TS2021.

  2. Implement the /ts2021 handler (it's quite similar to what we do for the embedded DERP server)

  3. Plug an H2C server to the Noise connection under /ts2021 to expose our current API.

  4. Keep track of their CurrentCapabilityVersion, gradually enabling new API calls under TS2021
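As a rough illustration of step 3, the sketch below serves an ordinary `http.Handler` over one already-established connection by wrapping it in a listener that yields that connection exactly once. The `oneConnListener` type and `serveOnConn` helper are hypothetical and written against the standard library only; it speaks plain HTTP/1.1 over an in-memory pipe, whereas the real thing would sit on the Noise connection and use an H2C server, both of which are left out here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
	"sync"
)

// oneConnListener (hypothetical) is a net.Listener that hands out a single
// pre-established connection and then reports io.EOF.
type oneConnListener struct {
	mu   sync.Mutex
	conn net.Conn
	addr net.Addr
}

func newOneConnListener(c net.Conn) *oneConnListener {
	return &oneConnListener{conn: c, addr: c.LocalAddr()}
}

func (l *oneConnListener) Accept() (net.Conn, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.conn == nil {
		return nil, io.EOF
	}
	c := l.conn
	l.conn = nil
	return c, nil
}

func (l *oneConnListener) Close() error   { return nil }
func (l *oneConnListener) Addr() net.Addr { return l.addr }

// serveOnConn exposes handler h on the single connection c. Serve returns as
// soon as the listener is exhausted; the connection keeps being served in the
// background until the peer closes it.
func serveOnConn(c net.Conn, h http.Handler) error {
	srv := &http.Server{Handler: h}
	if err := srv.Serve(newOneConnListener(c)); err != io.EOF {
		return err
	}
	return nil
}

// demo sends one request over an in-memory pipe and returns the response body.
func demo() (string, error) {
	serverSide, clientSide := net.Pipe()
	defer clientSide.Close()

	mux := http.NewServeMux()
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "pong")
	})
	go serveOnConn(serverSide, mux)

	req, err := http.NewRequest(http.MethodGet, "http://control.example/ping", nil)
	if err != nil {
		return "", err
	}
	if err := req.Write(clientSide); err != nil {
		return "", err
	}
	resp, err := http.ReadResponse(bufio.NewReader(clientSide), req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The appeal of this shape is that the existing API handlers do not need to know which transport they are running on; only the connection underneath changes.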

Current status

I have a prototype mostly working. I will clean it up a little bit and prepare a draft PR for scrutiny.

juanfont, Mar 26 '22 16:03

One important thing to note on the cryptography side, which may not be in the docs you got (it was a later implementation question and I'm not sure I backported it into the specs): headscale must generate a new control key for use with Noise; it must not reuse the existing NaCl keypair for Noise, even though the keys are technically cross-compatible (both are Curve25519 keypairs).

This is to avoid cryptographic problems with key reuse across multiple protocols (NaCl and Noise). Our expert tells us that clients can reuse the same machine key for both protocols (important for compatibility), as long as the control plane uses different keys for NaCl and Noise.
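To make the key-separation rule concrete, here is a small sketch that generates two independent Curve25519 keypairs, one per protocol. It uses `crypto/ecdh` from the standard library (Go 1.20 or later) purely for illustration; the `controlKeys` type and its field names are assumptions, and a real server would presumably use the key types from the tailscale repo and persist both keys.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// controlKeys (hypothetical) holds one keypair per protocol so that the same
// private key is never used for both NaCl and Noise.
type controlKeys struct {
	legacy *ecdh.PrivateKey // only for NaCl-encrypted payloads
	noise  *ecdh.PrivateKey // only for the Noise handshake
}

// newControlKeys generates two independent Curve25519 keypairs.
func newControlKeys() (*controlKeys, error) {
	legacy, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, fmt.Errorf("generating legacy key: %w", err)
	}
	noise, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, fmt.Errorf("generating noise key: %w", err)
	}
	return &controlKeys{legacy: legacy, noise: noise}, nil
}

// distinct reports whether the two public keys differ, a cheap sanity check
// against accidentally wiring the same key into both code paths.
func (k *controlKeys) distinct() bool {
	return !bytes.Equal(k.legacy.PublicKey().Bytes(), k.noise.PublicKey().Bytes())
}

func main() {
	keys, err := newControlKeys()
	if err != nil {
		panic(err)
	}
	fmt.Println("keys are distinct:", keys.distinct())
}
```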

See https://controlplane.tailscale.com/key?v=27 for how new clients retrieve both keypairs.
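One way the /key change could look on the server side is sketched below: clients that send no version (or an old one) get the legacy key as plain text, while newer clients get a JSON document with both public keys. The `minNoiseVersion` threshold is an assumption taken only from the `v=27` in the URL above, and the JSON field names and the `renderKey` helper are likewise illustrative, not confirmed by this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// minNoiseVersion is an ASSUMED threshold, picked only because the example
// URL in the thread uses v=27.
const minNoiseVersion = 27

// keyResponse uses ASSUMED field names for the two public keys.
type keyResponse struct {
	LegacyPublicKey string `json:"legacyPublicKey"`
	PublicKey       string `json:"publicKey"`
}

// renderKey builds the body of a /key response. v is the raw "v" query
// parameter; a missing or unparsable value is treated as an old client.
func renderKey(v, legacyPub, noisePub string) (string, error) {
	ver, err := strconv.Atoi(v)
	if err != nil || ver < minNoiseVersion {
		return legacyPub, nil
	}
	b, err := json.Marshal(keyResponse{LegacyPublicKey: legacyPub, PublicKey: noisePub})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	for _, v := range []string{"", "27"} {
		body, err := renderKey(v, "legacy-key", "noise-key")
		if err != nil {
			panic(err)
		}
		fmt.Printf("v=%q -> %s\n", v, body)
	}
}
```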

Also, if you haven't already, I recommend using the control/controlbase and control/controlhttp packages in the tailscale repo to implement the transport; they take care of a bunch of the subtleties of upgrading to Noise and handshaking safely. The server-side APIs are also included in those packages.

danderson, Mar 26 '22 23:03

Hey @danderson,

First, thank you so much for your message! Really appreciated!

And indeed, I was reusing the control key. I even left a comment wondering why you people were using two different keys.

We mostly use controlbase and controlhttp, although I had to slightly modify AcceptHTTP to make Gin (the web framework we use) happy. I also found netutil.NewOneConnListener, which is quite convenient...

Again, thanks for your comment :)

juanfont, Mar 27 '22 00:03

Could you @ me when the PR is up, so I can see the AcceptHTTP changes you needed? I'm wondering if we can fix it upstream without importing all of Gin.

danderson, Mar 27 '22 00:03

FYI, late change to the Noise protocol: https://github.com/tailscale/tailscale/pull/4370

We now use the client capability version as the Noise handshake version, instead of having a separate version for Noise. That means conn.Version() is the client capability version, and the server-side API changed a little bit to include the max supported protocol version, so the server can validate that it knows how to communicate correctly with a client. Aside from the API change, the controlbase/controlhttp packages handle the Noise internals for handshaking on the correct version, so hopefully not much difference as far as you're concerned.
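For a server implementer, the practical effect might look roughly like the sketch below: the version reported by the Noise connection is compared against the highest version the server understands, and individual API calls are switched on per capability version. This is one plausible reading of the change, not the controlbase API; `checkClientVersion`, `endpointEnabled` and every version number in `main` are made up for illustration.

```go
package main

import "fmt"

// checkClientVersion rejects clients whose capability version is newer than
// anything this server knows how to speak. (Assumed policy, for illustration.)
func checkClientVersion(client, maxSupported uint16) error {
	if client > maxSupported {
		return fmt.Errorf("client capability version %d is newer than the maximum supported %d", client, maxSupported)
	}
	return nil
}

// endpointEnabled reports whether an API call should be offered to a client
// with the given capability version. required maps an endpoint to the lowest
// version that may use it; unknown endpoints are disabled.
func endpointEnabled(required map[string]uint16, endpoint string, client uint16) bool {
	need, ok := required[endpoint]
	return ok && client >= need
}

func main() {
	// Hypothetical values: neither the endpoints' thresholds nor the maximum
	// version come from the thread.
	required := map[string]uint16{"/machine/register": 28, "/machine/map": 30}
	const maxSupported = 30

	for _, client := range []uint16{27, 28, 31} {
		if err := checkClientVersion(client, maxSupported); err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Printf("v%d register=%v map=%v\n", client,
			endpointEnabled(required, "/machine/register", client),
			endpointEnabled(required, "/machine/map", client))
	}
}
```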

danderson, Apr 07 '22 20:04

This is done :)

juanfont, Aug 23 '22 19:08