proposal: crypto/tls: support kernel-provided TLS
Lots of background and an implementation, albeit from 3+ years ago: https://blog.filippo.io/playing-with-kernel-tls-in-linux-4-13-and-go/
Basically, Linux now supports handling TLS encryption in the kernel. The primary benefit here is the possibility of getting sendfile/splice to work with TLS. Currently, we need to choose between TLS and splice (or a custom TLS implementation, I suppose).
It would be great to have first-class support in Go for this.
cc @FiloSottile
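For readers unfamiliar with how this trade-off shows up in Go: io.Copy takes its fast path when the destination implements io.ReaderFrom. On Linux, *net.TCPConn uses that hook to reach sendfile or splice for suitable sources. The sketch below only inspects method sets, and hasReadFrom and fastPaths are hypothetical helper names. It reflects crypto/tls as currently released, where *tls.Conn has no ReadFrom method; that could change if this proposal lands.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net"
)

// hasReadFrom reports whether w implements io.ReaderFrom, the hook that
// io.Copy uses to let a destination pull data straight from a source.
func hasReadFrom(w io.Writer) bool {
	_, ok := w.(io.ReaderFrom)
	return ok
}

// fastPaths inspects zero-value connections. Nothing is read or written;
// only the method sets of the two types matter.
func fastPaths() (tcp bool, tlsConn bool) {
	return hasReadFrom(&net.TCPConn{}), hasReadFrom(&tls.Conn{})
}

func main() {
	tcp, tlsConn := fastPaths()
	fmt.Println("*net.TCPConn implements io.ReaderFrom:", tcp)
	fmt.Println("*tls.Conn implements io.ReaderFrom:   ", tlsConn)
}
```

So a plain TCP connection can take the zero-copy route, while a connection wrapped in crypto/tls goes through userspace encryption and ordinary writes.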
I would love to have this happen as well! It's a major use case for L7 load balancers written in golang, and could transparently provide significant performance boosts for a lot of systems (including Kubernetes)
Can we get some benchmarks and numbers for the performance improvement? My patch linked above might be a good starting point. It's a lot of complexity and it would have to be justified by very good numbers.
Hi, all
I have updated the kernel TLS support based on @FiloSottile's original code. It now supports more ciphers, such as AES_GCM_256, AES_CCM_128 and CHACHA20_POLY1305.
Code: https://github.com/jim3ma/go/tree/dev.ktls.1.16.3.
I also fixed some kernel issues along the way: https://github.com/torvalds/linux/commit/974271e5ed45cfe4daddbeb16224a2156918530e, https://github.com/torvalds/linux/commit/d8654f4f9300e5e7cf8d5e7885978541cf61326b
In my simple tests, enabling kernel TLS decreased the time cost by 30%.
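As an illustration of what supporting more ciphers involves on the Go side, here is a minimal sketch of mapping negotiated crypto/tls cipher suite IDs to the kernel's cipher identifiers. The numeric values are the TLS_CIPHER_* constants from the Linux header linux/tls.h. The helper name kernelCipher is hypothetical and is not taken from the linked branch.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// Cipher identifiers from the Linux uapi header linux/tls.h.
const (
	kernelAESGCM128        = 51 // TLS_CIPHER_AES_GCM_128
	kernelAESGCM256        = 52 // TLS_CIPHER_AES_GCM_256
	kernelAESCCM128        = 53 // TLS_CIPHER_AES_CCM_128 (no suite below maps to it)
	kernelChaCha20Poly1305 = 54 // TLS_CIPHER_CHACHA20_POLY1305
)

// kernelCipher is a hypothetical helper: it maps a negotiated crypto/tls
// cipher suite ID to a kernel cipher identifier and the key length in bytes.
// ok is false when the suite has no kernel counterpart in this sketch.
func kernelCipher(suite uint16) (cipher uint16, keyLen int, ok bool) {
	switch suite {
	case tls.TLS_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256:
		return kernelAESGCM128, 16, true
	case tls.TLS_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384:
		return kernelAESGCM256, 32, true
	case tls.TLS_CHACHA20_POLY1305_SHA256,
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
		tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256:
		return kernelChaCha20Poly1305, 32, true
	}
	return 0, 0, false
}

func main() {
	for _, s := range []uint16{
		tls.TLS_AES_128_GCM_SHA256,
		tls.TLS_CHACHA20_POLY1305_SHA256,
		tls.TLS_RSA_WITH_3DES_EDE_CBC_SHA,
	} {
		c, k, ok := kernelCipher(s)
		fmt.Printf("%-32s kernel=%d keyLen=%d offloadable=%v\n",
			tls.CipherSuiteName(s), c, k, ok)
	}
}
```

Suites without a kernel counterpart, such as the CBC suite in the example, would have to stay on the userspace record layer.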
I made some real-world tests with one of our internal applications (CDN node specialised in delivering video segments for DASH and HLS streams).
- Kernel 5.13.12
- Curve: prime256v1
I compared https, http, http + sendfile and ktls + sendfile.
Most of the TLS stuff is working, except TLS 1.3 with Chrome and k6. k6 reports tls: oversized record received with length 62464.
With ktls, latency increased, but this could also be due to the different Go versions used.
The ktls implementation reduces overall CPU usage by around 10%. We'll deploy the Nvidia ConnectX-6 (200 Gbit/s) in our latest hardware setup, and we hope we can use the TLS NIC offloading in the future.
https://docs.google.com/spreadsheets/d/1XaiFczae9GLixu__8y2kuKPsw7RGqW9vMDkYxuTLx28/edit#gid=0
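For context on the k6 error quoted above: a TLS record starts with a 5-byte header whose last two bytes are a big-endian length, and TLS 1.3 (RFC 8446) caps the encrypted record at 2^14 + 256 bytes. The sketch below shows that kind of check with hypothetical helper names. It is not the code crypto/tls uses, and it does not say why the oversized length appeared in this test.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	recordHeaderLen    = 5                  // type (1) + legacy version (2) + length (2)
	maxPlaintext       = 16384              // 2^14, the largest record payload
	maxCiphertextTLS13 = maxPlaintext + 256 // RFC 8446 limit for encrypted records
)

// recordLength extracts the big-endian length field from a TLS record header.
func recordLength(hdr []byte) (int, error) {
	if len(hdr) < recordHeaderLen {
		return 0, errors.New("short record header")
	}
	return int(binary.BigEndian.Uint16(hdr[3:5])), nil
}

// exceedsTLS13Limit reports whether a record length is over the TLS 1.3 cap.
func exceedsTLS13Limit(n int) bool {
	return n > maxCiphertextTLS13
}

func main() {
	// Application-data record header whose length field is 0xF400.
	hdr := []byte{0x17, 0x03, 0x03, 0xF4, 0x00}
	n, err := recordLength(hdr)
	if err != nil {
		panic(err)
	}
	fmt.Printf("length=%d limit=%d oversized=%v\n",
		n, maxCiphertextTLS13, exceedsTLS13Limit(n))
}
```

A length of 62464 is 0xF400, far above the 16640-byte limit, so a receiver that reads such a header has to reject the record.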
@totallyunknown If the latency issue turns out to be in the kernel implementation (once the golang side is ruled out), we can take a look at kernel-side improvements. We've been using the openssl implementation lately, so I'll check there as well, but I don't recall extra latency the last time I did metrics. Having a golang implementation would be very useful on my side as well. FWIW, I'm one of the ktls maintainers on the kernel side, so we shouldn't have trouble getting improvements there as needed, and I'm happy to help where I can to get this moving forward.
Which version did you test? I have updated some Go code for HTTP with ktls.
@jim3ma Your branch: https://github.com/jim3ma/go/tree/dev.ktls.1.16.3
Okay, I will merge some optimized code into this branch tomorrow.
Excuse me, how is the implementation going?
hi, this is such a long awaited feature coz crypto tls is so much slower. pls enable this. thx.
@jim3ma are there any plans to introduce the changes into the Go code?
Sorry, I have been busy with work. I will rebase the kTLS code onto the latest branch and test it again.
any updates?
@jim3ma curious about the updates too
been checking here https://github.com/0-haha/gnet-tls-go1-20/ and ref: https://github.com/panjf2000/gnet/issues/534
@FiloSottile I've been watching ktls progress for golang since you started the blog in 2021. This is sort of the last huge performance penalty golang takes in benchmarks.
Once this is ktls-ed, I believe it will be one of golang's greatest milestones ever.
I did some rough benchmarks late last year where I had Golang call into rust's TLS library via CGO to do the handshake and then handed off the established TCP connection to Golang.
I found that the performance (throughput/latency on sustained traffic) ended up being about the same as golang's built-in TLS or slightly worse.
I'm not sure why to be honest - maybe I did something wrong? But I would like to see some numbers hopefully from someone else on the actual performance of the kTLS implementation in the linux kernel.
@ShivanshVij u hv the code for helping to debug? but ktls is better for sure.
Based on my understanding, kTLS does not magically work. It's used for zero copy, so you have to send an fd through a syscall.
Yep - so the implementation was really straightforward.
Start a TCP listener, wait for a TCP connection to get accepted, and read some N bytes from it and send them to rustls via CGO. If we needed more bytes the rustls library would signal that, otherwise it would give us some bytes to write back to the connection - which we would do in Go by blindly writing the byte slice into the net.Conn.
Once the handshake was complete, we'd pull out the required kTLS secrets from the handshake in rustls, and then do the required syscalls in Go to tell the kernel that the fd that backed the net.Conn is a kTLS fd.
After that, future reads/writes on the net.Conn would result in proper TLS encryption/decryption without any userspace overhead.
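The "required syscalls" step described above can be sketched roughly as follows. This is not the commenter's code. It assumes TLS 1.2 with AES-128-GCM, a little-endian host, and the constants and struct layout of tls12_crypto_info_aes_gcm_128 from linux/tls.h. The setsockopt call is passed in as a hypothetical callback so the sketch runs without a kTLS-capable kernel. On Linux it would wrap a real setsockopt(2) on the connection's fd, for example via golang.org/x/sys/unix.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Values from the Linux uapi headers (linux/tcp.h, linux/socket.h, linux/tls.h).
const (
	solTCP = 6   // SOL_TCP
	tcpULP = 31  // TCP_ULP
	solTLS = 282 // SOL_TLS
	tlsTX  = 1   // TLS_TX (TLS_RX is 2 and is set up the same way)

	tls12Version    = 0x0303 // TLS_1_2_VERSION
	cipherAESGCM128 = 51     // TLS_CIPHER_AES_GCM_128
)

// cryptoInfoAESGCM128 lays out struct tls12_crypto_info_aes_gcm_128:
// version, cipher type, iv[8], key[16], salt[4], rec_seq[8] (40 bytes).
// Assumption: little-endian host for the two 16-bit header fields.
func cryptoInfoAESGCM128(key, salt, iv, recSeq []byte) ([]byte, error) {
	if len(key) != 16 || len(salt) != 4 || len(iv) != 8 || len(recSeq) != 8 {
		return nil, errors.New("unexpected key material length")
	}
	buf := make([]byte, 40)
	binary.LittleEndian.PutUint16(buf[0:2], tls12Version)
	binary.LittleEndian.PutUint16(buf[2:4], cipherAESGCM128)
	copy(buf[4:12], iv)
	copy(buf[12:28], key)
	copy(buf[28:32], salt)
	copy(buf[32:40], recSeq)
	return buf, nil
}

// setsockoptFunc is a hypothetical stand-in for setsockopt(2) on the socket fd.
type setsockoptFunc func(level, opt int, val []byte) error

// enableTX attaches the "tls" upper-layer protocol to the socket and then
// hands the kernel the transmit-side key material.
func enableTX(set setsockoptFunc, info []byte) error {
	if err := set(solTCP, tcpULP, []byte("tls")); err != nil {
		return err
	}
	return set(solTLS, tlsTX, info)
}

// dryRun exercises enableTX against a recorder instead of a real socket and
// returns (level, option, value length) for each call. Keys are all zeros.
func dryRun() ([][3]int, error) {
	info, err := cryptoInfoAESGCM128(make([]byte, 16), make([]byte, 4),
		make([]byte, 8), make([]byte, 8))
	if err != nil {
		return nil, err
	}
	var calls [][3]int
	record := func(level, opt int, val []byte) error {
		calls = append(calls, [3]int{level, opt, len(val)})
		return nil
	}
	return calls, enableTX(record, info)
}

func main() {
	calls, err := dryRun()
	fmt.Println(calls, err)
}
```

The receive direction would repeat the second call with TLS_RX and the peer's key material. Once both are set, plain reads and writes on the fd carry TLS records.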
One more thing: many higher-end network cards have crypto acceleration that can be accessed through the ktls API, so by supporting ktls in golang we would be able to offload encryption to the network card. Please don't compare only software encryption in golang vs software encryption in the kernel; that comparison is not so relevant for many production environments.
I've used gnet's ktls and other ktls versions, but I found that going through cgo with many goroutines seems to crash. For example, 1000000 goroutines calling cgo did not seem possible; you can probably do 80k at most. So I'm not sure whether ktls will be usable at that scale, or whether cgo syscalls and the like will cause crashes.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
It looks like rustls (for Rust) makes kTLS possible by allowing access to the key material after the handshake completes. Could that be the right stance for Go as well, to allow use of kTLS without the crypto/tls maintainers needing to take on ownership of all of the moving parts?
The QUIC support in crypto/tls is in a similar position, where crypto/tls does the initial handshake and then hands the key material over to its caller.
From what I can tell, the discussion at https://github.com/rustls/rustls/issues/198 led to https://docs.rs/rustls/latest/rustls/struct.ExtractedSecrets.html, which in turn enables users to provide their own kTLS wiring. (There's support in crypto/tls already for a Config.KeyLogWriter — but as the rustls maintainers also discovered, that format doesn't include all of the information that the kernel needs to continue the symmetric encryption.)
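To make the KeyLogWriter limitation concrete: each line of the NSS key log format that Config.KeyLogWriter emits has three space-separated fields, a label, the hex client random, and a hex secret. The parser below is a minimal sketch with hypothetical names. The hex values in main are shortened placeholders, not real key material.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// keyLogEntry holds the three fields of one NSS key log line.
type keyLogEntry struct {
	Label        string // e.g. CLIENT_RANDOM or CLIENT_TRAFFIC_SECRET_0
	ClientRandom []byte // identifies the connection
	Secret       []byte // master secret (TLS 1.2) or a traffic secret (TLS 1.3)
}

// parseKeyLogLine splits one key log line into its fields and decodes the hex.
func parseKeyLogLine(line string) (keyLogEntry, error) {
	f := strings.Fields(line)
	if len(f) != 3 {
		return keyLogEntry{}, errors.New("key log line must have three fields")
	}
	random, err := hex.DecodeString(f[1])
	if err != nil {
		return keyLogEntry{}, err
	}
	secret, err := hex.DecodeString(f[2])
	if err != nil {
		return keyLogEntry{}, err
	}
	return keyLogEntry{Label: f[0], ClientRandom: random, Secret: secret}, nil
}

func main() {
	// Placeholder values; real lines carry a 32-byte client random and a
	// full-length secret.
	e, err := parseKeyLogLine("CLIENT_TRAFFIC_SECRET_0 00ff abcd")
	fmt.Printf("%+v %v\n", e, err)
}
```

Nothing in such an entry records the current record sequence number, for instance, which the kernel's crypto info needs. That is consistent with the point above that the key log alone is not enough to hand a connection over.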
We aim to achieve 400Gbit/s network throughput serving HTTP with Go but are currently constrained by memory bandwidth. With AMD Rome generation, we can reach 165Gbit/s of network traffic, with memory bandwidth fully utilized, as shown by AMDuProf. To overcome this, we need zero-copy techniques like sendfile, which requires kTLS support in Go, eliminating the memory bandwidth constraint.
@rolandshoemaker and @FiloSottile to work out an API. It sounds like we should work on an API where Go keeps the handshake and then hands off the key so the kernel can do the record layer.
@FiloSottile and I discussed this, and we wonder if this can be done without any new secret-sharing API at all: if kTLS is good enough, then Go should arrange to use it by default, right? We'd probably also need to add ReadFrom and WriteTo methods to the tls.Conn implementations so that io.Copy goes straight to sendfile, but no new TLS-related API would be needed.
Is there a flaw in this thinking?
Are there Go or Rust kTLS implementations already that are worth looking at to understand the kernel interaction details? We spent a while reading linux/tls.h but it's not terribly well documented.
And are there other operating systems with kTLS that we should look at?
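A rough sketch of the ReadFrom idea from the comment above: a connection type that implements io.ReaderFrom, delegates to the underlying socket when the kernel handles the record layer, and otherwise falls back to ordinary writes through the userspace path. All names here (offloadConn, fakeSocket, demo) are hypothetical, and none of this is crypto/tls API. The fake socket only counts calls; a real *net.TCPConn is what would reach sendfile.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// offloadConn is a hypothetical stand-in for a TLS connection that may have
// handed its record layer to the kernel.
type offloadConn struct {
	raw     io.Writer // the socket; with kTLS active the kernel encrypts writes to it
	userTLS io.Writer // userspace record layer, used when kTLS is unavailable
	ktls    bool
}

func (c *offloadConn) Write(p []byte) (int, error) {
	if c.ktls {
		return c.raw.Write(p)
	}
	return c.userTLS.Write(p)
}

// ReadFrom lets io.Copy hand the source to the socket's own fast path when
// kTLS is active. Otherwise it copies through Write.
func (c *offloadConn) ReadFrom(r io.Reader) (int64, error) {
	if c.ktls {
		if rf, ok := c.raw.(io.ReaderFrom); ok {
			return rf.ReadFrom(r)
		}
	}
	// Wrap c so only Write is visible; otherwise io.Copy would call
	// ReadFrom again and recurse forever.
	return io.Copy(struct{ io.Writer }{c}, r)
}

// fakeSocket stands in for *net.TCPConn and counts fast-path calls.
type fakeSocket struct {
	strings.Builder
	readFromCalls int
}

func (s *fakeSocket) ReadFrom(r io.Reader) (int64, error) {
	s.readFromCalls++
	return io.Copy(&s.Builder, r)
}

// demo copies a small payload through an offloadConn and reports how often
// the socket's fast path was used and what reached the chosen writer.
func demo(ktls bool) (fastPathCalls int, sent string) {
	sock := &fakeSocket{}
	user := &strings.Builder{}
	conn := &offloadConn{raw: sock, userTLS: user, ktls: ktls}
	// Hide strings.Reader's WriteTo so io.Copy consults the destination.
	src := struct{ io.Reader }{strings.NewReader("segment bytes")}
	if _, err := io.Copy(conn, src); err != nil {
		panic(err)
	}
	if ktls {
		return sock.readFromCalls, sock.String()
	}
	return sock.readFromCalls, user.String()
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```

With this shape, callers keep using io.Copy and need no new TLS-related API, which matches the reasoning in the comment.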
I believe it would be the right thing to get kTLS going as a default on supported systems. Having a secret sharing API might be useful for some developers though, maybe something one can meddle with explicitly.
Maybe this helps with the kernel interaction.
Looking at FreeBSD would probably be a good idea. The implementation seems quite mature.
Nginx has had support since around 2021, although I think it just delegates the hard work to OpenSSL. Still, it might be worth a look: https://hg.nginx.org/nginx/rev/65946a191197
Meddlers can always use reflect and unsafe. No need to add API for them.
Here are some of the unverified and broken ktls implementations on my radar: https://github.com/0-haha/gnet-tls-go1-20/blob/dev/ktls_linux.go https://github.com/soluble-ai/go-ktls/blob/master/ktls.go
when's the eta for this? been looking at this thread since 2021. :D
@totallyunknown doing 165 Gbit/s on a 400 Gbit/s line is really weak. I'm hoping for the performance too.
@rsc possible for meddlers to live with one without the alloc/op too? that'll be heaven.
talk about zero alloc/op... i really wish arena feature is fully supported as non-experimental.