Support for TLS sockets
Inko's sockets are just regular sockets, without any built-in support for TLS (as in SSL/TLS, not thread-local storage). Combine that with the lack of built-in support for various ciphers/cryptographic functions, and using encrypted connections is a real challenge.
To make this possible, Inko should provide a TLS stack of some sort which can be used with its non-blocking sockets. At first sight the best option may appear to be using rustls and exposing it through the VM. While this would likely save us time, I'm not a fan of it, for a few reasons:
- Our API would largely be dictated by the API exposed by rustls, and the resulting API may be annoying to use
- We'd increase the number of VM/non-Inko dependencies, and I want to keep those to a minimum
- A TLS stack requires various features/functions/etc we'd likely have to support anyway, such as cryptographic functions. Instead of exposing all that through the VM, I want to write as much in Inko as possible
- We'd likely have to wrap various Rust types in raw pointers and heap allocate them. I'd like to keep this to a minimum, only using it where absolutely necessary
A first step towards this is to identify what exactly we'd need. After that we need to figure out if there's some sort of shared test suite/reference we can use to determine whether we're implementing things correctly. The last thing I want to do is implement this wrong.
Related issues:
- https://github.com/inko-lang/inko/issues/574
- https://github.com/inko-lang/inko/issues/573
- https://github.com/inko-lang/inko/issues/572
marked this issue as related to #283
While the original desire was a pure Inko TLS stack, I'm currently leaning more towards just using OpenSSL 3.x instead. For one, writing such a stack is a massive effort. Second, it will probably be a long time before we can guarantee the constant-time operations needed for various crypto bits, potentially opening us up to timing attacks until then.
Using OpenSSL does mean that cross-compilation becomes more of a challenge, whereas a pure Inko stack wouldn't suffer from this.
Of course wrapping rustls is still a promising option, and would probably make cross-compilation a heck of a lot easier.
Based on https://github.com/rustls/rustls/blob/main/examples/src/bin/simpleclient.rs, I think wrapping rustls shouldn't be too hard. Essentially TLS sockets would need to store three values: a file descriptor to the raw socket, the rustls wrapper reader/writer, and a reference to the rustls connection object.
https://github.com/fres621/inko-tls/ might prove useful as a reference.
Per the source code of rustls' `Stream` type (https://docs.rs/rustls/latest/src/rustls/stream.rs.html#11-17), it seems we can just create a `Stream` ad-hoc for every TLS read/write, instead of having to keep it around. This means that for a TLS socket we only need to persist a `ClientConnection` and `ClientConfig` somewhere.
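Under that reading, a wrapper could build the `Stream` on the fly for each operation. Here's a rough, untested sketch of the idea in Rust: the `ClientConnection` and `Stream` types are rustls' actual types, but the `TlsSocket` struct and its `read` method are hypothetical, just to show the shape of the persistent state:

```rust
use std::io::Read;
use std::net::TcpStream;

use rustls::{ClientConnection, Stream};

// Hypothetical per-socket state: only the raw socket and the rustls
// connection need to live for the lifetime of the TLS socket.
struct TlsSocket {
    tcp: TcpStream,
    conn: ClientConnection,
}

impl TlsSocket {
    // A `Stream` merely borrows the connection and the socket, so we can
    // construct one per read/write instead of storing it.
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        let mut stream = Stream::new(&mut self.conn, &mut self.tcp);
        stream.read(buf)
    }
}
```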
Writing an entire TLS stack from scratch is going to be way too much work, and probably isn't a good idea to begin with as Inko has nothing to allow constant-time operations (as needed for correct encryption). As such, we'll start with basing TLS sockets on either OpenSSL or rustls, whichever works best.
The "tls" branch introduces TLS support using rustls. This works, though the rustls documentation is lacking/confusing in various places, and its examples are super dense/too noisy to be of much use.
Using rustls introduces a few problems:
- With the aws-lc backend, compile times are increased by about 30 seconds. With the ring backend the increase is only 5-10 seconds, but TLS 1.3 performance will suffer.
- The aws-lc backend doesn't provide pre-generated bindings for FreeBSD, requiring cmake and Perl to be installed in order to compile aws-lc. I don't want this, so we'd have to use the ring backend (which works fine on FreeBSD)
- Using rustls increases binary sizes by about 6 MiB before stripping and just under 2 MiB after stripping, due to all the extra TLS code. This size increase is present even if `std.net.tls` is never imported, because it's part of the runtime library
Because of this I played around with using OpenSSL through Inko's FFI. This sketch (`openssl.inko`) looks like this:

```inko
import extern "crypto"
import extern "ssl"
import std.fmt (fmt)
import std.net.ip (IpAddress)
import std.net.socket (Socket, TcpClient)
import std.process (sleep)
import std.stdio (STDOUT)
import std.time (Duration)

let TLS1_2_VERSION = 0x0303
let SSL_CTRL_SET_MIN_PROTO_VERSION = 123
let SSL_ERROR_SSL = 1
let SSL_ERROR_WANT_READ = 2
let SSL_ERROR_WANT_WRITE = 3

fn extern TLS_method -> Pointer[UInt8]
fn extern SSL_CTX_new(method: Pointer[UInt8]) -> Pointer[UInt8]

fn extern SSL_CTX_ctrl(
  context: Pointer[UInt8],
  cmd: Int32,
  larg: Int,
  parg: Pointer[UInt8],
) -> Int

fn extern SSL_new(context: Pointer[UInt8]) -> Pointer[UInt8]
fn extern SSL_free(ssl: Pointer[UInt8])
fn extern SSL_CTX_free(context: Pointer[UInt8])
fn extern SSL_set_fd(ssl: Pointer[UInt8], fd: Int32) -> Int32
fn extern SSL_connect(ssl: Pointer[UInt8]) -> Int32
fn extern SSL_get_error(ssl: Pointer[UInt8], code: Int32) -> Int32

fn extern SSL_write_ex(
  ssl: Pointer[UInt8],
  buf: Pointer[UInt8],
  size: UInt64,
  written: Pointer[UInt64],
) -> Int32

fn extern ERR_get_error -> UInt64
fn extern ERR_reason_error_string(code: UInt64) -> Pointer[UInt8]
fn extern ERR_clear_error

class async Main {
  fn async main {
    let stdout = STDOUT.new
    let client = TcpClient
      .new(ip: IpAddress.v4(127, 0, 0, 1), port: 9000)
      .or_panic('failed to connect')
    let ctx = SSL_CTX_new(TLS_method)

    if ctx as Int == 0 { panic('SSL_CTX_new() failed') }

    # Set the minimum version to TLS 1.2, as everything else is deprecated and
    # insecure.
    if
      SSL_CTX_ctrl(
        ctx,
        SSL_CTRL_SET_MIN_PROTO_VERSION as Int32,
        TLS1_2_VERSION,
        0x0 as Pointer[UInt8],
      )
        as Int
        == 0
    {
      panic('failed to set the minimum TLS version')
    }

    let ssl = SSL_new(ctx)

    if ssl as Int == 0 { panic('SSL_new() failed') }

    if SSL_set_fd(ssl, client.socket.socket.inner) as Int == 0 {
      panic('SSL_set_fd() failed')
    }

    loop {
      ERR_clear_error

      match SSL_connect(ssl) as Int {
        case 1 -> {
          stdout.print('connected')
          break
        }
        case 0 -> panic('handshake failed due to shutdown')
        case n -> {
          match SSL_get_error(ssl, n as Int32) as Int {
            case SSL_ERROR_WANT_READ -> {
              stdout.print('polling for read')
              client.socket.poll(write: false)
              stdout.print('socket now readable')
            }
            case SSL_ERROR_WANT_WRITE -> panic('want write, what?')
            case other -> panic('other error: ${other}')
          }
        }
      }
    }

    let buf = 'ping'.to_byte_array
    let written = 0 as UInt64

    loop {
      ERR_clear_error

      match
        SSL_write_ex(
          ssl,
          buf.to_pointer,
          size: buf.size as UInt64,
          written: mut written,
        )
          as Int
      {
        case 1 -> {
          stdout.print('write done')
          break
        }
        case n -> {
          match SSL_get_error(ssl, n as Int32) as Int {
            case SSL_ERROR_WANT_READ -> panic('want read')
            case SSL_ERROR_WANT_WRITE -> panic('want write')
            case SSL_ERROR_SSL -> {
              # TODO: this uses thread-local state. Yuck!
              #
              # Basically it seems that error handling with OpenSSL is a proper
              # mess, heavily relying on thread-local state. Not only does
              # managing that state add overhead, it's also fragile considering
              # Inko's scheduler may reschedule a program in between calls (at
              # least at some point in the future).
              #
              # Based on searching around the internet, it seems many users run
              # into issues with this. Crystal for example doesn't even do this
              # and there's an issue open for that since 2019.
              let err = ERR_get_error
              let str = String.from_pointer(ERR_reason_error_string(err))

              panic('fatal SSL error, reason: ${str}')
            }
            case other -> panic('other error: ${other}')
          }
        }
      }
    }

    sleep(Duration.from_secs(1))
    SSL_free(ssl)
    SSL_CTX_free(ctx)
  }
}
```
OpenSSL however brings its own problems:
- While dynamically linking it reduces the binary size, it also makes cross-compilation more difficult. I really don't want to start providing pre-compiled static libraries of OpenSSL for different platforms, as that is A) more work for us B) less secure compared to using a package from a trusted source
- OpenSSL's error handling mechanism relies on thread-local state, and from searching around the only sensible way of dealing with this is to clear the errors before every OpenSSL call. This, combined with Inko's scheduler, makes using OpenSSL very fragile. Even without the scheduler, the thread-local error queue mechanism is horrible to work with
- If I'm not mistaken, OpenSSL by default also enables insecure algorithms (e.g. TLS 1.1 and older), requiring you to explicitly set a minimum version. This is doable, but I worry there are other parts where it's making the wrong/outdated decisions. That is, using OpenSSL feels like walking through a minefield
My current thought is that we can do the following:
- Polling sockets is exposed through an external runtime function (the above sketch already relies on that)
- All TLS code is moved into a separate crate that basically acts as a C wrapper around rustls. We then handle `WouldBlock` errors in the standard library
- If `std.net.tls` is imported, we have to somehow inject the static library produced by this crate. What we could do is name this e.g. `libinkotls`, and when we detect `import extern "inkotls"` we ensure we always import the static library, instead of searching the system libraries
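To make the division of labour concrete, here's a hedged sketch in Rust of how such a wrapper could report "would block" back to the standard library, which would then poll the socket and retry. The status codes, names, and the stand-in reader are all made up for illustration; nothing here is from the actual branch:

```rust
use std::io::{ErrorKind, Read};

// Hypothetical status codes the C wrapper could hand back to the runtime.
const TLS_OK: i64 = 0;
const TLS_WOULD_BLOCK: i64 = 1;
const TLS_ERROR: i64 = 2;

// rustls' `Stream` implements `Read`, and with a non-blocking socket
// underneath it, reads fail with `WouldBlock`. The wrapper translates that
// into a status code so the Inko side can poll the socket and retry.
fn tls_read(stream: &mut impl Read, buf: &mut [u8]) -> (i64, usize) {
    match stream.read(buf) {
        Ok(n) => (TLS_OK, n),
        Err(e) if e.kind() == ErrorKind::WouldBlock => (TLS_WOULD_BLOCK, 0),
        Err(_) => (TLS_ERROR, 0),
    }
}

// A reader that always signals "would block", standing in for a TLS stream
// on top of a non-blocking socket that has no data available yet.
struct WouldBlockReader;

impl Read for WouldBlockReader {
    fn read(&mut self, _buf: &mut [u8]) -> std::io::Result<usize> {
        Err(ErrorKind::WouldBlock.into())
    }
}

fn main() {
    let mut buf = [0u8; 8];
    let mut ready: &[u8] = b"ping";

    assert_eq!(tls_read(&mut ready, &mut buf), (TLS_OK, 4));
    assert_eq!(tls_read(&mut WouldBlockReader, &mut buf), (TLS_WOULD_BLOCK, 0));
    println!("ok");
}
```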
Using this approach we should be able to offer a safe TLS setup, without messing around with OpenSSL, and without always increasing the executable sizes.
With that said, I'm not sure a 6 MiB increase is really worth the trouble, so I'll need to think about this for a bit.
Dumping some OpenSSL related resources:
- https://github.com/openssl/openssl/issues/3033
- https://codereview.stackexchange.com/questions/108600/complete-async-openssl-example
- https://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
- https://stackoverflow.com/questions/37980798/openssl-error-handling
- https://github.com/openssl/openssl/discussions/23025
- https://github.com/openssl/openssl/issues/23451
- https://hynek.me/articles/apple-openssl-verification-surprises/
Regarding splitting libraries, I think that might actually make things worse: Rust doesn't have a stable ABI, so the separate TLS crate producing `libinkotls.a` would have to include the standard library as well. The result is that we end up linking with two `.a` libraries that both include the exact same standard library code, resulting in an executable that's actually larger than when using a single library.
It seems Rustls is doing some blocking IO under the hood, which could mess up the scheduler/performance:
- https://github.com/rustls/rustls/issues/850
- https://github.com/dart-lang/sdk/issues/41519#issuecomment-618646396
- https://github.com/rustls/rustls-native-certs/issues/25#issuecomment-943358491
- https://github.com/golang/go/issues/46287
edit: to clarify, Rustls itself doesn't perform the blocking IO, but the underlying OS-specific routines used to verify certificates might.
To add to the above: this affects macOS and Windows, as both Linux and FreeBSD have a rather straightforward system that doesn't involve expensive IO calls under the hood.
How a few languages handle this differs (here "OpenSSL" as the technique means it uses/mimics OpenSSL behaviour):

| Language | Technique |
|---|---|
| Node.js | :red_circle: OpenSSL |
| Go | :green_circle: System |
| Ruby | :red_circle: OpenSSL (as far as I can tell) |
| Python | :red_circle: OpenSSL, with a manual workaround |
| Deno | :green_circle: System, using this crate and this crate |
| Java | :yellow_circle: System, provided this is explicitly enabled; not sure about the default |
Based on this (and assuming this pattern holds for other languages), the approach used seems somewhat inconsistent across languages.
The choices come down to the following:
- rustls-platform-verifier: does what users might expect on macOS and Windows, but can potentially slow down certificate validation depending on the underlying IO work done. This in turn can degrade application performance (example)
- rustls-native-certs: is able to load the certificates from the system root, but does its own validation. This should result in consistent performance, but may produce surprising validation results (i.e. a certificate still being treated as valid when it has in fact been revoked, requiring a restart)
I'm not sure yet which option would be better.
Per https://github.com/rustls/rustls-native-certs/issues/30, it seems that if we use rustls-native-certs then loading the certificates (which happens when creating a client config) could potentially take a long time.
I'm inclined to just stick with rustls-platform-verifier, as the (potential) performance implications only apply on macOS, and even then it's not clear how likely they are to happen.
What we could do is fork the project and modify the code that does the verification such that it's wrapped in a `Process::blocking` call. This way any delays introduced by it won't hog an entire OS thread. The challenge here is that we have to somehow inject the current process into that code, preferably without the use of thread-locals.
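To illustrate the idea (this is not Inko's actual `Process::blocking`, and the verification function is a stand-in that merely simulates a slow OS call), offloading the potentially blocking work to another thread could look roughly like this:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Stand-in for a certificate verification call that may block on OS-level
// IO (e.g. revocation checks on macOS). Entirely hypothetical.
fn verify_certificate(host: &str) -> bool {
    thread::sleep(Duration::from_millis(50)); // simulate a slow OS call
    !host.is_empty()
}

// Sketch of the `Process::blocking` idea: run the potentially blocking work
// on a separate thread so the scheduler thread driving the lightweight
// processes isn't held up for the duration of the call.
fn blocking<T: Send + 'static>(work: impl FnOnce() -> T + Send + 'static) -> T {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let _ = tx.send(work());
    });

    // A real scheduler would suspend the calling process here instead of
    // blocking the thread; recv() keeps the sketch simple.
    rx.recv().expect("worker thread panicked")
}

fn main() {
    let valid = blocking(|| verify_certificate("example.com"));
    assert!(valid);
    println!("certificate accepted: {valid}");
}
```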
I ended up vendoring and patching rustls-platform-verifier, such that it handles potentially blocking calls on macOS. It's not great that we have to vendor and patch things, but it seems to be the only working approach at this stage. Some additional details are found in this comment.