inko icon indicating copy to clipboard operation
inko copied to clipboard

Support for TLS sockets

Open yorickpeterse opened this issue 2 years ago • 3 comments

Inko's sockets are just regular sockets, without any built-in support for TLS (as in SSL/TLS, not thread-local storage). Combine that with the lack of built-in support for various ciphers/cryptographic functions, and using encrypted connections is a real challenge.

To make this possible, Inko should provide a TLS stack of some sort which can be used with its non-blocking sockets. As first sight the best option may appear to be using rustls and exposing it through the VM. While this would likely save us time, I'm not a fan of it, for a few reasons:

  • Our API would largely be dictated by the API exposed by rustls, and the resulting API may be annoying to use
  • We'd increase the number of VM/non-Inko dependencies, and I want to keep those to a minimum
  • A TLS stack requires various features/functions/etc we'd likely have to support anyway, such as cryptographic functions. Instead of exposing all that through the VM, I want to write as much in Inko as possible
  • We'd likely have to wrap various Rust types in raw pointers and heap allocate them. I'd like to keep this to a minimum, only using it where absolutely necessary

A first step towards this it to identify what exactly we'd need. After that we need to figure out if there's some sort of shared test suite/reference we can use to determine if we're implementing things correctly. The last thing I want to do is implement this wrong.

Related issues:

  • https://github.com/inko-lang/inko/issues/574
  • https://github.com/inko-lang/inko/issues/573
  • https://github.com/inko-lang/inko/issues/572

yorickpeterse avatar Sep 11 '22 20:09 yorickpeterse

marked this issue as related to #283

yorickpeterse avatar Sep 26 '22 19:09 yorickpeterse

While the original desire was a pure Inko TLS stack, I'm currently leaning more towards just using OpenSSL 3.x instead. For one, writing such a stack is a massive effort. Second, it will probably be a long time before we can guarantee constant-time operations needed for various crypto bits, potentially opening us up for timing attacks until then.

Using OpenSSL does mean that cross-compilation becomes more of a challenge, whereas a pure Inko stack wouldn't suffer from this.

Of course wrapping rustls is still a promising option, and would probably make cross-compilation a heck of a lot easier.

yorickpeterse avatar Jan 13 '24 14:01 yorickpeterse

Based on https://github.com/rustls/rustls/blob/main/examples/src/bin/simpleclient.rs, I think wrapping rustls shouldn't be too hard. Essentially TLS sockets would need to store three values: a file descriptor to the raw socket, the rustls wrapper reader/writer, and a reference to the rustls connection object.

yorickpeterse avatar Jan 13 '24 14:01 yorickpeterse

https://github.com/fres621/inko-tls/ might prove useful as a reference.

yorickpeterse avatar Jun 17 '24 14:06 yorickpeterse

Per the source code of rustls' Stream type (https://docs.rs/rustls/latest/src/rustls/stream.rs.html#11-17), it seems we can just create a Stream ad-hoc for every TLS read/write, instead of having to keep it around. This means that for a TLS socket we only need to persist a ClientConnection and ClientConfig somewhere.

yorickpeterse avatar Jul 04 '24 03:07 yorickpeterse

Writing an entire TLS stack from scratch is going to be way too much work, and probably isn't a good idea to begin with as Inko has nothing to allow constant-time operations (as needed for correct encryption). As such, we'll start with basing TLS sockets on either OpenSSL or rustls, whichever works best.

yorickpeterse avatar Jul 04 '24 14:07 yorickpeterse

The "tls" branch introduces TLS support using rustls. This works, though the rustls documentation is lacking/confusing in various places, and its examples are super dense/too noisy to be of much use.

Using rustls introduces a few problems:

  • With the aws-lc backend, compile times are increased by about 30 seconds. With the ring backend the increase is only 5-10 seconds, but TLS 1.3 performance will suffer.
  • The aws-lc backend doesn't provide pre-generated bindings for FreeBSD, requiring cmake and Perl to be installed in other to compile aws-lc. I don't want this, so we'd have to use the ring backend (which works fine on FreeBSD)
  • Using rustls increases binary sizes by about 6 MiB before stripping and just under 2 MiB after stripping, due to all the extra TLS code. This size increase is present even if std.net.tls is never imported, because it's part of the runtime library

Because of this I played around with using OpenSSL through Inko's FFI. This sketch looks like this:

openssl.inko
import extern "crypto"
import extern "ssl"

import std.fmt (fmt)
import std.net.ip (IpAddress)
import std.net.socket (Socket, TcpClient)
import std.process (sleep)
import std.stdio (STDOUT)
import std.time (Duration)

let TLS1_2_VERSION = 0x0303
let SSL_CTRL_SET_MIN_PROTO_VERSION = 123
let SSL_ERROR_SSL = 1
let SSL_ERROR_WANT_READ = 2
let SSL_ERROR_WANT_WRITE = 3

fn extern TLS_method -> Pointer[UInt8]

fn extern SSL_CTX_new(method: Pointer[UInt8]) -> Pointer[UInt8]

fn extern SSL_CTX_ctrl(
  context: Pointer[UInt8],
  cmd: Int32,
  larg: Int,
  parg: Pointer[UInt8],
) -> Int

fn extern SSL_new(context: Pointer[UInt8]) -> Pointer[UInt8]

fn extern SSL_free(ssl: Pointer[UInt8])

fn extern SSL_CTX_free(context: Pointer[UInt8])

fn extern SSL_set_fd(ssl: Pointer[UInt8], fd: Int32) -> Int32

fn extern SSL_connect(ssl: Pointer[UInt8]) -> Int32

fn extern SSL_get_error(ssl: Pointer[UInt8], code: Int32) -> Int32

fn extern SSL_write_ex(
  ssl: Pointer[UInt8],
  buf: Pointer[UInt8],
  size: UInt64,
  written: Pointer[UInt64],
) -> Int32

fn extern ERR_get_error -> UInt64

fn extern ERR_reason_error_string(code: UInt64) -> Pointer[UInt8]

fn extern ERR_clear_error

class async Main {
  fn async main {
    let stdout = STDOUT.new
    let client = TcpClient
      .new(ip: IpAddress.v4(127, 0, 0, 1), port: 9000)
      .or_panic('failed to connect')

    let ctx = SSL_CTX_new(TLS_method)

    if ctx as Int == 0 { panic('SSL_CTX_new() failed') }

    # Set the minimum version to TLS 1.2, as everything else is deprecated and
    # insecure.
    if
      SSL_CTX_ctrl(
        ctx,
        SSL_CTRL_SET_MIN_PROTO_VERSION as Int32,
        TLS1_2_VERSION,
        0x0 as Pointer[UInt8],
      )
        as Int
        == 0
    {
      panic('failed to set the minimum TLS version')
    }

    let ssl = SSL_new(ctx)

    if ssl as Int == 0 { panic('SSL_new() failed') }

    if SSL_set_fd(ssl, client.socket.socket.inner) as Int == 0 {
      panic('SSL_set_fd() failed')
    }

    loop {
      ERR_clear_error

      match SSL_connect(ssl) as Int {
        case 1 -> {
          stdout.print('connected')
          break
        }
        case 0 -> panic('handshake failed due to shutdown')
        case n -> {
          match SSL_get_error(ssl, n as Int32) as Int {
            case SSL_ERROR_WANT_READ -> {
              stdout.print('polling for read')
              client.socket.poll(write: false)
              stdout.print('socket now readable')
            }
            case SSL_ERROR_WANT_WRITE -> panic('want write, what?')
            case other -> panic('other error: ${other}')
          }
        }
      }
    }

    let buf = 'ping'.to_byte_array
    let written = 0 as UInt64

    loop {
      ERR_clear_error

      match
        SSL_write_ex(
          ssl,
          buf.to_pointer,
          size: buf.size as UInt64,
          written: mut written,
        )
          as Int
      {
        case 1 -> {
          stdout.print('write done')
          break
        }
        case n -> {
          match SSL_get_error(ssl, n as Int32) as Int {
            case SSL_ERROR_WANT_READ -> panic('want read')
            case SSL_ERROR_WANT_WRITE -> panic('want write')
            case SSL_ERROR_SSL -> {
              # TODO: this uses thread-local state. Yuck!
              #
              # Basically it seems that error handling with OpenSSL is a proper
              # mess, heavily relying on thread-local state. Not only does
              # managing that state add overhead, it's also fragile considering
              # Inko's scheduler may reschedule a program in between calls (at
              # least at some point in the future).
              #
              # Based on searching around the internet, it seems many users run
              # into issues with this. Crystal for example doesn't even do this
              # and there's an issue open for that since 2019.
              let err = ERR_get_error
              let str = String.from_pointer(ERR_reason_error_string(err))

              panic('fatal SSL error, reason: ${str}')
            }
            case other -> panic('other error: ${other}')
          }
        }
      }
    }

    sleep(Duration.from_secs(1))
    SSL_free(ssl)
    SSL_CTX_free(ctx)
  }
}

OpenSSL however brings its own problems:

  • While dynamically linking it reduces the binary size, it also makes cross-compilation more difficult. I really don't want to start providing pre-compiled static libraries of OpenSSL for different platforms, as that is A) more work for us B) less secure compared to using a package from a trusted source
  • OpenSSL's error handling mechanism relies on thread-local state, and from searching around the only sensible way of dealing with this is to clear the errors before every OpenSSL call. This, combined with Inko's scheduler, makes using OpenSSL very fragile. Even without the scheduler, the thread-local error queue mechanism is horrible to work with
  • If I'm not mistaken, OpenSSL by default also enables insecure algorithms (e.g. TLS 1.1 and older), requiring you to explicitly set a minimum version. This is doable, but I worry there are other parts where it's making the wrong/outdated decisions. That is, using OpenSSL feels like walking through a minefield

My current thought is that we can do the following:

  • Polling sockets is exposed through an external runtime function (the above sketch already relies/makes use of that)
  • All TLS code is moved into a separate crate that basically acts as a C wrapper around rustls. We then handle WouldBlock errors in the standard library
  • If std.net.tls is imported, we have to somehow inject the static library produced by this crate. What we could do is name this e.g. libinkotls and when we detect import extern "inkotls" we ensure we always import the static library, instead of searching the system libraries

Using this approach we should be able to offer a safe TLS setup, without messing around with OpenSSL, and without always increasing the executable sizes.

With that said, I'm not sure a 6 MiB increase is really worth the trouble, so I'll need to think about this for a bit.

yorickpeterse avatar Jul 18 '24 16:07 yorickpeterse

Dumping some OpenSSL related resources:

  • https://github.com/openssl/openssl/issues/3033
  • https://codereview.stackexchange.com/questions/108600/complete-async-openssl-example
  • https://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
  • https://stackoverflow.com/questions/37980798/openssl-error-handling
  • https://github.com/openssl/openssl/discussions/23025
  • https://github.com/openssl/openssl/issues/23451
  • https://hynek.me/articles/apple-openssl-verification-surprises/

yorickpeterse avatar Jul 18 '24 16:07 yorickpeterse

Regarding splitting libraries, I think that might actually make things worse: Rust doesn't have a stable ABI, so the separate TLS crate producing libinkotls.a would have to include the standard library as well. The result is that we end up linking with two .a libraries both including the exact same standard library code, resulting in an executable that's actually larger than when using a single library.

yorickpeterse avatar Jul 18 '24 16:07 yorickpeterse

It seems Rustls is doing some blocking IO under the hoods, which could mess up the scheduler/performance:

  • https://github.com/rustls/rustls/issues/850
    • https://github.com/dart-lang/sdk/issues/41519#issuecomment-618646396
  • https://github.com/rustls/rustls-native-certs/issues/25#issuecomment-943358491
  • https://github.com/golang/go/issues/46287

edit: to clarify, Rustls itself doesn't perform the blocking IO, but the underlying OS-specific routines used to verify certificates might.

yorickpeterse avatar Jul 18 '24 23:07 yorickpeterse

To add to the above: this affects macOS and Windows, as both Linux and FreeBSD have a rather straightforward system that doesn't involve expensive IO calls under the hoods.

How a few languages handle this differs (here "OpenSSL" as the technique means it uses/mimics OpenSSL behaviour):

Language Technique
Node.js :red_circle: OpenSSL
Go :green_circle: System
Ruby :red_circle: OpenSSL (as far as I can tell)
Python :red_circle: OpenSSL, with a manual workaround
Deno :green_circle: System, using this crate and this crate
Java :yellow_circle: System, provided this is explicitly enabled, not sure about the default

Based on this (and assuming this pattern holds for other languages), it seems a little inconsistent as to what approach is used.

yorickpeterse avatar Jul 19 '24 13:07 yorickpeterse

The choices come down to the following:

  • rustls-platform-verifier: does what users might expect on macOS and Windows, but potentially slow down certificate validation depending on the underlying IO work done. This in turn can degrade application performance (example)
  • rustls-native-certs able to load the certificates from the system root, but does its own validation. This should result in consistent performance, but may result in surprising validation results (i.e. a certificate still being treated as valid when it has in fact been revoked, requiring a restart)

I'm not sure yet which option would be better.

yorickpeterse avatar Jul 19 '24 13:07 yorickpeterse

Per https://github.com/rustls/rustls-native-certs/issues/30, it seems that if we use rustls-native-certs then loading the certificates (which happens when creating a client config) could potentially take a long time.

yorickpeterse avatar Jul 19 '24 14:07 yorickpeterse

I'm inclined to just stick with rustls-platform-verifier, as the (potential) performance implications only apply on macOS, and even then it's not clear how likely they are to happen.

What we could do is fork the project and modify the code that does the verification such that it's wrapped in a Process::blocking call. This way any delays introduced by it won't hog an entire OS thread. The challenge here is that we have to somehow inject the current process into that code, preferably without the use of thread-locals.

yorickpeterse avatar Jul 19 '24 16:07 yorickpeterse

I ended up with vendoring and patching rustls-platform-verifier, such that it handles potentially blocking calls on macOS. It's not great that we have to vendor and patch things, but it seems to be the only working approach at this stage. Some additional details are found in this comment.

yorickpeterse avatar Jul 19 '24 18:07 yorickpeterse