RFC: Trusted Language Handling

Open Hoverbear opened this issue 2 years ago • 6 comments

This is a request for comments. We'd like to hear user (or potential user) opinions and get feedback!

As highlighted by @jim-mlodgenski in #9, PL/Rust is currently an "Untrusted" language. This is mostly due to immaturity of the project.

It's our intent to provide a "Trusted" language handler with this repo. For a review of the distinction between trusted and untrusted, see the PostgreSQL PL/Perl docs.

PL/Rust should be restricted from using certain operations such as:

  • accessing file handles,
  • accessing database internals,
  • getting OS-level access with the permissions of a server process (as a C function could)

Rust is a bit different

Rust differs considerably from Perl, Python, and Node (via PL/v8): it is not interpreted, and it does not have a runtime we can embed in PL/Rust.

Rust creates either target-specific shared objects or binaries, or it creates portable wasm artifacts.

The process of building the code can potentially mutate the build environment (via build.rs or proc macros), but this is a topic for a separate issue.

The shared objects (or binaries) are roughly equivalent to C code, and get executed as such. Due to its nature, Rust also includes abundant unsafe support, unlike these other interpreted languages. A Rust function from a shared library can do as it pleases with the memory of the process, for example.
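To make that last point concrete, here is a minimal sketch of what unsafe permits once code runs natively inside a process. It is not PL/Rust code, and the function names are hypothetical, chosen only for illustration. Both helpers are well-defined Rust because they are handed valid pointers, but nothing in the language stops the same operations from being aimed at any address the process can reach.

```rust
/// Overwrites whatever lives at `target` with `new_value`.
/// The caller only has to produce an address; the compiler performs no
/// ownership or bounds checks on the write inside the `unsafe` block.
pub fn overwrite_through_raw_pointer(target: *mut u32, new_value: u32) {
    // Sound here only because the callers in this sketch pass a valid,
    // aligned, exclusive pointer. Nothing enforces that in general.
    unsafe { target.write(new_value) }
}

/// Reinterprets the memory behind a `u64` as eight raw bytes.
pub fn view_as_bytes(value: &u64) -> [u8; 8] {
    // A byte array has alignment 1, so this cast and read are valid.
    let ptr = value as *const u64 as *const [u8; 8];
    unsafe { ptr.read() }
}
```

A caller can write `overwrite_through_raw_pointer(&mut secret, 42)` and the value changes without any safe API being involved, which is the kind of access an interpreted trusted language never exposes.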

PL/Rust's current handling

At this time, PL/Rust is loaded as a PostgreSQL extension, and inside of that process it finds and dyloads the produced cdylib.

When this happens, library initialization routines run, and all the normal precautions around calling unknown foreign functions must be followed. In general, this process is not trusted.

Since PL/Rust-created libraries are ultimately loaded by the postmaster binary, they can do whatever any other pgx extension can do, or any C function.

Ways forward

Ultimately, there are two paths we've been discussing:

  • Having shared objects run in a separate (probably per-function) process which is sandboxed via seccomp/cgroups, forcing #![forbid(unsafe_code)], and disabling std, ensuring users only use core and alloc, along with other allowlisted crates that are also #![no_std] (so the user cannot simply extern crate std;).
  • Having PL/Rust embed wasmtime (or some other runtime) and use wasm's sandboxing features from inside the postmaster process. (Note this path does not require any disallowing of unsafe code, as the function will already be sandboxed, so it's free to do whatever it wants to its memory.)
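The language-level restrictions in the first path can be sketched roughly as follows. This is an illustration under assumptions, not PL/Rust's actual mechanism: `reverse_words` is a hypothetical user function, and the sketch stays an ordinary std program so it can be compiled and tested on its own. In a real crate the restrictions would be the crate-level attributes `#![no_std]` and `#![forbid(unsafe_code)]`.

```rust
// Only `core` (implicitly available) and `alloc` are used below; nothing is
// imported from `std`. A real restricted crate would enforce this with
// `#![no_std]` at the crate root instead of relying on convention.
extern crate alloc;

use alloc::string::String;
use alloc::vec::Vec;

/// Hypothetical user function body. The `forbid` lint makes any `unsafe`
/// block inside this item a hard compile error that cannot be re-allowed
/// with a nested `#[allow(unsafe_code)]`.
#[forbid(unsafe_code)]
pub fn reverse_words(input: &str) -> String {
    // `split_whitespace` comes from `core`; `Vec`, `join`, and `String`
    // come from `alloc`.
    let words: Vec<&str> = input.split_whitespace().rev().collect();
    words.join(" ")
}
```

Uncommenting a line such as `unsafe { core::ptr::null::<u8>().read() }` inside the function makes the build fail. As later comments in this thread point out, this lint alone does not make the code trustworthy; it only removes the most direct escape hatch.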

There are several factors contributing to a decision:

  • Speed: We want PL/Rust to provide users with nearly the same speed as an untrusted C function. It's expected that there will be some cost to sandboxing, but we'd like to avoid impacting the already rather high cost of doing an extern call in PostgreSQL.
  • Maintainability: Our team is small, and we simply do not have the resources to invest in managing a completely homegrown sandboxing solution. It would be desirable if we could use an existing solution and work with that upstream to improve the state of the art when we do have time.
  • Risk: If a user were able to create untrusted behavior in PL/Rust, we would like to limit the blast radius. It's preferable if the behavior gets killed, even if that takes down the entire database (but safely), as opposed to allowing some untrusted behavior. Of course, we'd like that blast radius to be smaller, but a failure like that is still preferable to a PL/Rust function getting access to the superuser account, or a shell onto the host machine.

Hoverbear, Mar 23 '22 16:03

I have a personal bias towards the wasm route because I can certainly see in the future users requesting things like "I'd like PL/Rust to be able to access XYZ directory and I will configure that with a GUC".

We noted from some benchmarks and papers that there is a definite speed impact, which worries me. However, creating proper isolation for cdylibs may incur roughly the same overhead...

Hoverbear, Mar 23 '22 16:03

I was investigating the dylib method a bit this morning.

There exist crates like https://github.com/servo/gaol (created for Servo, but long unmaintained) and https://github.com/unrelentingtech/rusty-sandbox (also long unmaintained).

I also found this really good article by @marioortizmanero who is(/was?) doing a @tremor-rs LFX internship: https://nullderef.com/blog/plugin-tech/, which also linked to this paper which includes some metrics around inter-process communication options other than WASM.

I also saw https://github.com/quantumsheep/godbox, which uses https://github.com/ioi/isolate underneath. This offers some Linux-specific solutions. Godbox doesn't seem appropriate for our needs, but Isolate might be of interest to us.

I also found this paper: https://dl.acm.org/doi/pdf/10.1145/3477113.3487272 which discusses some possibilities for using actual language features to create isolation boundaries, but it mostly describes and proposes mechanisms, and doesn't seem to have an existing implementation.

Hoverbear, Mar 24 '22 16:03

We've dug into a branch that does execution with wasmtime and it turns out to be quite fast: https://github.com/tcdi/plrust/tree/wasm-execution

We're exploring how to manage interfaces like SPI next.

(Debug mode)

plrust=# DROP EXTENSION IF EXISTS plrust CASCADE; CREATE EXTENSION plrust;
NOTICE:  drop cascades to function spi_poc2(text)
DROP EXTENSION
Time: 3.273 ms
CREATE EXTENSION
Time: 3.196 ms
plrust=# CREATE OR REPLACE FUNCTION spi_poc2(a TEXT) RETURNS TEXT IMMUTABLE STRICT LANGUAGE PLRUST AS $$ a.to_string() $$;
INFO:  fn25168_2200_235880
CREATE FUNCTION
Time: 4242.268 ms (00:04.242)
plrust=# SELECT spi_poc2('bean');
 spi_poc2 
----------
 bean
(1 row)

Time: 285.885 ms
plrust=# SELECT spi_poc2('bean');
 spi_poc2 
----------
 bean
(1 row)

Time: 1.460 ms
plrust=# SELECT spi_poc2('bean');
 spi_poc2 
----------
 bean
(1 row)

Time: 1.391 ms
plrust=# 
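
For a rough sense of scale, the arithmetic on the timings above can be written out as below. The constants are copied from the session; the reading that the first call pays a one-time setup cost is an assumption, not something the session itself proves, and the helper names are hypothetical.

```rust
/// Wall-clock times in milliseconds, copied from the psql session above.
pub const FIRST_CALL_MS: f64 = 285.885;
pub const WARM_CALLS_MS: [f64; 2] = [1.460, 1.391];

/// Arithmetic mean of a slice of samples.
pub fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// How many times slower the first call was than the average later call.
pub fn cold_to_warm_ratio() -> f64 {
    FIRST_CALL_MS / mean(&WARM_CALLS_MS)
}
```

With these numbers the later calls average about 1.43 ms, so the first call was roughly 200 times slower. Two warm samples from a debug build are far too few to treat this as a benchmark.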

Hoverbear, Apr 04 '22 21:04

Tasks

  • [x] Investigate WASM
    • [x] https://github.com/tcdi/plrust/issues/35
    • [x] https://github.com/tcdi/plrust/issues/34
  • [ ] Disallow the Rust “unsafe” keyword
  • [ ] Force “pl/rust” functions to be “no_std”, but allow “alloc”
  • [ ] https://github.com/tcdi/plrust/issues/26
  • [ ] https://github.com/tcdi/plrust/issues/32

johnrballard, Aug 18 '22 19:08

In general, preventing unsafe is insufficient to prevent... anything. The long list of I-unsound issues in rust-lang/rust is full of evidence of this, and most of these are not considered a security issue, as the language doesn't guarantee that you can't pwn yourself. In particular, there are pretty much always a few[^few] lifetime or type system bugs which can be abused to easily cause a use-after-free with no unsafe code. These (unlike most I-unsound issues) can be used to subvert the safety system in basically any manner desired, including read+write access to any memory (trivially), transmutes (easily), calling any function in the process (more difficult, but usually possible), and so on.

[^few]: At least 6 or 7 of these seemed to exist on stable Rust when I looked last night. Most of them are over a year old, some much older, and they're generally very hard to detect/prevent.

I'd think spawning a sandboxed process (seccomp filter on Linux, seatbelt on macOS, ...) to run the function would be able to limit the damage to what that process has access to (if we can arrange that to be less, that would be ideal). I'm not terribly familiar with sandboxing on Linux though, and maybe either seccomp isn't strong enough[^1], or it may be there's no way to limit the damage, since they still need the ability to talk to the db. Either way, I'm assuming this isn't quite enough given the mention that we'd still need to prevent unsafe in the child.

FWIW, WebAssembly (via wasmtime or similar) is much better suited for this purpose -- with a Wasm sandbox, our risk would be limited to spectre-style attacks, which every other trusted language doubtlessly has trouble with as well[^2]. These are only powerful enough to grant read access to other process memory. We don't need browser-level sandboxing so I believe this would not be of concern.

[^1]: Certainly macOS's seatbelt sandboxing seems like it could do this (it's extremely powerful, IIUC more than any other OS's sandboxing capabilities), but I'm not sure how much we care about macOS (aside from supporting it as a development environment).

[^2]: Preventing them pretty much requires process isolation, although you can get close if you prevent access to timers and other sources of non-determinism. Given that accessing timestamps seems important for a database language, I suspect nobody does this.

thomcc, Oct 06 '22 17:10

From some offline discussion: trying to protect from malicious actors in cases like this is not (currently) within the scope of this project (or issue), so this is fine. Maybe someday it will be revisited, but it's an impractical amount of work at the moment, and not needed.

thomcc, Oct 06 '22 22:10