Design sketch for potential new {alloc,memcxt,drop,borrow} abstractions
This issue serves as a public aggregation of my thoughts regarding, and sketchpad for possible new designs of, how we handle the allocations that flow between Rust and Postgres and back again. Many of these ideas are half-formed so I do not intend to respond much to nits here: if things seem weird, unrelated, or inconsistent, it's more likely they have a missing part that I haven't written yet that would fully elaborate them.
- A major desirable that we can now start looking at: Allocations with lifetimes correctly bound by the memory context that they are allocated in, for at least some memory contexts. Kinda like a hypothetical
Box<T, A = Palloc<'current>>. I think before GATs this was going to quickly run up against lifetime issues best handled by GATs, but now I think we might be able to get past that. -
PgBoxprobably needs some rework to begin with, see https://github.com/tcdi/pgx/issues/533 -
WhoAllocated<T>is a pretty complex type parameter to have to handle and given that Postgres memory contexts are inherently dynamic, the benefits in soundness of handling it statically instead of dynamically are more limited if they don't include lifetimes. So I have an idea for how to reduce the frequency of dealing with it viaPow<'a, B>when a lifetime parameter wouldn't be too much to add. - Abstractions like
PallocSliceindicate we need a set of types with a more... flexible understanding of memory than Rust usually has, as our memory from Postgres may be actually uninit: https://github.com/tcdi/pgx/pull/882 - I'd like to better and more lazily handle the unboxing of Datum types so we aren't allocating so often, at least in some cases: https://github.com/tcdi/pgx/issues/660
- See the entire... saga of https://github.com/tcdi/pgx/issues/627, our abstractions need to be more resilient against layout mismatches between the two languages we're bridging.
- https://github.com/tcdi/pgx/issues/848 may matter? Probably not. Keeping this here so it's on this radar.
- Postgres has its own limitations for how big a size it can handle of things, and it would be nice if our various type conversions and such checked that we weren't breaking that automatically and in a consistent way, see e.g. https://github.com/tcdi/pgx/issues/654
Pow<'a, B>
In std::borrow Rust has
pub enum Cow<'a, B>
where
B: 'a + ToOwned + ?Sized,
{
Borrowed(&'a B),
Owned(<B as ToOwned>::Owned),
}
We probably want something like
pub enum Pow<'a, B>
where
B: 'a + ToRust + ?Sized,
{
Palloc(&'a B),
Rust(<B as ToRust>::Owned),
}
In some cases this may allow us to reduce the number of type parameters we use, such as everything that is parameterized by the WhoAllocated<A> abstraction, when matching on the enum isn't going to be a significant overhead.
We should also have abstractions that handle all of the needs_pfree cases we're doing "manually" in Rust here: these all should use a similar abstraction that's more like
pub enum MaybePalloc<'a, P>
where
P: 'a + ?Sized,
{
Borrowed(&'a P),
Palloc(NonNull<P>),
}
Maybe that should be Pow and the other one should be Row, eh? :P
These PRs have illustrated that we probably want to find a way to use a pattern to prevent Drop from interacting poorly with OwnedMemoryContext.
- https://github.com/tcdi/pgx/pull/921
- https://github.com/tcdi/pgx/pull/922
We also may want to find a way to further improve our panic handling with respect to memory contexts:
- https://github.com/tcdi/pgx/pull/920
I can't comment on the best way to implement things internally, but I can tell you what would make life easier.
#[repr(C)]
struct Foo {
bar: String,
}
#[pg_guard]
pub extern "C" fn some_pg_func(var: pg_sys::SomePgStruct) -> Foo {
let x = var.thing;
let foo = alloc_in_current_mem_context::<Foo>();
return foo; // no into_pg() required
}
If the foo.bar String got extended, the new memory would automatically be allocated in the same mem context.
I understand that doing it exactly as I've described is probably impossible, but the more we can get away from PgBox, from_pg(), into_pg(), and tracking whether something is in Rust or pg memory, the better.
Postgres manages memory lifetimes really well. Allocating everything in the current transaction memory context and then dropping it all at the end of the transaction covers a host of sins. I see no point in genuflecting before the Rust borrow checker and trying to get lifetimes exactly right. Everything allocated in pg memory has a static lifetime and never gets dropped.
Not all extensions are entirely described by starting, executing a transaction, and then stopping. So speaking of static lifetimes, it is statics, actually, that are are one of my primary concerns. The issue is inevitably going to come down to something like this:
use once_cell::sync::Lazy;
// This allows interior mutability after initialization
// so it can be written by transaction N and read by transaction N+1
static GLOBAL: Lazy<Palloc<SomeType>> = Lazy::new(|| Palloc::move_in(SomeType {}));
struct SomeType {}
struct Palloc<T> {
// Pretend this is a real type describing a palloc
inner: T,
}
impl<T> Palloc<T> {
fn move_in(inner: T) -> Self { Palloc { inner } }
}
"How do we design the API so that anything that gets up here must be in something like TopMemoryContext?"
In other words, 'static need only remain alive for the entire lifetime of the program, but PGX cannot assume that the lifetime of the transaction is also the lifetime of the extension as a program. I do of course want to improve ergonomics here and lean into what Postgres offers rather than away from it, but this is a nontrivial concern. Artificially limiting PGX to only being useful to create a certain "well-behaved" suite of extensions would be... unfortunate.
You might consider implementing an API that looks like existing Rust arena allocators. Bumpalo is a good one. It's very intuitive. https://github.com/fitzgen/bumpalo