Add support for Spi cursors
This extends the SpiClient interface to expose methods to open a cursor based on a query, returning a SpiCursor that can be used to fetch results incrementally. Opened cursors can be closed early, or left open and turned into their (portal) name, to be later retrieved in a different Spi session in the same transaction.
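For illustration, usage looks roughly like this (a sketch based on the method names in this PR; exact signatures, argument handling, and error types are approximate):

```rust
Spi::connect(|mut client| {
    // Open a cursor over a query; rows are fetched lazily.
    let mut cursor = client.open_cursor("SELECT generate_series(1, 1000)", None);

    // Fetch results incrementally, e.g. 100 rows at a time.
    let batch = cursor.fetch(100);

    // Either close the cursor early here, or keep it open and turn it
    // into its portal name, to be picked up by a later Spi session
    // within the same transaction.
    let name = cursor.into_name();
    Ok(Some(name))
});
```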
I've tried to keep the API consistent with the existing Spi API, although I have a couple of doubts, specifically:
- I find signed row counts not very intuitive, although I understand they are consistent with the underlying C API (e.g. see the limit parameter of SpiClient::select); that might be addressed in the future Spi API overhaul mentioned on Discord, I believe?
- There is no way to free the memory associated with returned SpiTupleTable data; this is more of an issue with cursors, as there are use cases where several fetch operations are performed in the same Spi session.
I've integrated your changes, with some tweaks, as I found a few other things that could be cleaned up. Let me know what you think. Thanks for the feedback, @Hoverbear.
This has bitrotted a little bit. @EdMcBane, are you interested in bringing it up to date? We can get it merged soon...
Sure thing, @eeeebbbbrrrr, I'll give it a go in the next few days.
Should be good to go.
I haven't checked out this branch to play with it, but can a cursor escape an Spi::connect() block? Like, does this compile:
```rust
let cursor = Spi::connect(|mut client| {
    let cursor = client.open_cursor("SELECT 1", None);
    return Ok(Some(cursor));
});
```
If so, that seems dangerous. The underlying cursor pointer will be freed at the end of Spi::connect(), so further use of the returned cursor would be a use-after-free.
Also, I wonder if this PR needs to be rebased again? We've had some other API changes around SPI and I dunno if this has them or not.
I would like to get this merged tho. @EdMcBane, you've been working at this for awhile now and I feel bad we haven't given it much attention. :(
> I haven't checked out this branch to play with it, but can a cursor escape an Spi::connect() block? Like, does this compile:
>
> ```rust
> let cursor = Spi::connect(|mut client| {
>     let cursor = client.open_cursor("SELECT 1", None);
>     return Ok(Some(cursor));
> });
> ```
Good catch: it used to compile, but now it does not; I've made sure to phantom-borrow the client to make it so.
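For reference, the shape of that fix (a sketch with stand-in types, not the exact PR code): a zero-sized PhantomData field ties the cursor to the client's lifetime.

```rust
use std::marker::PhantomData;

// Stand-ins for the real types, just to illustrate the borrow.
pub struct SpiClient;
struct Portal;

pub struct SpiCursor<'client> {
    ptr: *mut Portal,
    // Phantom-borrow of the SpiClient: the borrow checker now rejects
    // any SpiCursor that tries to outlive the Spi::connect() closure.
    _marker: PhantomData<&'client SpiClient>,
}

impl SpiClient {
    pub fn open_cursor(&self, _query: &str) -> SpiCursor<'_> {
        SpiCursor {
            ptr: std::ptr::null_mut(),
            _marker: PhantomData,
        }
    }
}
```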
> Also, I wonder if this PR needs to be rebased again? We've had some other API changes around SPI and I dunno if this has them or not.
Just did it; the above fix needed to be adapted, but the rest rebased cleanly.
> I would like to get this merged tho. @EdMcBane, you've been working at this for awhile now and I feel bad we haven't given it much attention. :(
Nothing to worry about :-) Cheers
yeah, nice. Only question I have now is... does SpiCursor need to impl Drop instead of having an explicit close() method? I'm leaning towards yes, but I imagine you put some thought into having close(), so I want to ask...
SpiCursor can implement Drop instead of close(), as long as we use a sentinel value (null_ptr, or Option::None) to skip calling SPI_cursor_close when dropping it within SpiCursor::into_name.
I kinda think it would be best.
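Roughly like this (a sketch with stand-in types; SPI_cursor_close stubbed out):

```rust
use std::ptr::NonNull;

struct Portal; // stand-in for pg_sys::PortalData

pub struct SpiCursor {
    // Set to None once ownership is handed back to Postgres
    // (e.g. by into_name), so Drop skips SPI_cursor_close.
    ptr: Option<NonNull<Portal>>,
}

impl Drop for SpiCursor {
    fn drop(&mut self) {
        if let Some(portal) = self.ptr.take() {
            close_portal(portal); // SPI_cursor_close in the real code
        }
    }
}

fn close_portal(_portal: NonNull<Portal>) {} // stub
```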
> SpiCursor can implement Drop instead of close(), as long as we use a sentinel value (null_ptr, or Option::None) to skip calling SPI_cursor_close when dropping it within SpiCursor::into_name. I kinda think it would be best.
Makes sense. I missed the into_name() function. Is that even necessary as part of the API? It looks a lot like leaking the cursor (waiting for Postgres to clean it up later).
Is it meant to be used for "WITH HOLD" cursors that can survive transactions?
into_name is meant to allow leaving a cursor open within the same transaction across multiple Spi sessions.
I was not aware of "WITH HOLD" cursors, interesting.
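That is, something like this (a sketch; some_table is a placeholder and the exact signatures are approximate):

```rust
// First Spi session: open the cursor and keep only its portal name.
let name = Spi::connect(|mut client| {
    let cursor = client.open_cursor("SELECT * FROM some_table", None);
    Ok(Some(cursor.into_name()))
});

// A later Spi session in the same transaction: reclaim it by name.
Spi::connect(|mut client| {
    let mut cursor = client.find_cursor(&name);
    let batch = cursor.fetch(100);
    // ... use the batch ...
    Ok(Some(()))
});
```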
> into_name is meant to allow leaving a cursor open within the same transaction across multiple Spi sessions.
Isn't this just std::mem::forget()? I think if you make SpiCursor #[repr(transparent)] then that wouldn't ever leak Rust-allocated memory, and Postgres would clean up at the end of the transaction, and the find_cursor() function will still work.
> I was not aware of "WITH HOLD" cursors, interesting.
Yeah, that just popped into my head right now. Incredibly useful feature -- I've used it a lot over the years. I don't think we need to tackle it with this PR tho. File it away for the future.
> Isn't this just std::mem::forget()? I think if you make SpiCursor #[repr(transparent)] then that wouldn't ever leak Rust-allocated memory, and Postgres would clean up at the end of the transaction, and the find_cursor() function will still work.
I'm wondering whether #[repr(transparent)] is actually necessary, as into_name takes self by value, thus moving it on the stack.
I'll replace close() with drop() and get back to you.
> I'm wondering whether #[repr(transparent)] is actually necessary, as into_name takes self by value, thus moving it on the stack.
True. Maybe it'll just be good to declare our intentions. Also, I guess the pointer can be NonNull<pg_sys::PortalData> now?
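Put together, the cursor might end up looking something like this (a sketch of the direction discussed above, with stand-in types and stubs, not the actual PR code):

```rust
use std::mem;
use std::ptr::NonNull;

struct Portal; // stand-in for pg_sys::PortalData

#[repr(transparent)]
pub struct SpiCursor {
    ptr: NonNull<Portal>,
}

impl SpiCursor {
    pub fn into_name(self) -> String {
        let name = portal_name(self.ptr);
        // Leave the portal open: skipping our Drop means SPI_cursor_close
        // is never called, and Postgres cleans the portal up at the end of
        // the transaction if nobody reclaims it via find_cursor().
        mem::forget(self);
        name
    }
}

impl Drop for SpiCursor {
    fn drop(&mut self) {
        close_portal(self.ptr); // SPI_cursor_close in the real code
    }
}

fn portal_name(_portal: NonNull<Portal>) -> String { String::new() } // stub
fn close_portal(_portal: NonNull<Portal>) {} // stub
```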
This has turned out really nice. Great work and thanks for sticking with it and thanks for your patience.
Would you mind adding some // SAFETY comments to the few unsafe blocks?
Then I’ll merge.
Glad to hear it, will add // SAFETY comments later today.
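For instance, in this style (an illustrative fragment, not the actual code):

```rust
// SAFETY: `self.ptr` points to a portal returned by `SPI_cursor_open`
// that has not been closed yet, and we are inside an active Spi session,
// so it is valid to pass it to `SPI_cursor_fetch`.
unsafe {
    pg_sys::SPI_cursor_fetch(self.ptr.as_ptr(), true, count);
}
```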
I've noticed that SpiTupleTable doesn't ever free memory early via SPI_freetuptable, relying instead on the memory context being cleaned up at Spi session completion. This makes cursor use a little awkward, as explained in my code:
```rust
/// # Important notes about memory usage
/// Result sets ([`SpiTupleTable`]s) returned by [`SpiCursor::fetch()`] will not be freed until
/// the current Spi session is complete;
/// this is a Pgx limitation that might get lifted in the future.
///
/// In the meantime, if you're using cursors to limit memory usage, make sure to use
/// multiple separate Spi sessions, retrieving the cursor by name.
```
Solving this is a bit involved, as it requires a Drop implementation on SpiTupleTable and to create a lifetime dependency between SpiHeapTupleData/SpiHeapTupleDataEntry and their parent SpiTupleTable.
Should we address this now or should I open a new issue for it?
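To make that concrete, the suggested workaround looks something like this (a sketch; big_table, the batch size, and the exact signatures are placeholders and approximations):

```rust
// Open the cursor in a short-lived Spi session, keeping only its name.
let mut name = Spi::connect(|mut client| {
    let cursor = client.open_cursor("SELECT * FROM big_table", None);
    Ok(Some(cursor.into_name()))
});

loop {
    // Fetch each batch in its own Spi session, so the SpiTupleTable
    // memory from previous batches is freed when that session completes.
    let (next_name, done) = Spi::connect(|mut client| {
        let mut cursor = client.find_cursor(&name);
        let batch = cursor.fetch(1000);
        let done = batch.len() == 0;
        // ... process the batch ...
        Ok(Some((cursor.into_name(), done)))
    });
    name = next_name;
    if done {
        break;
    }
}
```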
> Solving this is a bit involved, as it requires a Drop implementation on SpiTupleTable and to create a lifetime dependency between SpiHeapTupleData/SpiHeapTupleDataEntry and their parent SpiTupleTable. Should we address this now or should I open a new issue for it?
After taking a stab at it, it is even more involved than I thought. SpiTupleTable is an Iterator and holds status for the current row; an Iterator cannot borrow from itself, so the Iterator implementation and all mutable state would need to go into some SpiTupleTableIterator that borrows from SpiTupleTable, but this:
- is a breaking API change
- conflicts with all data access methods on SpiTupleTable, such as one(), get_datum(), and get_heap_tuple(), which rely on the SpiTupleTable holding iteration status.
I'm thinking this would be worthwhile, but definitely farther reaching and in a separate issue / PR.
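For the record, the underlying language limitation in miniature (unrelated to Spi; Rows and RowsIter are made-up names):

```rust
struct Rows {
    data: Vec<String>,
    pos: usize,
}

impl Iterator for Rows {
    // Self::Item cannot borrow from `self`: `next` takes `&mut self`, and
    // there is no way for Item to name that borrow's lifetime. Items tied
    // to table-owned memory would therefore force the iteration state out
    // into a separate `RowsIter<'a>` that borrows from `Rows`.
    type Item = String; // must be owned; a `&str` into `self.data` is not expressible

    fn next(&mut self) -> Option<String> {
        let item = self.data.get(self.pos).cloned();
        self.pos += 1;
        item
    }
}
```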
> Solving this is a bit involved, as it requires a Drop implementation on SpiTupleTable and to create a lifetime dependency between SpiHeapTupleData/SpiHeapTupleDataEntry and their parent SpiTupleTable. Should we address this now or should I open a new issue for it?
>
> After taking a stab at it, it is even more involved than I thought. SpiTupleTable is an Iterator and holds status for the current row; an Iterator cannot borrow from itself, so the Iterator implementation and all mutable state would need to go into some SpiTupleTableIterator that borrows from SpiTupleTable, but this:
>
> - is a breaking API change
> - conflicts with all data access methods on SpiTupleTable, such as one(), get_datum(), and get_heap_tuple(), which rely on the SpiTupleTable holding iteration status.
>
> I'm thinking this would be worthwhile, but definitely farther reaching and in a separate issue / PR.
If you want to make a new issue about this we'll add it to the list. @yrashk has graced us with a handful of other SPI improvement PRs, and it sounds like attacking this would want to land after those.
In the meantime, I'm merging this PR now. Thanks again!