Add support for Spi cursors
This extends the SpiClient interface to expose methods to open a cursor based on a query, returning a SpiCursor that can be used to fetch results incrementally. Opened cursors can be closed early, or left open and turned into their (portal) name, to be later retrieved in a different Spi session in the same transaction.
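For illustration, usage looks roughly like this (a sketch based on the method names in this PR; exact signatures, argument handling, and error types are approximate):

```rust
Spi::connect(|mut client| {
    // Open a cursor over a query; rows are fetched lazily.
    let mut cursor = client.open_cursor("SELECT generate_series(1, 1000)", None);

    // Fetch results incrementally, e.g. 100 rows at a time.
    let batch = cursor.fetch(100);

    // Either close the cursor early here, or keep it open and turn it
    // into its portal name, to be picked up by a later Spi session
    // within the same transaction.
    let name = cursor.into_name();
    Ok(Some(name))
});
```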
I've tried to keep the API consistent with the existing Spi API, although I have a couple of doubts, specifically:
- I find signed row counts not very intuitive, although I understand they are consistent with the underlying C API (e.g. see the limit parameter of SpiClient::select); that might be addressed in the future Spi API overhaul mentioned on Discord, I believe?
- There is no way to free the memory associated with returned SpiTupleTable data; this is more of an issue with cursors, as there are use cases where several fetch operations are performed in the same Spi session.
I've integrated your changes, with some tweaks, as I found a few other things that could be cleaned up. Let me know what you think. Thanks for the feedback, @Hoverbear.
This has bitrotted a little bit. @EdMcBane, are you interested in bringing it up to date? We can get it merged soon...
Sure thing, @eeeebbbbrrrr, I'll give it a go in the next few days.
Should be good to go.
I haven't checked out this branch to play with it, but can a cursor escape an Spi::connect() block? Like, does this compile:
```rust
let cursor = Spi::connect(|mut client| {
    let cursor = client.open_cursor("SELECT 1", None);
    return Ok(Some(cursor));
});
```
If so, that seems dangerous. The underlying cursor pointer will be freed at the end of Spi::connect(), so further use of the returned cursor would be a use-after-free.
Also, I wonder if this PR needs to be rebased again? We've had some other API changes around SPI and I dunno if this has them or not.
I would like to get this merged tho. @EdMcBane, you've been working at this for awhile now and I feel bad we haven't given it much attention. :(
> I haven't checked out this branch to play with it, but can a cursor escape an Spi::connect() block? Like, does this compile:
>
> ```rust
> let cursor = Spi::connect(|mut client| {
>     let cursor = client.open_cursor("SELECT 1", None);
>     return Ok(Some(cursor));
> });
> ```
Good catch: it used to compile, but now it does not; I've made sure to phantom-borrow the client to make it so.
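For reference, the shape of that fix (a sketch with stand-in types, not the exact PR code): a zero-sized PhantomData field ties the cursor to the client's lifetime.

```rust
use std::marker::PhantomData;

// Stand-ins for the real types, just to illustrate the borrow.
pub struct SpiClient;
struct Portal;

pub struct SpiCursor<'client> {
    ptr: *mut Portal,
    // Phantom-borrow of the SpiClient: the borrow checker now rejects
    // any SpiCursor that tries to outlive the Spi::connect() closure.
    _marker: PhantomData<&'client SpiClient>,
}

impl SpiClient {
    pub fn open_cursor(&self, _query: &str) -> SpiCursor<'_> {
        SpiCursor {
            ptr: std::ptr::null_mut(),
            _marker: PhantomData,
        }
    }
}
```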
> Also, I wonder if this PR needs to be rebased again? We've had some other API changes around SPI and I dunno if this has them or not.
Just did it; the above fix needed to be adapted, but the rest rebased cleanly.
> I would like to get this merged tho. @EdMcBane, you've been working at this for awhile now and I feel bad we haven't given it much attention. :(
Nothing to worry about :-) Cheers
yeah, nice. Only question I have now is... does SpiCursor need to impl Drop instead of having an explicit close() method? I'm leaning towards yes, but I imagine you put some thought into having close(), so I want to ask...
SpiCursor can implement Drop instead of close(), as long as we use a sentinel value (null_ptr, or Option::None) to skip calling SPI_cursor_close when dropping it within SpiCursor::into_name.
I kinda think it would be best.
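Roughly like this (a sketch with stand-in types; SPI_cursor_close stubbed out):

```rust
use std::ptr::NonNull;

struct Portal; // stand-in for pg_sys::PortalData

pub struct SpiCursor {
    // Set to None once ownership is handed back to Postgres
    // (e.g. by into_name), so Drop skips SPI_cursor_close.
    ptr: Option<NonNull<Portal>>,
}

impl Drop for SpiCursor {
    fn drop(&mut self) {
        if let Some(portal) = self.ptr.take() {
            close_portal(portal); // SPI_cursor_close in the real code
        }
    }
}

fn close_portal(_portal: NonNull<Portal>) {} // stub
```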
> SpiCursor can implement Drop instead of close(), as long as we use a sentinel value (null_ptr, or Option::None) to skip calling SPI_cursor_close when dropping it within SpiCursor::into_name. I kinda think it would be best.
Makes sense. I missed the into_name() function. Is that even necessary as part of the API? It looks a lot like leaking the cursor (waiting for Postgres to clean it up later).
Is it meant to be used for "WITH HOLD" cursors that can survive transactions?
into_name is meant to allow leaving a cursor open within the same transaction across multiple Spi sessions.
I was not aware of "WITH HOLD" cursors, interesting.
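That is, something like this (a sketch; some_table is a placeholder and the exact signatures are approximate):

```rust
// First Spi session: open the cursor and keep only its portal name.
let name = Spi::connect(|mut client| {
    let cursor = client.open_cursor("SELECT * FROM some_table", None);
    Ok(Some(cursor.into_name()))
});

// A later Spi session in the same transaction: reclaim it by name.
Spi::connect(|mut client| {
    let mut cursor = client.find_cursor(&name);
    let batch = cursor.fetch(100);
    // ... use the batch ...
    Ok(Some(()))
});
```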
> into_name is meant to allow leaving a cursor open within the same transaction across multiple Spi sessions.
Isn't this just std::mem::forget()? I think if you make SpiCursor #[repr(transparent)] then that wouldn't ever leak Rust-allocated memory, and Postgres would clean up at the end of the transaction, and the find_cursor() function will still work.
> I was not aware of "WITH HOLD" cursors, interesting.
Yeah, that just popped into my head right now. Incredibly useful feature -- I've used it a lot over the years. I don't think we need to tackle it with this PR tho. File it away for the future.
> Isn't this just std::mem::forget()? I think if you make SpiCursor #[repr(transparent)] then that wouldn't ever leak Rust-allocated memory, and Postgres would clean up at the end of the transaction, and the find_cursor() function will still work.
I'm wondering whether #[repr(transparent)] is actually necessary, as into_name takes self by value, thus moving it on the stack.
I'll replace close() with drop() and get back to you.
> I'm wondering whether #[repr(transparent)] is actually necessary, as into_name takes self by value, thus moving it on the stack.
True. Maybe it'll just be good to declare our intentions. Also, I guess the pointer can be NonNull<pg_sys::PortalData> now?
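Put together, the cursor might end up looking something like this (a sketch of the direction discussed above, with stand-in types and stubs, not the actual PR code):

```rust
use std::mem;
use std::ptr::NonNull;

struct Portal; // stand-in for pg_sys::PortalData

#[repr(transparent)]
pub struct SpiCursor {
    ptr: NonNull<Portal>,
}

impl SpiCursor {
    pub fn into_name(self) -> String {
        let name = portal_name(self.ptr);
        // Leave the portal open: skipping our Drop means SPI_cursor_close
        // is never called, and Postgres cleans the portal up at the end of
        // the transaction if nobody reclaims it via find_cursor().
        mem::forget(self);
        name
    }
}

impl Drop for SpiCursor {
    fn drop(&mut self) {
        close_portal(self.ptr); // SPI_cursor_close in the real code
    }
}

fn portal_name(_portal: NonNull<Portal>) -> String { String::new() } // stub
fn close_portal(_portal: NonNull<Portal>) {} // stub
```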
This has turned out really nice. Great work and thanks for sticking with it and thanks for your patience.
Would you mind adding some // SAFETY comments to the few unsafe blocks?
Then I’ll merge.
Glad to hear it, will add // SAFETY comments later today.
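For instance, in this style (an illustrative fragment, not the actual code):

```rust
// SAFETY: `self.ptr` points to a portal returned by `SPI_cursor_open`
// that has not been closed yet, and we are inside an active Spi session,
// so it is valid to pass it to `SPI_cursor_fetch`.
unsafe {
    pg_sys::SPI_cursor_fetch(self.ptr.as_ptr(), true, count);
}
```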
I've noticed that SpiTupleTable doesn't ever free memory early via SPI_freetuptable, relying instead on the memory context being cleaned up at Spi session completion. This makes cursor use a little awkward, as explained in my code:
```rust
/// # Important notes about memory usage
/// Result sets ([`SpiTupleTable`]s) returned by [`SpiCursor::fetch()`] will not be freed until
/// the current Spi session is complete;
/// this is a Pgx limitation that might get lifted in the future.
///
/// In the meantime, if you're using cursors to limit memory usage, make sure to use
/// multiple separate Spi sessions, retrieving the cursor by name.
```
Solving this is a bit involved, as it requires a Drop implementation on SpiTupleTable and to create a lifetime dependency between SpiHeapTupleData/SpiHeapTupleDataEntry and their parent SpiTupleTable.
Should we address this now or should I open a new issue for it?
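To make that concrete, the suggested workaround looks something like this (a sketch; big_table, the batch size, and the exact signatures are placeholders and approximations):

```rust
// Open the cursor in a short-lived Spi session, keeping only its name.
let mut name = Spi::connect(|mut client| {
    let cursor = client.open_cursor("SELECT * FROM big_table", None);
    Ok(Some(cursor.into_name()))
});

loop {
    // Fetch each batch in its own Spi session, so the SpiTupleTable
    // memory from previous batches is freed when that session completes.
    let (next_name, done) = Spi::connect(|mut client| {
        let mut cursor = client.find_cursor(&name);
        let batch = cursor.fetch(1000);
        let done = batch.len() == 0;
        // ... process the batch ...
        Ok(Some((cursor.into_name(), done)))
    });
    name = next_name;
    if done {
        break;
    }
}
```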
> Solving this is a bit involved, as it requires a Drop implementation on SpiTupleTable and to create a lifetime dependency between SpiHeapTupleData/SpiHeapTupleDataEntry and their parent SpiTupleTable. Should we address this now or should I open a new issue for it?
After taking a stab at it, it is even more involved than I thought. SpiTupleTable is an Iterator and holds status for the current row; an Iterator cannot borrow from itself, so the Iterator implementation and all mutable state would need to go into some SpiTupleTableIterator that borrows from SpiTupleTable, but this:
- is a breaking API change
- conflicts with all data access methods on SpiTupleTable, such as one(), get_datum(), and get_heap_tuple(), which rely on the SpiTupleTable holding iteration status.
I'm thinking this would be worthwhile, but definitely farther reaching and in a separate issue / PR.
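For the record, the underlying language limitation in miniature (unrelated to Spi; Rows and RowsIter are made-up names):

```rust
struct Rows {
    data: Vec<String>,
    pos: usize,
}

impl Iterator for Rows {
    // Self::Item cannot borrow from `self`: `next` takes `&mut self`, and
    // there is no way for Item to name that borrow's lifetime. Items tied
    // to table-owned memory would therefore force the iteration state out
    // into a separate `RowsIter<'a>` that borrows from `Rows`.
    type Item = String; // must be owned; a `&str` into `self.data` is not expressible

    fn next(&mut self) -> Option<String> {
        let item = self.data.get(self.pos).cloned();
        self.pos += 1;
        item
    }
}
```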
> Solving this is a bit involved, as it requires a Drop implementation on SpiTupleTable and to create a lifetime dependency between SpiHeapTupleData/SpiHeapTupleDataEntry and their parent SpiTupleTable. Should we address this now or should I open a new issue for it?
>
> After taking a stab at it, it is even more involved than I thought. SpiTupleTable is an Iterator and holds status for the current row; an Iterator cannot borrow from itself, so the Iterator implementation and all mutable state would need to go into some SpiTupleTableIterator that borrows from SpiTupleTable, but this:
>
> - is a breaking API change
> - conflicts with all data access methods on SpiTupleTable, such as one(), get_datum(), and get_heap_tuple(), which rely on the SpiTupleTable holding iteration status.
>
> I'm thinking this would be worthwhile, but definitely farther reaching and in a separate issue / PR.
If you want to make a new issue about this we'll add it to the list. @yrashk has graced us with a handful of other SPI improvement PRs, and it sounds like attacking this would want to land after those.
In the meantime, I'm merging this PR now. Thanks again!