libpqxx icon indicating copy to clipboard operation
libpqxx copied to clipboard

Why is there no transaction begin?

Open jmafc opened this issue 2 years ago • 12 comments

I have just converted a small program from libpq to libpqxx, but although the new code in general is much easier to read, the transaction and connection handling have become unnecessarily convoluted. Maybe I'm missing something, but the question above seems rather obvious and I expected that the documentation or the FAQ would at least cover this seemingly glaring ommission.

Let me present my libpq code in pseudocode and then my awkward solution.

PQconnectdb();
if (some_arg) {
    PQexec("BEGIN");
    PQexec("SELECT one_value");
    PQexecParams("UPDATE rows based on one_value");
    PQexec("COMMIT");
}
PQexec("SELECT id, value ... multiple rows FOR UPDATE ...");
for (... PQntuples ...) {
    PQgetvalue(...);
    // process row values
    PQexec("BEGIN");
    PQexecParams("UPDATE same table SET 3 cols WHERE id=...");
    PQexec("COMMIT");
}

"Equivalent" code using libqxx.

pqxx::connection db;
if (some_arg) {
    pqxx::work tx {db}; // this could *not* be declared before the `if`
                        // because apparently the commit makes it unusable
    val = tx.query_value("SELECT");
    tx.exec_params("UPDATE rows based on val");
    tx.commit();
}
pqxx::nontransaction tx {db};
pqxx::connection updt_db; // kludge, a open second connection
// the FOR UPDATE was left in, but it is rather meaningless
for (tx.query("SELECT id, value ... multiple rows FOR UPDATE") {
    // process row values
    try {
        pqxx::work tx2 {updt_db};
        tx2.exec_params("UPDATE same table SET 3 cols WHERE id=...");
        tx2.commit();
    } catch {}
}

Is there are a better way to do this, specifically to do it within a single connection so that the rows are locked during the processing?

jmafc avatar Aug 02 '22 21:08 jmafc

That'll work fine with a single connection. The only complication I see is that you want a transaction boundary between the query at the head of the loop, and the loop body.

To do that, just store the result of the query in a variable, then end the nontransaction you used at the head of the loop:

pqxx::nontransaction tx {db};
auto rows = tx.query("SELECT id, value ... multiple rows FOR UPDATE");
tx.commit();

for (auto [...] : rows)
{
  // Loop body here.
}

jtv avatar Aug 03 '22 09:08 jtv

OK, that's a little bit better, but it brings up other questions. For example, I presume that storing the result in a variable means all the rows have to be transferred from the server to the application program before we enter the loop, which constrains memory resources in the client (and could become prohibitive if the query retrieved millions of rows). Also, I assume the tx.commit() is required and implies that the server receives a COMMIT, releases the locks from the FOR UPDATE, and thus the updates within the for loop are "unprotected". The question in the subject line still remains open ... On StackOveflow, on a similar question by somebody else, someone suggested that one could do one's own transaction handling using a pqxx::nontransaction and issuing pqxx::exec0("BEGIN") and pqxx::exec0("COMMIT") to delimit transaction boundaries. On one hand, that sounds like a more reasonable solution (except for the incongruent use of a "non-transaction" to effect proper transaction handling). On the other hand, it makes me wonder about the usefulness of the transaction abstractions in libpqxx. As a side comment, although I know I can read the code, it may be helpful if the docs explained some of what is going on under the covers, e.g., a work starts off by doing a BEGIN, a nontransaction does ... whatever.

jmafc avatar Aug 03 '22 15:08 jmafc

Yes, query() downloads all the rows and stores them in memory, but that's got nothing to do with storing the result in a variable. That's just the documented behaviour of the "query" functions. You can choose a different behaviour by using stream() instead of query(), but in that case, you'll be doing the loop while the query is still ongoing, which means that you do need a second connection.

Before we go further into the details though, perhaps you should state more clearly what it is that you want to do. You are complaining about losing the FOR UPDATE lock, but doesn't your libpq pseudocode do the same thing? Those locks are gone by the time you start your loop.

jtv avatar Aug 04 '22 07:08 jtv

Why would the locks be gone? There is no commit after the PQexec("SELECT ...") and the for. Since the code isn't using CURSORs and FETCHing perhaps the COMMIT at the bottom of the loop does release the locks after the first row is processed (I haven't checked this).

jmafc avatar Aug 04 '22 12:08 jmafc

AFAICT the "SELECT ... FOR UPDATE" runs outside of any transaction. Which means that that statement runs as a transaction of its own, which commits immediately when the statement completes.

jtv avatar Aug 04 '22 19:08 jtv

Ok, I will take a look at that in more detail when I get a chance. In the meantime, however, why don't you answer the question posed in the subject?

jmafc avatar Aug 04 '22 19:08 jmafc

Can you say precisely why you would expect a "begin()" to help with your problem?

jtv avatar Aug 04 '22 20:08 jtv

I never said I had a problem. I only said that the lack of BEGIN was inconvenient (and also confusing because the documentation presents six transaction classes--plus a work mentioned elsewhere--without a clear overview) and that I couldn't find a rationale for excluding BEGIN when the underlying library does have it. Although my first relational DBMS was one that only had COMMIT and ROLLBACK, I've found that the BEGIN (or SQL:1999 START TRANSACTION) statement is more sensible than having just the "ending" statements.

jmafc avatar Aug 05 '22 03:08 jmafc

The transaction design uses RAII which is why there is no BEGIN. The transaction is automatically rolled back in the destructor unless the application commits it. You should create the transaction instance at the point where you want the transaction to start.

KayEss avatar Aug 05 '22 12:08 KayEss

The issue IMHO is that the transaction as a "destructible entlity at scope" doesn't seem to play well with less than simple user transactions. Sometimes it's OK, e.g., when declaring the transaction inside the if, but when you have something more complex like around the for loop, you essentially have to resort to a *non*-transaction and handle the BEGIN/COMMIT/ABORT's separately.

jmafc avatar Aug 05 '22 13:08 jmafc

I think it's far more important that the usual case is simple. You can use a std::optional if you need to separate the scope of the transaction variable from the creation and destruction points. If you need more flexibility you can also heap allocate it. A transaction variable can also be reset by assigning over it. The language provides you with the tools you need for more complex use cases.

KayEss avatar Aug 05 '22 13:08 KayEss

Thanks, that's more sensible. However, it would help if some of your points were mentioned in discussing transactions in the docs, so other people don't have to go hunting for it.

jmafc avatar Aug 05 '22 13:08 jmafc

I tweaked the transaction docs a bit to make this clearer and mention RAII.

jtv avatar Sep 02 '22 22:09 jtv