pgapp
pgapp:with_transaction() does not work as expected
The README file contains this example:
pgapp:with_transaction(fun() ->
    pgapp:squery("update ..."),
    pgapp:squery("delete from ..."),
    pgapp:equery("select ? from ?", ["*", Table])
end).
The problem with this approach is that "BEGIN" and "COMMIT" actually happen in one connection, but all the queries inside happen in another.
Look at the following code from pgapp_worker.erl:
equery(PoolName, Sql, Params, Timeout) ->
    middle_man_transaction(PoolName,
                           fun (W) ->
                               gen_server:call(W, {equery, Sql, Params},
                                               Timeout)
                           end, Timeout).
...
with_transaction(PoolName, Fun, Timeout) ->
    middle_man_transaction(PoolName,
                           fun (W) ->
                               gen_server:call(W, {transaction, Fun},
                                               Timeout)
                           end, Timeout).
middle_man_transaction() dedicates a separate poolboy worker to each call, including the transaction itself. So when with_transaction() is called, it gets a poolboy worker and runs "BEGIN" in it (inside epgsql.erl). That worker then sits busy, waiting for the queries to be executed. But the queries run as separate calls to poolboy, which means they get different workers.
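To make the failure mode concrete, here is an illustrative sketch (not the real pgapp internals; demo/1 and the {squery, ...} call shape are my assumptions, extrapolated from the snippet above) of what happens when every call goes through poolboy independently:

%% Illustrative sketch only -- not actual pgapp code. Assumes the
%% workers answer gen_server calls shaped like the snippet above.
demo(Pool) ->
    %% Worker A is checked out and opens the transaction...
    poolboy:transaction(Pool, fun(WorkerA) ->
        gen_server:call(WorkerA, {squery, "BEGIN"}),
        %% ...but this inner call checks out a *different* worker B,
        %% whose connection is not inside the transaction at all:
        poolboy:transaction(Pool, fun(WorkerB) ->
            gen_server:call(WorkerB, {squery, "update ..."})
        end),
        gen_server:call(WorkerA, {squery, "COMMIT"})
    end).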
Ah, now I see. There is a trick:
squery(Sql) ->
    case get(?STATE_VAR) of
        undefined ->
            squery(epgsql_pool, Sql);
        Conn ->
            epgsql:squery(Conn, Sql)
    end.
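For context, the other half of the trick presumably looks something like the sketch below: the worker stores its own connection in the process dictionary before invoking the user fun, so that pool-less squery/equery calls inside the fun pick it up via get(?STATE_VAR). (run_transaction/2 is a hypothetical name; the actual pgapp internals may differ.)

%% Hypothetical sketch of how the worker could expose its connection
%% to the user fun; the real pgapp code may differ in detail.
run_transaction(Conn, Fun) ->
    put(?STATE_VAR, Conn),
    try
        Fun()               % pool-less squery/equery now hit Conn
    after
        erase(?STATE_VAR)   % never leak the connection reference
    end.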
But the trick only works when you do not specify PoolName explicitly. So this works:
pgapp:with_transaction(fun() ->
    pgapp:squery("update ..."),
    pgapp:squery("delete from ..."),
    pgapp:equery("select ? from ?", ["*", Table])
end).
But this doesn't:
pgapp:with_transaction(pool1, fun() ->
    pgapp:squery(pool1, "update ..."),
    pgapp:squery(pool1, "delete from ..."),
    pgapp:equery(pool1, "select ? from ?", ["*", Table])
end).
I think @edhollandAL wrote some of this code - have time to take a look?
Just made a fix for the issue. Please have a look.
Thinking out loud: rather than using the process dictionary, is there a way to, say, pass the Conn into the fun() or something?
@lukyanov it's intended behaviour; you shouldn't be able to specify Pool (and Timeout) in the squery/equery functions that are running inside with_transaction. So yes, in your second example:
pgapp:with_transaction(pool1, fun() ->
    pgapp:squery(pool1, "update ..."),
    pgapp:squery(pool1, "delete from ..."),
    pgapp:equery(pool1, "select ? from ?", ["*", Table])
end).
Functions inside fun() -> ... end will be executed outside of the transaction scope.
I agree that the README is not comprehensive enough, and I suggest fixing it instead of the current behaviour.
Ok, I see now. But then I would agree with @davidw that passing Conn would be a better approach here. One problem with the current approach is that it is not possible to have nested transactions.
doing_stuff1/0 below would not work as a transaction:
doing_stuff1() ->
    pgapp:with_transaction(pool1, fun() ->
        pgapp:squery("update ..."),
        doing_stuff2()
    end).

doing_stuff2() ->
    pgapp:with_transaction(pool1, fun() ->
        pgapp:squery("update ...")
    end).
You will need explicit communication about the connection you are using.
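For illustration, the Conn-passing variant @davidw suggests could look roughly like this (a hypothetical API sketch, not the current pgapp interface; the callback receives the connection and hands it to nested helpers explicitly):

%% Hypothetical sketch of an explicit-Conn API; not current pgapp code.
doing_stuff1() ->
    pgapp:with_transaction(pool1, fun(Conn) ->
        epgsql:squery(Conn, "update ..."),
        doing_stuff2(Conn)        % pass the connection down explicitly
    end).

doing_stuff2(Conn) ->
    epgsql:squery(Conn, "update ...").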
New variant of the changes: https://github.com/epgsql/pgapp/pull/23
@lukyanov: postgres does not have a true subtransactions feature, so the example code you show would not work as intended anyway.
I like the new code without the process dictionary - it feels cleaner in any case. Other thoughts?
I made another PR which is an alternative for #23: https://github.com/epgsql/pgapp/pull/24
@davidw While I agree with you that operating directly with the connection looks clearer and more comprehensible, the somewhat 'dirty' approach with the process dictionary has the following advantages:

- It does not break backwards compatibility of the library (otherwise everyone who uses pgapp would need to rewrite their code around with_transaction calls when updating).
- The usage of with_transaction is handier when you are not thinking in terms of connections. It hides unnecessary details. Imagine the following code:

register_user() ->
    pgapp:with_transaction(fun() ->
        users:create_user(),
        stats:update_stats()
    end).

users:create_user() and stats:update_stats() may be independent functions which, in a different context, are called separately. Here they need to be called together inside a transaction. If you are forced to work with the connection directly, you would need to pass it to those functions, which may break the abstraction you intended to have when creating them.
Thanks for the updated PR @lukyanov, I agree that the benefits you cite above are important and should be maintained. The register_user example makes a compelling case for supporting the pseudo-nested transactions. I'm less clear on the use case for supporting the cross-connection transactions.
If we allow this it is possible (likely) that a user will attempt to create some sort of distributed transaction mechanism on top of pgapp; without proper two-phase commit (which is only available as a postgres extension) any such implementation would be bound to be unreliable. I'm not sure that's something the project should be willing to support.
@edholland Thanks for the response!
Let me be more clear on the second point, which you mention as cross-connection transactions. My point here is not to support complex multi-connection transactions. It is rather to make pgapp aware of the same connection inside the with_transaction callback. When you call with_transaction on the pool pool1 and use pool1 inside the callback, I want pgapp to assume that your intention is to use the same connection.
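A minimal sketch of what that could look like (hypothetical; it assumes ?STATE_VAR were extended to also record the pool name, which the current code does not do, and squery_via_pool/2 stands in for the existing poolboy dispatch path):

%% Hypothetical extension -- not current pgapp behaviour.
squery(PoolName, Sql) ->
    case get(?STATE_VAR) of
        {PoolName, Conn} ->
            epgsql:squery(Conn, Sql);       % same pool: reuse the transaction's connection
        _ ->
            squery_via_pool(PoolName, Sql)  % different pool, or not in a transaction
    end.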
By the way, the register_user example also supports my point well if we consider an app that uses multiple connection pools. Let me extend the example:
register_user() ->
    pgapp:with_transaction(pool1, fun() ->
        users:create_user(),
        stats:update_stats()
    end).

% module users.erl
create_user() ->
    pgapp:with_transaction(pool1, fun() ->
        pgapp:squery("insert ..."),
        pgapp:squery("update ...")
    end).

% module stats.erl
update_stats() ->
    pgapp:squery(pool1, "update ...").
My app uses different databases, so I need a connection pool for each one. This means I must specify the pool I want to use every time I do a query or start a transaction. In the example above all the queries are for pool1. Here is the point: users:create_user() and stats:update_stats() are dedicated functions which may be called separately in a different context, so we cannot omit pool1 as an argument to pgapp:with_transaction and pgapp:squery. But here they are called within a single transaction. The equivalent query sequence would look like this:
register_user() ->
    pgapp:with_transaction(pool1, fun() ->
        pgapp:squery("insert ..."),
        pgapp:squery("update ..."),
        pgapp:squery(pool1, "update ...")
    end).
In the current pgapp code this approach won't work because the last query passes pool1 as its first argument. pgapp will use a separate poolboy worker, and the transaction will only contain the first two queries.
P.S.

> If we allow this it is possible (likely) a user will attempt to create some sort of distributed transaction mechanism on top of pgapp

We cannot stop the user from doing something like this anyway, even in the current pgapp version. Anyone is, and was, able to specify a pool name inside the callback.
In your examples you're using pool1 for all queries; if this is the behaviour required then you can just omit the pool name. Assuming instead that the inner pool should be pool2, as below, then it's possible that the update on pool2 succeeds while the SQL on pool1 is rolled back. We currently don't support this behaviour; we serialise all calls within a transaction to a single connection:
register_user() ->
    pgapp:with_transaction(pool1, fun() ->
        pgapp:squery("insert ..."),
        pgapp:squery("update ..."),
        pgapp:squery(pool2, "update ...")
    end).
@edholland For the same reason it is ok to allow specifying a timeout inside the callback. It must be ignored by pgapp, that's true, but it should not be prohibited, as the same function may work differently depending on whether it was called inside a transaction or standalone.
@edholland My example is correct. All queries are using pool1. That is the intention. And I cannot omit pool1, because update_stats() is a standalone function in a different module.
I'll have to think about this some, but I do think that, given the small number of users, if we need to change the API to improve the code, it's best to do so.
Couldn't this issue be simplified by not running the queries in a worker? Instead, use the pool only to check out a connection, use it, and return it. This would also help solve #20.
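Something along these lines, perhaps (a rough sketch of the checkout-only idea, assuming the pool workers are the epgsql connection pids themselves; with_transaction/3 here is illustrative, not a patch):

%% Rough sketch: lease a connection from the pool, run everything on
%% it, and return it afterwards. Assumes each poolboy worker *is* an
%% epgsql connection pid, so it can be used directly.
with_transaction(Pool, Fun, Timeout) ->
    Conn = poolboy:checkout(Pool, true, Timeout),
    try
        %% epgsql wraps Fun in BEGIN/COMMIT (ROLLBACK on error),
        %% all on this one connection.
        epgsql:with_transaction(Conn, fun(_C) -> Fun(Conn) end)
    after
        poolboy:checkin(Pool, Conn)
    end.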