postgresql-simple
postgresql-simple copied to clipboard
Atomic `withTransaction`?
Is there a recommended way to perform an atomic withTransaction
block? The following deadlocks:
withMVar (connectionHandle conn) $ \_ -> do
withTransaction conn -- deadlocks since `begin` needs to acquire the lock
...
The only other things I can think of are using an extra MVar
to synchronize on the Connection
, or somehow abusing unsafeIOToSTM
.. both of which seem prone to error.
Well, not really in vanilla postgresql-simple. Anything we could provide would amount to using an extra MVar
, which I have mixed feelings about. Then you have the additional complication that you'd want individual statement/queries to take the transaction lock if they aren't inside a transaction. Which is probably doable, but is more complicated than you might think. My current opinion is that these concerns are best left to a higher-level wrapper to deal with individual connection-level concurrency for a variety of reasons. (e.g. one of my projects manages to get away without needing to atomically issue multiple statements on multiple round trips...)
snaplet-postgresql-simple
does what you are asking for, which integrates with the connection pool it also provides.
I just bumped into this issue while looking for something else and am a little confused. Isn't the atomicity of transaction blocks guaranteed by PG itself?
@saurabhnanda the atomicity of transaction blocks is guaranteed. postgresql-simple
only guarantees that if exceptions are thrown in a withTransaction
block that the transaction will be rolled back.
However, multiple threads can be writing queries to one connection, and as far as PostgreSQL is concerned those are all the same execution context. For instance Thread 1:
withTransaction conn $ do
execute_ conn [sql|INSERT INTO table SELECT 1;|]
threadDelay 1000000
Thread 2:
rollback conn -- Rolls back SELECT 1 despite being in a different thread of execution.
Thread 3:
threadDelay 500000
query_ conn [sql|SELECT * FROM table|]
What the programmer meant:
BEGIN;
INSERT INTO table SELECT 1;
-- delay 1 second
COMMIT;
ROLLBACK;
-- delay 500ms
SELECT * FROM table;
What PostgreSQL sees:
BEGIN;
INSERT INTO table SELECT 1;
ROLLBACK;
-- delay 500ms
SELECT * FROM table;
-- delay another 500ms
COMMIT;
@charles-cooper so we have to make a new connection for every thread?
I ended up wrapping the Connection in an MVar. Another option is to use withResource from https://hackage.haskell.org/package/resource-pool-0.2.3.2/docs/Data-Pool.html.
To add to Charles's answer to Saurabh's question, postgresql ensures concurrency safety (for varying values of "safe") per transaction. postgresql-simple
cannot affect that. However, postgresql-simple
itself isn't entirely thread safe when using withTransaction
, basically, intervening concurrent queries can be inserted into the transaction when they shouldn't in almost all (all?) cases. So postgresql will see a transaction as something other than what was probably intended.
To add to Charles's answer to Robin's question, postgresql-simple is thread safe in a looser sense; if you issue queries from multiple threads and the connection is busy, each query will get into line and wait until all the queries ahead of it have finished. This doesn't extend to withTransaction
, though, as basically the withTransaction
wants to issue multiple queries without going back to the end of the line after each query completes, but postgresql-simple doesn't support this at the present time. This probably could be solved with a more elaborate type of locking mechanism, which I might be amenable to, but there are a couple of slightly tricky aspects to this approach.
@lpsmith I more or less agree with you that supporting a higher level of thread safety on withTransaction
is complicated. Not least of which, after implementing a locking mechanism, the user experience might be equally jarring to find that running withTransaction
blocks all their other operations on the connection.
Therefore it seems like good design to delegate this to higher-level libraries.
That being said, I do think the documentation should clarify the exact behavior of withTransaction
as it currently stands, because the documentation underspecifies withTransaction
's thread safety. Because of postgresql-simple
's high importance and impact to the community - and this is extra credit - I think it would be better still if postgresql-simple
additionally published 'official guidance' regarding how to handle thread safety (one connection per thread and offload transactionality to PostgreSQL, use a resource pool, or something else?).
In any case, if it seems appropriate I can submit a PR for the documentation piece and that should resolve this issue.
Yeah, a documentation improvement should be fine.
I actually had a thought that occurred to me shortly after writing my previous response, which I should write down now before I forget it. I think I now know how to implement a suitable locking mechanism in a way I'm satisfied with.
Basically, we change the type of withTransaction
from :: Connection -> IO a -> IO a
to :: Connection -> (Connection -> IO a) -> IO a
.
The way you'd use it is like
withTransaction c $ \c' -> do
query c' ...
execute c' ...
...
Conceptually, withTransaction
would take the MVar
lock, and then (functionally!) replace the MVar
in c
, with a fresh MVar
, creating c'
. Attempting to use c
while the transaction is in progress would block; attempting to use c'
would have the same semantics as postgresql-simple currently has. Personally, I'm not opposed to name shadowing, and pretty strongly disagree with people who are strongly opposed to name shadowing; I think it could be pretty idiomatic to write withTransaction c $ \c -> ...
. Upon exit, withTransaction
would mark c'
as "transaction finished", which would conceptually be very similar to a "closed" connection, and then release the original MVar
lock so that c
becomes usable again.
The nice thing about this is you could even use c'
from multiple threads, in the odd case that might actually make sense. And by tweaking the conceptual implementation a bit more, you could also solve some of the issues brought up in #179, namely, the possibility having withTransaction
(and possibly several variants) deal with "nested" transactions more gracefully, either by throwing an exception, ignoring the nested transaction, or using savepoints to emulate nested transactions. It might be useful to do some isolation level checking in some of these variants, as well.
Hmm. I quite like name shadowing but in the case that somebody chooses not to name shadow, doesn't that open up a not-very-subtle possibility of deadlocking? E.g.
withTransaction c $ \c' -> do
query c' ...
execute c ... -- whoops! darn it i hate when i leave my 0x27s at home ...
So I wonder if there is a neat way of using a phantom type / changing the execution context of withTransaction to reflect that, e.g.
withTransaction :: Connection Unlocked -> (Connection LockedDownForBusiness -> IO a) -> IO a
and then requiring two versions of every query/execute function -
-- (I'm sure the same effect could be achieved with typeclasses)
queryUnlocked :: Connection Unlocked -> q -> IO stuff
queryLocked :: Connection LockedDownForBusiness -> q -> IO stuff
Although now that I think through my suggestion I'm not really sure what it buys over the 'dumb' way either calling withTransaction (\conn -> query conn...)
or withConnectionAtomically (\conn -> query conn ...)
where withConnectionAtomically
has the same concurrency behavior as withTransaction
it just doesn't create a transaction PG-side. If a user really wanted to use the connection from multiple threads, variants like queryButItsNotThreadSafe
could be provided which use tryReadMVar/tryTakeMVar
under the hood.
Yep, you are absolutely correct that this sort of mistake would pretty easily lead to deadlock, and I suspect that (without the use of name shadowing) this would be a fairly common sort of mistake.
(Personally, I try to use name shadowing intentionally, and somewhat idiomatically, when it makes sense. I certainly am opposed to the undisciplined use of name shadowing. Using it to document/ensure that a given variable should not be referred to in a given block is an example of when I use name shadowing, which this is a good example of when I'd shadow. But I digress.)
On the other hand, is this (somewhat advanced) issue really worth polluting a (much more basic) interface, and imposing intellectual overhead upon more novice users? I mean, everything else being equal, absolute safety (in the Rust sense of safety) is certainly preferable to provisional safety (i.e. "safety" in languages other than Rust, and to a lesser extent languages other than Haskell, ML, Scheme, Lisp, etc.) However, achieving absolute safety can be difficult and "expensive", whereas provisional safety can at times be a reasonable fallback.
(As another digression, this is very similar to Haskell's let x = x + 1
let-scoping issue; I made this sort of mistake fairly often during my first 5-10 years of Haskell programming, and it was often hard to spot, whereas these days I rarely make this sort of mistake, and if I do, it's relatively easy to spot.)
I guess in the end, I can agree that the current semantics of withTransaction
needs to be fixed, whereas I don't know if this newer problem my proposed fix would introduce is really worth fretting over at the present time.
Generally speaking, sharing the same Connection
across multiple threads should not the recommended approach. The recommended approach should be to use a connection-pool with each thread getting a new connection.
For more advanced use-cases, connection-sharing, across threads can be added in a cookbook, of sorts.
The semantics of withTransaction
, and withConnection
should be clarified in the docs, along with an example of using connection pooling.
Generally speaking, sharing the same Connection across multiple threads should not the recommended approach. The recommended approach should be to use a connection-pool with each thread getting a new connection.
That statement isn't overly self-consistent. In that case, you are sharing connections across threads, you are just using the connection pool to mediate the concurrency issues. And in the common sort of use case of a website back end, it's often neither necessary nor desirable to create a new connection for each thread (which is typically created to serve one request), or to reserve a single connection for the duration of the entire request.
(e.g. see withPG
and withTransaction
in snaplet-postgresql-simple
; otherwise every database action in a single thread, can potentially go out on a different connection, and single connections can bounce around between threads.)
Now, while I agree that using a single connection from multiple threads without any other sort of concurrency mediation is a bit unusual, but it is occasionally useful, especially if you aren't dealing with something that isn't (directly) an http backend. And the postgresql-simple
philosophy is to generally avoid imposing opinions, but rather allow the widest possible range of use cases.
That statement isn't overly self-consistent
Allow me to rephrase. Sharing a connection concurrently across threads should not be recommended. If sharing connections across threads in a non-concurrent manner is required (most common use-case), then connection pooling is a tried and tested solution, and it should be recommended.
... but it is occasionally useful... ... philosophy is to generally avoid imposing opinions...
"make simple things simple, complex things possible"
I ran into a issue probably related to this. I'm using postgres-simple
together with beam-postgres
in a http server. When serving concurrent requests I was some times getting errors that looked like this (with different temp numbers):
SqlError {sqlState = "34000", sqlExecStatus = FatalError, sqlErrorMsg = "cursor \"temp20\" does not exist", sqlErrorDetail = "", sqlErrorHint = ""}
As a novice in this space it was not obvious to me that I couldn't just create a connection when starting the http server and share it between requests. Creating a new connection for each request solved the issue.
I think a warning should be added that it is not safe to use a connection concurrently.
But it is safe to use a connection concurrently; there just isn't any provision at the present time in cases when you want to issue multiple queries atomically (as is involved with things such as e.g. fold
and withTransaction
) without going back to the end of the line.
In your case, you probably want to investigate connection pooling, either via pgbouncer (which would change the deployment of your code, but not the code itself) or tweak your code to add pooling internally to your application.