postgresql-simple icon indicating copy to clipboard operation
postgresql-simple copied to clipboard

Atomic `withTransaction`?

Open charles-cooper opened this issue 8 years ago • 15 comments

Is there a recommended way to perform an atomic withTransaction block? The following deadlocks:

withMVar (connectionHandle conn) $ \_ -> do
  withTransaction conn -- deadlocks since `begin` needs to acquire the lock
  ...

The only other things I can think of are using an extra MVar to synchronize on the Connection, or somehow abusing unsafeIOToSTM .. both of which seem prone to error.

charles-cooper avatar Feb 02 '17 12:02 charles-cooper

Well, not really in vanilla postgresql-simple. Anything we could provide would amount to using an extra MVar, which I have mixed feelings about. Then you have the additional complication that you'd want individual statement/queries to take the transaction lock if they aren't inside a transaction. Which is probably doable, but is more complicated than you might think. My current opinion is that these concerns are best left to a higher-level wrapper to deal with individual connection-level concurrency for a variety of reasons. (e.g. one of my projects manages to get away without needing to atomically issue multiple statements on multiple round trips...)

snaplet-postgresql-simple does what you are asking for, which integrates with the connection pool it also provides.

lpsmith avatar Feb 04 '17 22:02 lpsmith

I just bumped into this issue while looking for something else and am a little confused. Isn't the atomicity of transaction blocks guaranteed by PG itself?

saurabhnanda avatar Jul 18 '17 04:07 saurabhnanda

@saurabhnanda the atomicity of transaction blocks is guaranteed. postgresql-simple only guarantees that if exceptions are thrown in a withTransaction block that the transaction will be rolled back.

However, multiple threads can be writing queries to one connection, and as far as PostgreSQL is concerned those are all the same execution context. For instance Thread 1:

withTransaction conn $ do
  execute_ conn [sql|INSERT INTO table SELECT 1;|]
  threadDelay 1000000

Thread 2:

rollback conn -- Rolls back SELECT 1 despite being in a different thread of execution.

Thread 3:

threadDelay 500000
query_ conn [sql|SELECT * FROM table|]

What the programmer meant:

BEGIN;
INSERT INTO table SELECT 1;
-- delay 1 second
COMMIT;
ROLLBACK;
-- delay 500ms
SELECT * FROM table;

What PostgreSQL sees:

BEGIN;
INSERT INTO table SELECT 1;
ROLLBACK;
-- delay 500ms
SELECT * FROM table;
-- delay another 500ms
COMMIT;

charles-cooper avatar Aug 31 '17 00:08 charles-cooper

@charles-cooper so we have to make a new connection for every thread?

robinvd avatar Nov 22 '17 20:11 robinvd

I ended up wrapping the Connection in an MVar. Another option is to use withResource from https://hackage.haskell.org/package/resource-pool-0.2.3.2/docs/Data-Pool.html.

charles-cooper avatar Nov 22 '17 20:11 charles-cooper

To add to Charles's answer to Saurabh's question, postgresql ensures concurrency safety (for varying values of "safe") per transaction. postgresql-simple cannot affect that. However, postgresql-simple itself isn't entirely thread safe when using withTransaction, basically, intervening concurrent queries can be inserted into the transaction when they shouldn't in almost all (all?) cases. So postgresql will see a transaction as something other than what was probably intended.

To add to Charles's answer to Robin's question, postgresql-simple is thread safe in a looser sense; if you issue queries from multiple threads and the connection is busy, each query will get into line and wait until all the queries ahead of it have finished. This doesn't extend to withTransaction, though, as basically the withTransaction wants to issue multiple queries without going back to the end of the line after each query completes, but postgresql-simple doesn't support this at the present time. This probably could be solved with a more elaborate type of locking mechanism, which I might be amenable to, but there are a couple of slightly tricky aspects to this approach.

lpsmith avatar Nov 23 '17 00:11 lpsmith

@lpsmith I more or less agree with you that supporting a higher level of thread safety on withTransaction is complicated. Not least of which, after implementing a locking mechanism, the user experience might be equally jarring to find that running withTransaction blocks all their other operations on the connection.

Therefore it seems like good design to delegate this to higher-level libraries.

That being said, I do think the documentation should clarify the exact behavior of withTransaction as it currently stands, because the documentation underspecifies withTransaction's thread safety. Because of postgresql-simple's high importance and impact to the community - and this is extra credit - I think it would be better still if postgresql-simple additionally published 'official guidance' regarding how to handle thread safety (one connection per thread and offload transactionality to PostgreSQL, use a resource pool, or something else?).

In any case, if it seems appropriate I can submit a PR for the documentation piece and that should resolve this issue.

charles-cooper avatar Dec 30 '17 00:12 charles-cooper

Yeah, a documentation improvement should be fine.

I actually had a thought that occurred to me shortly after writing my previous response, which I should write down now before I forget it. I think I now know how to implement a suitable locking mechanism in a way I'm satisfied with.

Basically, we change the type of withTransaction from :: Connection -> IO a -> IO a to :: Connection -> (Connection -> IO a) -> IO a.

The way you'd use it is like

withTransaction c $ \c' -> do
    query c' ...
    execute c' ...
    ...

Conceptually, withTransaction would take the MVar lock, and then (functionally!) replace the MVar in c, with a fresh MVar, creating c'. Attempting to use c while the transaction is in progress would block; attempting to use c' would have the same semantics as postgresql-simple currently has. Personally, I'm not opposed to name shadowing, and pretty strongly disagree with people who are strongly opposed to name shadowing; I think it could be pretty idiomatic to write withTransaction c $ \c -> .... Upon exit, withTransaction would mark c' as "transaction finished", which would conceptually be very similar to a "closed" connection, and then release the original MVar lock so that c becomes usable again.

The nice thing about this is you could even use c' from multiple threads, in the odd case that might actually make sense. And by tweaking the conceptual implementation a bit more, you could also solve some of the issues brought up in #179, namely, the possibility having withTransaction (and possibly several variants) deal with "nested" transactions more gracefully, either by throwing an exception, ignoring the nested transaction, or using savepoints to emulate nested transactions. It might be useful to do some isolation level checking in some of these variants, as well.

lpsmith avatar Dec 30 '17 03:12 lpsmith

Hmm. I quite like name shadowing but in the case that somebody chooses not to name shadow, doesn't that open up a not-very-subtle possibility of deadlocking? E.g.

withTransaction c $ \c' -> do
  query c' ...
  execute c ... -- whoops! darn it i hate when i leave my 0x27s at home ...

So I wonder if there is a neat way of using a phantom type / changing the execution context of withTransaction to reflect that, e.g.

withTransaction :: Connection Unlocked -> (Connection LockedDownForBusiness -> IO a) -> IO a

and then requiring two versions of every query/execute function -

-- (I'm sure the same effect could be achieved with typeclasses) 
queryUnlocked :: Connection Unlocked -> q -> IO stuff
queryLocked :: Connection LockedDownForBusiness -> q -> IO stuff

Although now that I think through my suggestion I'm not really sure what it buys over the 'dumb' way either calling withTransaction (\conn -> query conn...) or withConnectionAtomically (\conn -> query conn ...) where withConnectionAtomically has the same concurrency behavior as withTransaction it just doesn't create a transaction PG-side. If a user really wanted to use the connection from multiple threads, variants like queryButItsNotThreadSafe could be provided which use tryReadMVar/tryTakeMVar under the hood.

charles-cooper avatar Dec 30 '17 03:12 charles-cooper

Yep, you are absolutely correct that this sort of mistake would pretty easily lead to deadlock, and I suspect that (without the use of name shadowing) this would be a fairly common sort of mistake.

(Personally, I try to use name shadowing intentionally, and somewhat idiomatically, when it makes sense. I certainly am opposed to the undisciplined use of name shadowing. Using it to document/ensure that a given variable should not be referred to in a given block is an example of when I use name shadowing, which this is a good example of when I'd shadow. But I digress.)

On the other hand, is this (somewhat advanced) issue really worth polluting a (much more basic) interface, and imposing intellectual overhead upon more novice users? I mean, everything else being equal, absolute safety (in the Rust sense of safety) is certainly preferable to provisional safety (i.e. "safety" in languages other than Rust, and to a lesser extent languages other than Haskell, ML, Scheme, Lisp, etc.) However, achieving absolute safety can be difficult and "expensive", whereas provisional safety can at times be a reasonable fallback.

(As another digression, this is very similar to Haskell's let x = x + 1 let-scoping issue; I made this sort of mistake fairly often during my first 5-10 years of Haskell programming, and it was often hard to spot, whereas these days I rarely make this sort of mistake, and if I do, it's relatively easy to spot.)

I guess in the end, I can agree that the current semantics of withTransaction needs to be fixed, whereas I don't know if this newer problem my proposed fix would introduce is really worth fretting over at the present time.

lpsmith avatar Dec 30 '17 04:12 lpsmith

Generally speaking, sharing the same Connection across multiple threads should not the recommended approach. The recommended approach should be to use a connection-pool with each thread getting a new connection.

For more advanced use-cases, connection-sharing, across threads can be added in a cookbook, of sorts.

The semantics of withTransaction, and withConnection should be clarified in the docs, along with an example of using connection pooling.

saurabhnanda avatar Jan 02 '18 04:01 saurabhnanda

Generally speaking, sharing the same Connection across multiple threads should not the recommended approach. The recommended approach should be to use a connection-pool with each thread getting a new connection.

That statement isn't overly self-consistent. In that case, you are sharing connections across threads, you are just using the connection pool to mediate the concurrency issues. And in the common sort of use case of a website back end, it's often neither necessary nor desirable to create a new connection for each thread (which is typically created to serve one request), or to reserve a single connection for the duration of the entire request.

(e.g. see withPG and withTransaction in snaplet-postgresql-simple; otherwise every database action in a single thread, can potentially go out on a different connection, and single connections can bounce around between threads.)

Now, while I agree that using a single connection from multiple threads without any other sort of concurrency mediation is a bit unusual, but it is occasionally useful, especially if you aren't dealing with something that isn't (directly) an http backend. And the postgresql-simple philosophy is to generally avoid imposing opinions, but rather allow the widest possible range of use cases.

lpsmith avatar Jan 02 '18 22:01 lpsmith

That statement isn't overly self-consistent

Allow me to rephrase. Sharing a connection concurrently across threads should not be recommended. If sharing connections across threads in a non-concurrent manner is required (most common use-case), then connection pooling is a tried and tested solution, and it should be recommended.

... but it is occasionally useful... ... philosophy is to generally avoid imposing opinions...

"make simple things simple, complex things possible"

saurabhnanda avatar Jan 03 '18 01:01 saurabhnanda

I ran into a issue probably related to this. I'm using postgres-simple together with beam-postgres in a http server. When serving concurrent requests I was some times getting errors that looked like this (with different temp numbers):

SqlError {sqlState = "34000", sqlExecStatus = FatalError, sqlErrorMsg = "cursor \"temp20\" does not exist", sqlErrorDetail = "", sqlErrorHint = ""}

As a novice in this space it was not obvious to me that I couldn't just create a connection when starting the http server and share it between requests. Creating a new connection for each request solved the issue.

I think a warning should be added that it is not safe to use a connection concurrently.

reite avatar Feb 25 '18 05:02 reite

But it is safe to use a connection concurrently; there just isn't any provision at the present time in cases when you want to issue multiple queries atomically (as is involved with things such as e.g. fold and withTransaction) without going back to the end of the line.

In your case, you probably want to investigate connection pooling, either via pgbouncer (which would change the deployment of your code, but not the code itself) or tweak your code to add pooling internally to your application.

lpsmith avatar Feb 25 '18 23:02 lpsmith