c3p0 icon indicating copy to clipboard operation
c3p0 copied to clipboard

Any error in Postgres transaction causes C3P0 to close connections

Open radaczynski opened this issue 10 years ago • 9 comments

Consider this postgres session

postgres=# create table temp (id int primary key);
CREATE TABLE
postgres=# begin
postgres-# ;
BEGIN
postgres=# insert into temp values(1);
INSERT 0 1
postgres=# insert into temp values(1);
ERROR:  duplicate key value violates unique constraint "temp_pkey"
DETAIL:  Key (id)=(1) already exists.
postgres=# select * from temp;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
postgres=# select * from temp;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
postgres=# 

This means that basically after any error a session is put in an aborted state and any query that is ran afterwards (other than ROLLBACK either complete or to a savepoint). The issue is that C3P0 in reponse to the error (e.g. contraint violation) goes into NewPooledConnection.handleThrowable, which goes on to test the state of the connection by... issuing a test query. This works on Oracle, but reports an error in postgres as demonstrated above, which causes the connection to be marked invalid. This is actually shown in the logs:

[17:06:36.170] DEBUG com.mchange.v2.c3p0.impl.DefaultConnectionTester  - Connection org.postgresql.jdbc4.Jdbc4Connection@5d4aa6eb failed Connection test with an Exception! [query=select 1] org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
17:06:36.170] DEBUG com.mchange.v2.c3p0.impl.NewPooledConnection  - com.mchange.v2.c3p0.impl.NewPooledConnection@5596211f inva
lidated by Exception.
org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "foo"
[17:06:36.171] DEBUG com.mchange.v2.resourcepool.BasicResourcePool  - Excluded resource com.mchange.v2.c3p0.impl.NewPooledConnection@5596211f
[17:06:36.172] DEBUG com.mchange.v2.resourcepool.BasicResourcePool  - Preparing to destroy resource: com.mchange.v2.c3p0.impl.NewPooledConnection@5596211f
[17:06:36.172] DEBUG com.mchange.v2.c3p0.impl.NewPooledConnection  - com.mchange.v2.c3p0.impl.NewPooledConnection@5596211f closed by a client.
java.lang.Exception: DEBUG -- CLOSE BY CLIENT STACK TRACE

Note: I believe that this is postgres specific, I tested it on Oracle, and Oracle allows further queries to be executed after such a constraint violation.

Options for resolving:

  1. have a specialized ConnectionTester implementation for Postgres, which would either not test the connection at all after constraint violation, or do a rollback itself (VERY ugly and I would not recommend it)
  2. rethink the idea of checking connection status in response to any exception, maybe some subset of exceptions (like the constraint violations) do not require checking of the connection status?

radaczynski avatar Feb 20 '15 10:02 radaczynski

This postgres behaviour is documented here: http://www.postgresql.org/docs/9.3/static/tutorial-transactions.html

radaczynski avatar Feb 20 '15 10:02 radaczynski

The log you show is fine: c3p0 has excludes the Connection, it doesn't close it. "Excluding" means marking the Connection to be discarded rather than reintegrated into the pool upon check-in.

The client should still have access until she ends the session. It looks like what is going on is that the code handling the Exception also responds to it by close()ing the Connection. Then the pool, noticing that the Connection is excluded, properly destroys it.

Is the client code trying to continue to use the Connection, and unable to? That would be an issue. But if the client is close()ing the Connection, c3p0's policy of discarding and replacing Connections that fail a test strikes me as appropriately conservative.

swaldman avatar Feb 20 '15 10:02 swaldman

Thanks for the prompt reply! No, the client code actually issues a ROLLBACK and returns the connection.

Is there a way to cause reintegration of such connections? This particular one is perfectly fine to be used after the ROLLBACK, but will never be (as you stated).

I am not criticizing the policy of testing the connection, but pointing out that it does not work for postgres in particular.

radaczynski avatar Feb 20 '15 10:02 radaczynski

So it does work, even for postgres. The issue is that c3p0's policy is arguably too conservative. c3p0's policy of excluding apparently broken Connections creates some performance overhead in the recycling of a Connection that could be reused. The pool behaves correctly regardless, in terms of meeting its contract to users. On some margin, there is some unnecessary overhead that could be eliminated.

But how costly is this overhead? It's unfortunate that transaction failures due to contending transactions should cause c3p0 to recycle the Connection. They are an expected sort of failure. But these are hopefully infrequent, and the practical cost of a Connection reacquisition is low, if there is any margin of safety in pool size, i.e. unless your pool is barely large enough to satisfy its load.

I think the only way to address this that would not create new costs larger than the one it aims to mitigate would be to defer entirely testing Connections which experience an error until checkin, so that any recovery would prevent the test from ever failing. That wouldn't require DBMS-specific logic (which I really hope to avoid). It might well be a reasonable thing to do.

But I'm not sure that the gains, even for postgres users (which includes me!), are enough to make a very high priority of this. Has this been a practical problem for you, in the sense of having an application whose transactions often collide such that Connection churn has been a noticeable issue?

swaldman avatar Feb 20 '15 11:02 swaldman

Yes, I have been imprecise writing "it does not work", you're right. It does work, only causes connections churn :-).

I have only noticed this, because it has become a significant problem in a production application. I have some application code that was trying to make an insert in 10-milliseconds intervals for 15 seconds. When there was a constraint violation, this caused 1500 connections to be created and almost immediately destroyed.

This in turn caused a significant load on the database server (postgres creates a process for each connection), plus the TCP connections are kept in the TIME_WAIT state for a while after they are closed. This also causes extra load on the server.

One could argue that such a rapid pace of attempts to make an insert is perhaps not the best design, but still the overhead of this behavior is not negligible. I agree that it is not a generic use case, and probably not a lot of users are experiencing issues with this.

I understand the desire to keep the code generic as possible and avoid any vendor-specific code and fully support this.

radaczynski avatar Feb 20 '15 11:02 radaczynski

OK... So at least I know it's been a real problem for someone!

I'll look into marking rather than testing Connections on Exception, then forcing a test-on-checkin for marked Connections. (But, alas, the fix won't be so fast... that's a serious enough change that I'd put it in an 0.9.6 prerelease rather than in a 0.9.5.1 point release.)

swaldman avatar Feb 20 '15 11:02 swaldman

@swaldman : is there any hope for a fix?

radaczynski avatar Mar 08 '21 09:03 radaczynski

same question here

xrom888 avatar May 19 '21 14:05 xrom888

@swaldman was that fix included in any version after 0.9.5.1 ?

xrom888 avatar May 19 '21 14:05 xrom888

(nine years later...)

Done!

https://github.com/swaldman/c3p0/commit/b9c4ee30f57d7d47c1e1839e41ef824d7fd23c88

https://github.com/swaldman/c3p0/commit/5586c3cd1b0a63127ccbfe5baec29e8e848bb041

swaldman avatar Feb 29 '24 03:02 swaldman