HikariCP
Connection refused after DB failover for 1 out of 2 connection pools
Hi, we use HikariCP and observed weird behavior in our service after a PostgreSQL DB failover was executed.
Some background information:
- The failover was executed by pointing the Elastic IP to the new primary database.
- The application has 2 connection pools to the same DB with different timeout values etc.
- We are using PostgreSQL 12
- HikariCP version 3.4.5
- PostgreSQL driver version 42.2.14
- Spring Boot 2.3.2 application
- Application is running on Kubernetes
Configuration of the datasources:
spring:
  datasource:
    default:
      driver-class-name: org.postgresql.Driver
      jdbc-url: jdbc:postgresql://localhost:5432/postgres
      username: ${DB_USER:postgres}
      password: ${DB_PASSWORD:postgres}
      pool-name: default
      connection-timeout: 5000
      maximum-pool-size: 10
      validation-timeout: 800
    critical-query:
      driver-class-name: ${spring.datasource.default.driver-class-name}
      jdbc-url: ${spring.datasource.default.jdbc-url}
      username: ${spring.datasource.default.username}
      password: ${spring.datasource.default.password}
      pool-name: critical-query
      maximum-pool-size: 10
      connection-timeout: 300
      validation-timeout: 250
      connection-init-sql: "set statement_timeout to '400ms'; set application_name to 'critical-ops';"
The problem occurred on the default datasource, but only for 1 out of 10 pods, and on that one pod only the default connection pool was affected. The critical-query connection pool recovered quickly after the failover. The problem was only fixed once the affected pod was manually terminated after ~30 minutes.
We got the following exceptions:
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLTransientConnectionException: default - Connection is not available, request timed out after 5000ms.
...
Caused by: org.postgresql.util.PSQLException: Connection to {URL}:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
...
Caused by: java.net.ConnectException: Connection refused (Connection refused)
As no DNS changes were involved, the JVM DNS cache couldn't be the problem. Additionally, I would expect both connection pools to be affected in such a scenario; the same goes for any infrastructure-related issue.
We suspect some kind of race condition or concurrency issue that leaves HikariCP in a bad state, because not all DB failovers are affected. Any help would be appreciated. In case you need more information, please let me know.
@mschwitalla What is the maximum number of connections supported by Postgres for the default pool? You're saying that you got a connection refused to localhost? If the issue occurred on the default pool, and the URL in the configuration is localhost, then can I assume that {URL}:5432 in the exception is referring to localhost:5432?
@brettwooldridge Sorry for the confusion. I pasted the jdbc-url from the default environment, because I didn't want to paste the production URL. Therefore it's not localhost, but a DB server used in production.
We are facing similar issues with HikariCP pools in Kubernetes pods connecting to Postgres databases in IBM Cloud. After a switchover of the DB, HikariCP can no longer connect to the database, with the error message:
[...]
Caused by: org.postgresql.util.PSQLException: Connection to <DB-HOST>:<DB-PORT> refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:285) ~[postgresql-42.2.14.jar!/:42.2.14]
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) ~[postgresql-42.2.14.jar!/:42.2.14]
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:217) ~[postgresql-42.2.14.jar!/:42.2.14]
[...]
@gutschet , the same issues. if you have resolved it, please give some idea. Thanks.
@qianliniu, we resolved the problem by setting a socket timeout of 90 seconds for the connections against Postgres. We found a lot of information here: https://github.com/pgjdbc/pgjdbc/issues/1508
Since that change, our pool is able to reconnect to the database after a switchover. But we still get tons of error logs when the database switchover happens and the pool loses its connections. Not a clean solution, but it works.
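For reference, with the pgjdbc driver a socket timeout can be passed as a URL parameter (`socketTimeout`, in seconds). A minimal sketch of how the 90-second value could be applied in the two-pool layout from the original post; the surrounding keys mirror the original configuration, and whether this is exactly how the commenter set it is an assumption:

```yaml
spring:
  datasource:
    default:
      # pgjdbc's socketTimeout (seconds) aborts reads on a dead socket,
      # so connections to the old primary fail fast instead of hanging
      jdbc-url: jdbc:postgresql://localhost:5432/postgres?socketTimeout=90
```

The same parameter can alternatively be supplied as a Hikari data-source property instead of being appended to the URL.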
@gutschet Thanks for your reply.
I have already added the socketTimeout setting, but with 30 seconds:
String url = "jdbc:postgresql://" + host + ":" + port + "/" + dbName + "?ssl=false&socketTimeout=30";
I don't think it will work, because in my case the DB switchover only took about 20 seconds, but the connection attempts kept retrying for several hours with the same error.
I also found this ticket, but I'm not sure if it is the root cause (JVM DNS caching behavior). The issue is really hard to reproduce: I cannot trigger the DB switchover directly, and it seems that not every switchover causes this issue. https://github.com/brettwooldridge/HikariCP/issues/1075
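If JVM DNS caching is the suspect, the positive-lookup cache TTL can be bounded explicitly via the `networkaddress.cache.ttl` security property, so that a failover that moves the hostname to a new address is picked up within a known window. A minimal sketch (the 30-second value is an arbitrary choice for illustration; it must be set before the first lookup of the hostname to have any effect):

```java
import java.security.Security;

public class DnsCacheConfig {
    public static void main(String[] args) {
        // Cache successful DNS lookups for at most 30 seconds instead of
        // relying on the JVM/security-manager default, which can be "forever"
        // in some configurations.
        Security.setProperty("networkaddress.cache.ttl", "30");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

This is a process-wide setting; it can also be configured in `$JAVA_HOME/conf/security/java.security` instead of in code.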
Hi, I see this issue on my project as well, and it does not happen every time. During a load test with JMeter we see this connection error. My system runs in AWS Fargate and the database in AWS RDS. I haven't found any suitable solution. My datasource configuration is below.
spring:
  jpa:
    hibernate:
      ddl-auto: none
    properties:
      hibernate:
        check_nullability: true
        default_schema: users
    show-sql: false
    open-in-view: false
  datasource:
    hikari:
      connection-timeout: 60000
      maximum-pool-size: 10
      minimum-idle: 5
      idle-timeout: 3000
      pool-name: YeniPool
      jdbc-url: jdbc:postgresql://172.18.0.2:5432/memleket?currentSchema=users
      password: admin
      username: admin
      driver-class-name: org.postgresql.Driver
      max-lifetime: 3000000
I am seeing this issue even after evictConnections (a thread checks for an IP change every 30 seconds). This log gets printed:
writer ip changed, old is 10.0.133.*** and new is 10.0.29.*** Active connection count is : 0 and idle connection count is 2
private void checkForIpChange(HikariDataSource dataSource) {
    try {
        String ip = getIp(dataSource.getJdbcUrl());
        if (Objects.isNull(dataSourceIpMap.get(dataSource))) {
            // first run: just remember the currently resolved IP
            dataSourceIpMap.put(dataSource, ip);
        } else if (!StringUtils.equals(ip, dataSourceIpMap.get(dataSource))) {
            log.info("writer ip changed, old is {} and new is {}", dataSourceIpMap.get(dataSource), ip);
            evictConnections(dataSource);
            dataSourceIpMap.put(dataSource, ip);
        }
    } catch (Exception e) {
        log.error("Caught exception in cname check", e);
    }
}

private void evictConnections(HikariDataSource dataSource) {
    HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
    if (pool != null) {
        log.info("Active connection count is : {} and idle connection count is {}",
                pool.getActiveConnections(), pool.getIdleConnections());
        // marks all pooled connections for eviction; in-use ones are evicted on return
        pool.softEvictConnections();
    }
}

private String getIp(String url) throws URISyntaxException, UnknownHostException {
    // strip the "jdbc:" prefix so URI parsing exposes the host part
    URI aURL = new URI(url);
    aURL = new URI(aURL.getSchemeSpecificPart());
    InetAddress address = InetAddress.getByName(aURL.getHost());
    return address.getHostAddress();
}
@brettwooldridge any active work going on on this or any workarounds ?