X-Road Proxy fails to restore DB Connection

Open autero1 opened this issue 2 years ago • 1 comments

Version: 7.0.1

Our external PostgreSQL database (AWS RDS Aurora) occasionally reboots or is restored, and as a result the X-Road Proxy enters a permanent failed state. The only way to recover is to restart the service.

Are there any configuration options we could tune to have the services automatically restore the DB connections?

image

autero1 avatar Jul 05 '22 07:07 autero1

Hey @autero1. We will look into this matter.

vellotis avatar Jul 05 '22 07:07 vellotis

@autero1 Do you know how long the DB is usually unavailable? Short DB outages should be fully recoverable. By default, transactions wait for 30 seconds (which is also visible in the provided logs) before being killed. My guess is that during these outages a high transaction count might overload the connection pool, and eventually the application locks up.

To verify this, you can monitor the HikariCP pool stats. To enable them, add

<logger name="com.zaxxer.hikari" level="TRACE" />
<logger name="com.zaxxer.hikari.HikariConfig" level="DEBUG" />

to /etc/xroad/conf.d/proxy-logback.xml. The logs will look like this:

Pool stats (total=20, active=0, idle=20, waiting=0)
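If it helps, here is a minimal Python sketch for scanning a log for those pool stats lines. It only assumes the `Pool stats (...)` fragment quoted above; anything else on the line is ignored. The `is_exhausted` heuristic and the function names are my own illustration, not part of X-Road or HikariCP.

```python
import re

# Matches the stats fragment quoted in the comment above.
# Any timestamp or logger prefix on the same line is ignored.
POOL_STATS = re.compile(
    r"Pool stats \(total=(\d+), active=(\d+), idle=(\d+), waiting=(\d+)\)"
)


def parse_pool_stats(line):
    """Return the pool counters as a dict, or None if the line has no stats."""
    match = POOL_STATS.search(line)
    if match is None:
        return None
    total, active, idle, waiting = (int(group) for group in match.groups())
    return {"total": total, "active": active, "idle": idle, "waiting": waiting}


def is_exhausted(stats):
    """Hypothetical heuristic: no idle connections and threads queued for one."""
    return stats["idle"] == 0 and stats["waiting"] > 0


if __name__ == "__main__":
    sample = "Pool stats (total=20, active=0, idle=20, waiting=0)"
    stats = parse_pool_stats(sample)
    print(stats, "exhausted:", is_exhausted(stats))
```

A growing `waiting` count with `idle=0` during a DB outage would support the pool-overload guess; a healthy pool looks like the sample line.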

Tuning the timeout configuration and increasing the pool size might help, but that would only solve this if transactions are completed faster than new ones are created.
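To make that condition concrete, here is a rough back-of-the-envelope sketch in Python using Little's law (average connections in use is roughly arrival rate times how long each transaction holds a connection). The pool size of 20 comes from the sample log line and the 30 seconds from the default wait mentioned above; the arrival rate and the normal hold time are made-up numbers, and the model is a steady-state approximation only.

```python
def connections_needed(arrival_rate_per_s, hold_time_s):
    """Little's law estimate of the average number of connections in use."""
    return arrival_rate_per_s * hold_time_s


def pool_keeps_up(pool_size, arrival_rate_per_s, hold_time_s):
    """True if the pool can, on average, serve the load without queueing."""
    return connections_needed(arrival_rate_per_s, hold_time_s) <= pool_size


if __name__ == "__main__":
    pool_size = 20      # from the sample "Pool stats" line
    arrival_rate = 2.0  # transactions per second (assumed for illustration)

    # Normal operation: each transaction holds a connection briefly (assumed 50 ms).
    print("normal:", pool_keeps_up(pool_size, arrival_rate, 0.05))

    # During an outage: each transaction holds a connection until the 30 s wait expires.
    print("outage:", pool_keeps_up(pool_size, arrival_rate, 30.0))
```

Under these assumed numbers the pool keeps up in normal operation but not during an outage, where about 60 connections would be needed against a pool of 20. A bigger pool only moves the threshold; a shorter timeout lowers the hold time.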

Note: I was not able to reproduce this using the Security Server Sidecar and a remote database. I tried killing it, restarting it, and stalling it. The problem might be related to Security Server (SS) load.

ricardas-buc avatar Sep 09 '22 13:09 ricardas-buc

This mainly took place during nightly backups. I also think this could be related to #1293. We haven't seen any issues since we tuned the message logging, so I think I'll just close this issue and reopen it if something new surfaces.

autero1 avatar Sep 15 '22 12:09 autero1