akvo-lumen
akvo-lumen copied to clipboard
Postgres runs out connections
Context
When opening https://ci-fiji.akvolumen.org/ after logging in, the call to /api/env
was failing with a 500 error leaving the frontend blank.
Problem or idea
stack
11:14:12.660 [XNIO-1 task-14] ERROR akvo.lumen.auth.api-authorization - #error {
:cause FATAL: remaining connection slots are reserved for non-replication superuser connections
:via
[{:type com.zaxxer.hikari.pool.HikariPool$PoolInitializationException
:message Failed to initialize pool: FATAL: remaining connection slots are reserved for non-replication superuser connections
:at [com.zaxxer.hikari.pool.HikariPool throwPoolInitializationException HikariPool.java 576]}
{:type org.postgresql.util.PSQLException
:message FATAL: remaining connection slots are reserved for non-replication superuser connections
:at [org.postgresql.core.v3.QueryExecutorImpl receiveErrorResponse QueryExecutorImpl.java 2440]}]
:trace
[[org.postgresql.core.v3.QueryExecutorImpl receiveErrorResponse QueryExecutorImpl.java 2440]
[org.postgresql.core.v3.QueryExecutorImpl readStartupMessages QueryExecutorImpl.java 2559]
[org.postgresql.core.v3.QueryExecutorImpl <init> QueryExecutorImpl.java 133]
[org.postgresql.core.v3.ConnectionFactoryImpl openConnectionImpl ConnectionFactoryImpl.java 250]
[org.postgresql.core.ConnectionFactory openConnection ConnectionFactory.java 49]
[org.postgresql.jdbc.PgConnection <init> PgConnection.java 195]
[org.postgresql.Driver makeConnection Driver.java 454]
[org.postgresql.Driver connect Driver.java 256]
[com.zaxxer.hikari.util.DriverDataSource getConnection DriverDataSource.java 119]
[com.zaxxer.hikari.pool.PoolBase newConnection PoolBase.java 369]
[com.zaxxer.hikari.pool.PoolBase newPoolEntry PoolBase.java 198]
[com.zaxxer.hikari.pool.HikariPool createPoolEntry HikariPool.java 467]
[com.zaxxer.hikari.pool.HikariPool checkFailFast HikariPool.java 541]
[com.zaxxer.hikari.pool.HikariPool <init> HikariPool.java 115]
[com.zaxxer.hikari.HikariDataSource <init> HikariDataSource.java 81]
[akvo.lumen.component.tenant_manager$pool invokeStatic tenant_manager.clj 61]
[akvo.lumen.component.tenant_manager$pool invoke tenant_manager.clj 49]
[akvo.lumen.component.tenant_manager$load_tenant$fn__19185 invoke tenant_manager.clj 77]
[clojure.lang.Delay deref Delay.java 42]
[clojure.core$deref invokeStatic core.clj 2320]
[clojure.core$deref invoke core.clj 2306]
[akvo.lumen.component.tenant_manager.TenantManager connection tenant_manager.clj 89]
[akvo.lumen.endpoint.env$handler$fn__27115 invoke env.clj 25]
[akvo.lumen.component.tenant_manager$wrap_label_tenant$fn__19176 invoke tenant_manager.clj 42]
[iapetos.collector.ring$run_instrumented invokeStatic ring.clj 127]
[iapetos.collector.ring$run_instrumented invoke ring.clj 123]
[iapetos.collector.ring$wrap_instrumentation$fn__5520 invoke ring.clj 163]
[iapetos.collector.ring$wrap_metrics_expose$fn__5529 invoke ring.clj 184]
[akvo.lumen.lib.auth$wrap_auth_datasets$fn__26715$fn__26717 invoke auth.clj 193]
[akvo.lumen.auth.api_authorization$api_authorization invokeStatic api_authorization.clj 30]
[akvo.lumen.auth.api_authorization$api_authorization invoke api_authorization.clj 19]
[akvo.lumen.auth$authorize invokeStatic auth.clj 30]
[akvo.lumen.auth$authorize invoke auth.clj 28]
[akvo.lumen.util$as_middleware$fn__11350$fn__11351 invoke util.clj 99]
[akvo.lumen.auth.jwt_authentication$jwt_authentication invokeStatic jwt_authentication.clj 22]
[akvo.lumen.auth.jwt_authentication$jwt_authentication invoke jwt_authentication.clj 15]
[akvo.lumen.util$as_middleware$fn__11350$fn__11351 invoke util.clj 99]
[akvo.lumen.component.handler$fn__17214$fn__17216$fn__17217 invoke handler.clj 114]
[akvo.lumen.component.handler$fn__17232$fn__17233$fn__17234 invoke handler.clj 125]
[ring.middleware.json$wrap_json_response$fn__16849 invoke json.clj 87]
[ring.middleware.json$wrap_json_body$fn__16824 invoke json.clj 46]
[ring.middleware.keyword_params$wrap_keyword_params$fn__16351 invoke keyword_params.clj 53]
[ring.middleware.params$wrap_params$fn__16639 invoke params.clj 67]
[ring.middleware.absolute_redirects$wrap_absolute_redirects$fn__16754 invoke absolute_redirects.clj 47]
[ring.middleware.content_type$wrap_content_type$fn__16718 invoke content_type.clj 34]
[ring.middleware.default_charset$wrap_default_charset$fn__16734 invoke default_charset.clj 31]
[ring.middleware.not_modified$wrap_not_modified$fn__16707 invoke not_modified.clj 61]
[raven_clj.ring$wrap_sentry$fn__4618 invoke ring.clj 22]
[akvo.lumen.component.handler$fn__17192$fn__17194$fn__17195 invoke handler.clj 98]
[akvo.lumen.component.tenant_manager$wrap_label_tenant$fn__19176 invoke tenant_manager.clj 42]
[reitit.ring$ring_handler$fn__15884 invoke ring.cljc 286]
[clojure.lang.AFn applyToHelper AFn.java 154]
[clojure.lang.AFn applyTo AFn.java 144]
[clojure.lang.AFunction$1 doInvoke AFunction.java 31]
[clojure.lang.RestFn invoke RestFn.java 408]
[immutant.web.internal.undertow$create_http_handler$reify__19026 handleRequest undertow.clj 239]
[org.projectodd.wunderboss.web.undertow.async.websocket.UndertowWebsocket$2 handleRequest UndertowWebsocket.java 109]
[io.undertow.server.session.SessionAttachmentHandler handleRequest SessionAttachmentHandler.java 68]
[io.undertow.server.Connectors executeRootHandler Connectors.java 211]
[io.undertow.server.HttpServerExchange$1 run HttpServerExchange.java 809]
[java.util.concurrent.ThreadPoolExecutor runWorker nil -1]
[java.util.concurrent.ThreadPoolExecutor$Worker run nil -1]
[java.lang.Thread run nil -1]]}
11:14:12.662 [XNIO-1 task-14] ERROR akvo.lumen.component.handler - 500 App Error
current open connections
postgres=> select max_conn,used,res_for_super,max_conn-used-res_for_super
res_for_normal
from
(select count(*) used from pg_stat_activity) t1,
(select setting::int res_for_super from pg_settings where
name='superuser_reserved_connections') t2,
(select setting::int max_conn from pg_settings where name='max_connections') t3
;
max_conn | used | res_for_super | res_for_normal
----------+------+---------------+----------------
100 | 42 | 3 | 55
(1 row)
Solution or next step
Probably reduce the idle time for connections in a pool.
Given the number of available connections (55 as seen above), it shouldn't have failed, but
remaining connection slots are reserved for non-replication superuser connections
does indicate something else. Not sure how to ensure this doesn't happen again