Debug failed RDS connections
We sometimes see failed connections to RDS in the app and need to figure out what's causing them.
Some semi-qualified initial observations:
- it looks like Aurora Serverless RDS should allow ~190 connections per ACU, so we should have a base headroom of about 400 connections (source)
- prisma defaults to a pool size of `num_physical_cpus * 2 + 1` (source)
- checked `os.cpus()` on a random desci-server pod, returns 4 logical cores. This could mean prisma defaults to a pool size of 9. Potentially overkill as we have a resource limit of 1 cpu on the pod, but I'm not sure if this limits us to 1 core/2 threads.
- across all envs, we have 24 instances of desci-server => 216 open connections just for the main backend service
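The arithmetic behind these observations can be sketched as follows. This is a rough TypeScript illustration, not code from desci-server: the function names are made up for the example, and the inputs (4 cores, 24 instances, ~400 connections of headroom) are the figures quoted above.

```typescript
// Prisma's documented default pool size: num_physical_cpus * 2 + 1.
// cpuCount is whatever core count Prisma detects on the pod.
function defaultPoolSize(cpuCount: number): number {
  return cpuCount * 2 + 1;
}

// Worst case: every instance fills its pool completely.
function totalConnections(instances: number, poolSize: number): number {
  return instances * poolSize;
}

const poolSize = defaultPoolSize(4); // 4 cores seen on the pod
const total = totalConnections(24, poolSize); // 24 desci-server instances
const headroom = 400; // approximate figure from the observations

console.log(`pool size per instance: ${poolSize}`);
console.log(`worst-case connections: ${total} of ~${headroom}`);
```

If the 1 CPU resource limit were what Prisma detected, the same formula would give a pool of 3 per instance, or 72 connections across 24 instances.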
We should:
- check the rds console for actual stats on connections
- investigate potential errors on the rds side
- see if we can adjust `max_connections` to fit our idle pool size
- see if we can lower the pool size on the desci-server nodes if the autodetect doesn't work like it should
- most importantly, implement connection retries where missing
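For the retry item, a minimal sketch of a generic wrapper with exponential backoff might look like the one below. Nothing here is taken from the desci-server code: `withRetry`, its parameters, and the default values are hypothetical, and a real version would probably pass a `shouldRetry` predicate that only accepts connection-level errors.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 100,
  // Assumption: by default every error is treated as retryable.
  shouldRetry: (err: unknown) => boolean = () => true,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Give up on the last attempt or on errors we should not retry.
      if (attempt === maxAttempts || !shouldRetry(err)) break;
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With a Prisma client this could wrap the initial connection, for example `await withRetry(() => prisma.$connect())`, or any individual query that is currently unguarded.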
I think we solved this with connection pool params
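The comment does not say which parameters were changed. Assuming it refers to Prisma's `connection_limit` and `pool_timeout` connection-string parameters, a sketch of setting them could look like this. The helper name, the example URL, and the values 3 and 20 are illustrative only, not the settings actually deployed.

```typescript
// Hypothetical helper: adds Prisma pool parameters to a database URL.
// connection_limit caps the pool size; pool_timeout is in seconds.
function withPoolParams(
  databaseUrl: string,
  connectionLimit: number,
  poolTimeoutSeconds: number,
): string {
  const url = new URL(databaseUrl);
  url.searchParams.set("connection_limit", String(connectionLimit));
  url.searchParams.set("pool_timeout", String(poolTimeoutSeconds));
  return url.toString();
}

// Placeholder URL, not a real endpoint.
const exampleUrl = withPoolParams(
  "postgresql://user:password@db.example.com:5432/app?schema=public",
  3,
  20,
);
console.log(exampleUrl);
```

The same effect can be had by appending the parameters to `DATABASE_URL` by hand; the helper only makes the override explicit per environment.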