Debug failed RDS connections

Open m0ar opened this issue 1 year ago • 1 comments

We see these failed connections to RDS in the app sometimes, need to figure out what's causing them:

image.png

m0ar avatar Oct 09 '24 12:10 m0ar

Some semi-qualified initial observations:

  • looks like Aurora Serverless RDS should allow for ~190 connections per ACU, so we should have a base headroom of about 400 connections (source)
  • prisma defaults to a pool size of num_physical_cpus * 2 + 1 (source)
  • checked os.cpus() on a random desci-server pod, returns 4 logical cores. This could mean prisma defaults to a pool size of 9. Potentially overkill as we have a resource limit of 1 cpu on the pod, but I'm not sure if this limits us to 1 core/2 threads.
  • across all envs, we have 24 instances of desci-server => 216 open connections just for the main backend service
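
For reference, the arithmetic behind those numbers as a minimal sketch. The function names are made up for illustration, and the core count is the value `os.cpus()` reported on the one pod checked, so the real per-pod pool size may differ if Prisma detects CPUs differently:

```typescript
// Prisma's documented default pool size: num_physical_cpus * 2 + 1.
function defaultPoolSize(cpuCount: number): number {
  return cpuCount * 2 + 1;
}

// Worst-case open connections if every instance fills its pool.
function totalConnections(instances: number, poolSize: number): number {
  return instances * poolSize;
}

const poolSize = defaultPoolSize(4); // 4 logical cores reported by os.cpus()
const total = totalConnections(24, poolSize); // 24 desci-server instances across all envs

console.log(`pool size per instance: ${poolSize}, total connections: ${total}`);
```

That puts the main backend alone at more than half of the rough 400-connection headroom, before counting any other service that talks to the same cluster.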

We should:

  1. check the rds console for actual stats on connections
  2. investigate potential errors on the rds side
  3. see if we can adjust max_connections to fit our idle pool size
  4. see if we can lower the pool size on the desci-server nodes if the autodetect doesn't work like it should
  5. most importantly, implement connection retries where missing
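
A rough sketch of what the retry in step 5 could look like, using exponential backoff. `withRetry`, `backoffDelayMs` and `hasRetryableCode` are hypothetical helpers, not existing desci-server code, and the list of Prisma error codes treated as retryable is an assumption that should be checked against what the logs actually show:

```typescript
// Hypothetical helper, not taken from the desci-server codebase.
type RetryOptions = {
  attempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  isRetryable: (err: unknown) => boolean;
};

// Assumed set of connection-related Prisma error codes; adjust as needed.
const RETRYABLE_CODES = new Set(["P1001", "P1002", "P1017", "P2024"]);

function hasRetryableCode(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && RETRYABLE_CODES.has(code);
}

// Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const lastAttempt = attempt >= opts.attempts - 1;
      // Give up on the final attempt or on errors that retrying cannot fix.
      if (lastAttempt || !opts.isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt, opts.baseDelayMs, opts.maxDelayMs));
    }
  }
}
```

A call site would wrap the query, e.g. `withRetry(() => prisma.someModel.findMany(), { attempts: 3, baseDelayMs: 100, maxDelayMs: 1000, isRetryable: hasRetryableCode })`, where `someModel` is a placeholder. Retrying only hides the symptom if the pool is genuinely exhausted, so this complements steps 1 to 4 rather than replacing them.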

m0ar avatar Oct 09 '24 13:10 m0ar

I think we solved this with connection pool params.
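
The thread doesn't record which params were changed. As an illustration only: Prisma reads `connection_limit` and `pool_timeout` from the query string of the database URL, so one way to set them is a small helper like the one below. `withPoolParams` is a hypothetical name, and the URL and values are placeholders, not the settings actually deployed:

```typescript
// Hypothetical helper: returns a copy of the connection string with pool settings applied.
function withPoolParams(
  databaseUrl: string,
  connectionLimit: number,
  poolTimeoutSeconds: number
): string {
  const url = new URL(databaseUrl);
  // connection_limit caps the pool size per Prisma client instance.
  url.searchParams.set("connection_limit", String(connectionLimit));
  // pool_timeout is how long (in seconds) a query waits for a free connection.
  url.searchParams.set("pool_timeout", String(poolTimeoutSeconds));
  return url.toString();
}

// Placeholder credentials and host, for illustration only.
const tuned = withPoolParams("postgresql://user:password@localhost:5432/app", 3, 20);
console.log(tuned);
```

With a limit of 3, for example, the 24 desci-server instances would hold at most 72 connections instead of 216.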

hubsmoke avatar May 15 '25 00:05 hubsmoke