awx icon indicating copy to clipboard operation
awx copied to clipboard

Set CONN_MAX_AGE to better integrate with databases that time out idle connections

Open kdelee opened this issue 2 years ago • 4 comments

Please confirm the following

  • [X] I agree to follow this project's code of conduct.
  • [X] I have checked the current issues for duplicates.
  • [X] I understand that AWX is open source software provided for free and that I might not receive a timely response.

Feature type

Enhancement to Existing Feature

Feature Summary

I've had a few questions posed to me lately about how long we keep database connections open. I am still learning about the motivations behind the questions, but I think they stem from the fact that some managed postgres services on cloud providers set timeouts in postgres such as https://www.postgresql.org/docs/14/runtime-config-client.html#GUC-IDLE-SESSION-TIMEOUT or people who use services like PGBouncer have settings that sever idle connections.

We addressed a huge source of idle database connections thanks to https://github.com/ansible/awx/pull/11745 However, we still have a few other sources of persistent connections that can sit idle for indefinite periods of time.

These are:

  • the callback receiver has persistent database connections that it may recover if they get broken, but in general never closes them.
  • dispatch workers doing tasks other than running jobs leave connections open, @AlanCoding mentioned "dispatcher tasks close connections when they start, not when they finish. This seemed wrong to me, but I have not fixed it."

We may consider setting CONN_MAX_AGE https://docs.djangoproject.com/en/4.0/ref/databases/#persistent-connections to take advantage of Django's own logic for killing/restablishing database connections.

Select the relevant components

  • [ ] UI
  • [X] API
  • [ ] Docs
  • [ ] Collection
  • [ ] CLI
  • [ ] Other

Steps to reproduce

We can examine how long we are keeping database connections open:

 SELECT 
    pidplication_name
    ,datnamehostname
    ,usenameport
    ,application_name
    ,client_hostname
    ,client_port
    ,backend_startity
    ,query_start
    ,query  e = 'awx' ORDER BY backend_start;
FROM pg_stat_activity
WHERE usename = 'awx' ORDER BY backend_start;

Current results

right now you would see connnections from the callback receivers hanging out for basically forever, and a dispatcher with idle workers would also keep database connections open after doing work, but if they never pick up anything new they just keep the old connection.

Sugested feature result

Recycle connections after a configurable amount of time by setting CONN_MAX_AGE

this relates to other work that could happen to in general make controller more reselient to losing a connection and re-establishing it.

Additional information

No response

kdelee avatar Jan 27 '23 20:01 kdelee

I am using 23.7.0 with 4/3 task/web setup and using external backend with 200 Max connections. I see a lot of idle wsrelay-awx-tasks connections close to 80-90% of 200 max connections at peak and they seem to timeout and the number drops to 30-40% level of max connections. Same queries for all of them: SELECT "conf_setting"."id", "conf_setting"."created", "conf_setting"."modified", "conf_setting"."key", "conf_setting"."value", "conf_setting"."user_id" FROM "conf_setting" WHERE ("conf_setting"."key" = 'LOG_AGGREGATOR_LOGGERS' AND "conf_setting"."user_id" IS NULL) ORDER BY "conf_setting"."id" ASC LIMIT 1 I tried using CONN_MAX_AGE on the client side and this had no effect. Is there normal behavior we expect to have?

mimianddaniel avatar May 09 '24 22:05 mimianddaniel

I can confirm this same query SELECT "conf_setting"."id", "conf_setting"."created", "conf_setting"."modified", "conf_setting"."key", "conf_setting"."value", "conf_setting"."user_id" FROM "conf_setting" WHERE ("conf_setting"."key" = 'LOG_AGGREGATOR_ENABLED' AND "conf_setting"."user_id" IS NULL) ORDER BY "conf_setting"."id" ASC LIMIT 1 is also not terminated on AWX 24.6.1 and PostgreSQL 15 external Database.

martin-micimo avatar Oct 18 '24 07:10 martin-micimo