Document / adjust defaults for possible `connection (already) closed` issue when used on Swarm (due to IPVS)
The issue https://github.com/docker-library/postgres/issues/538 introduced a warning in the README about the Docker Swarm IPVS load balancer, which times out idle TCP connections after 900 seconds. That is lower than the Linux default `tcp_keepalive_time` (7200 seconds), so idle connections can be silently cut by IPVS while a client still expects to use them later. The warning has since been removed (https://github.com/docker-library/docs/commit/5e28015ab2d9039a28daca5f7d65be996eb39234), probably because the link it pointed to broke.
Could a warning about this be reintroduced, and/or could a default of `-ctcp_keepalives_idle=870` be proposed (for example; maybe `tcp_keepalives_interval` + `tcp_keepalives_count` as well)?
The old "success" documentation is visible at https://web.archive.org/web/20200611114911/https://success.docker.com/article/ipvs-connection-timeout-issue.
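The same keepalive fix can also be applied from the client side, by enabling keepalives on the socket itself rather than tuning the server or the kernel. A minimal Python sketch, assuming Linux (`TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` are Linux-specific option names); the 870/30/5 values are just examples chosen to stay under the 900-second IPVS cutoff:

```python
import socket

def keepalive_socket(idle=870, interval=30, count=5):
    """Create a TCP socket whose keepalive probes start before the
    ~900 s IPVS idle timeout, so the LB sees periodic traffic.
    The default values here are illustrative, not recommendations."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; other platforms expose different names.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return s
```

Most PostgreSQL client libraries expose equivalent settings (e.g. libpq's `keepalives_idle` connection parameter), which avoids touching raw sockets.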
Possible solutions are:
- connect to postgres without the LB using `tasks.<service_name>` (the client must resolve on each connection in case the IP changed)
- use `endpoint_mode: dnsrr` to prevent using the LB (the client must resolve on each connection in case the IP changed)
- use `net.ipv4.tcp_keepalive_time: 870` (< 900)
- use `-ctcp_keepalives_idle=870` (< 900)
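Both DNS-based options above depend on the client re-resolving the service name for every new connection, since task IPs change when containers are rescheduled. A sketch of that lookup, assuming Python and the standard resolver (`tasks.postgres` would be the Swarm-provided name inside the overlay network):

```python
import socket

def resolve_task_ips(hostname, port=5432):
    """Resolve a service name (e.g. tasks.postgres under Swarm) to its
    current task IPs. Call this on every new connection attempt rather
    than caching the result, because task IPs are not stable."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # getaddrinfo returns (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})
```

Inside a Swarm overlay network, `resolve_task_ips("tasks.postgres")` would return one IP per running task; the client can then connect to any of them directly, bypassing IPVS.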
Example of all solutions (only one needed):

```yaml
services:
  postgres:
    command:
      - postgres
      - -ctcp_keepalives_idle=300  # < 900
      # Maybe these too?
      # - -ctcp_keepalives_interval=30
      # - -ctcp_keepalives_count=5
    sysctls:
      net.ipv4.tcp_keepalive_time: 720  # < 900
    deploy:
      endpoint_mode: dnsrr  # The client should resolve on each connection in case the task (IP) changed
  my_other_service:
    environment:
      POSTGRES_HOST: tasks.postgres  # The client should resolve on each connection in case the task (IP) changed
```
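As a sanity check on the commented interval/count values: with keepalives enabled, the worst-case time to detect a dead peer is the idle delay plus one probe every `interval` seconds, `count` times, and that total should also stay comfortably related to any application-level timeouts. A trivial helper to make the arithmetic explicit (plain arithmetic, not a PostgreSQL API):

```python
def keepalive_budget(idle, interval, count):
    """Worst-case seconds before a dead connection is detected:
    wait `idle` seconds, then send `count` probes `interval` apart."""
    return idle + interval * count
```

With the example values above, `keepalive_budget(300, 30, 5)` gives 450 seconds, safely under the 900-second IPVS cutoff.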
Sorry for the delay! :sob:
Unfortunately, I think tuning PostgreSQL for use within Swarm is probably out of scope for this repository. :see_no_evil: :disappointed:
Maybe we can add back a really small blurb about the problem in the docs, perhaps using this issue as our link instead of that old success article?