Document / adjust defaults for possible `connection (already) closed` issue when used on Swarm (due to IPVS)
The issue https://github.com/docker-library/postgres/issues/538 introduced a warning in the README about the Docker Swarm IPVS load balancer, which times out idle TCP connections after 900 seconds. That is lower than the Linux default `tcp_keepalive_time` (7200 seconds), so idle connections can be silently cut by IPVS while a client still expects to use them later. The warning has since been removed (https://github.com/docker-library/docs/commit/5e28015ab2d9039a28daca5f7d65be996eb39234), probably because the link it pointed to broke.
Could a warning about this be reintroduced, and/or could a default of `-ctcp_keepalives_idle=870` be proposed (for example; maybe `tcp_keepalives_interval` + `tcp_keepalives_count` as well)?
The old "success" documentation is visible at https://web.archive.org/web/20200611114911/https://success.docker.com/article/ipvs-connection-timeout-issue.
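The same keepalive fix can also be applied from the client side, by enabling keepalives on the socket itself rather than tuning the server or the kernel. A minimal Python sketch, assuming Linux (`TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` are Linux-specific option names); the 870/30/5 values are just examples chosen to stay under the 900-second IPVS cutoff:

```python
import socket

def keepalive_socket(idle=870, interval=30, count=5):
    """Create a TCP socket whose keepalive probes start before the
    ~900 s IPVS idle timeout, so the LB sees periodic traffic.
    The default values here are illustrative, not recommendations."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs; other platforms expose different names.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return s
```

Most PostgreSQL client libraries expose equivalent settings (e.g. libpq's `keepalives_idle` connection parameter), which avoids touching raw sockets.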
Possible solutions are:
- connect to postgres without the LB using `tasks.<service_name>` (the client must resolve on each connection in case the IP changed)
- use `endpoint_mode: dnsrr` to prevent using the LB (the client must resolve on each connection in case the IP changed)
- use `net.ipv4.tcp_keepalive_time: 870` (< 900)
- use `-ctcp_keepalives_idle=870` (< 900)
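Both DNS-based options above depend on the client re-resolving the service name for every new connection, since task IPs change when containers are rescheduled. A sketch of that lookup, assuming Python and the standard resolver (`tasks.postgres` would be the Swarm-provided name inside the overlay network):

```python
import socket

def resolve_task_ips(hostname, port=5432):
    """Resolve a service name (e.g. tasks.postgres under Swarm) to its
    current task IPs. Call this on every new connection attempt rather
    than caching the result, because task IPs are not stable."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # getaddrinfo returns (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})
```

Inside a Swarm overlay network, `resolve_task_ips("tasks.postgres")` would return one IP per running task; the client can then connect to any of them directly, bypassing IPVS.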
Example of all solutions (only one needed):

```yaml
services:
  postgres:
    command:
      - postgres
      - -ctcp_keepalives_idle=300  # < 900
      # Maybe these too?
      # - -ctcp_keepalives_interval=30
      # - -ctcp_keepalives_count=5
    sysctls:
      net.ipv4.tcp_keepalive_time: 720  # < 900
    deploy:
      endpoint_mode: dnsrr  # The client should resolve on each connection in case the task (IP) changed
  my_other_service:
    environment:
      POSTGRES_HOST: tasks.postgres  # The client should resolve on each connection in case the task (IP) changed
```
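As a sanity check on the commented interval/count values: with keepalives enabled, the worst-case time to detect a dead peer is the idle delay plus one probe every `interval` seconds, `count` times, and that total should also stay comfortably related to any application-level timeouts. A trivial helper to make the arithmetic explicit (plain arithmetic, not a PostgreSQL API):

```python
def keepalive_budget(idle, interval, count):
    """Worst-case seconds before a dead connection is detected:
    wait `idle` seconds, then send `count` probes `interval` apart."""
    return idle + interval * count
```

With the example values above, `keepalive_budget(300, 30, 5)` gives 450 seconds, safely under the 900-second IPVS cutoff.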
Sorry for the delay! :sob:
Unfortunately, I think tuning PostgreSQL for use within Swarm is probably out of scope for this repository. :see_no_evil: :disappointed:
Maybe we can add back a really small blurb about the problem in the docs, perhaps using this issue as our link instead of that old success article?