
Pglogical, LOG: could not receive data from client: Connection reset by peer

Open phannyphok opened this issue 4 years ago • 9 comments

Dear All,

May I ask about pglogical on PostgreSQL 9.6? I ran into the issue below:

 LOG: could not receive data from client: Connection reset by peer
 LOG: unexpected EOF on standby connection

 iwp_test1=# SELECT subscription_name, status FROM pglogical.show_subscription_status();
  subscription_name | status
 -------------------+--------
  subscription1     | down
 (1 row)

Could you help me find a solution for this error?

Many thanks for your help.

Regards, Phanny

phannyphok avatar Dec 19 '19 05:12 phannyphok

I haven't seen this problem with pglogical. However, in my experience that error occurs generically in Postgres when there is a significant network interruption between the client and the server. Is the error persistent in a situation where you can make (and keep) a regular pgsql connection across the same network?

dockpg avatar Dec 19 '19 12:12 dockpg

Is the error persistent in a situation where you can make (and keep) a regular pgsql connection across the same network?

I can keep a psql connection open across the network, and the network is working normally, but the subscription is still down.

My case is as follows:

  • I created a subscription on the slave, then also created a subscription on the master for a different table. After that, when I insert data into the table on the master, the subscription on the slave goes down (with the error above), while the subscription on the master keeps replicating normally.

So my question is: can we do that? Any suggestions, please? Many thanks for your help.

phannyphok avatar Dec 20 '19 04:12 phannyphok

I am not entirely sure it's the same issue that we were experiencing, but you can check out this closed issue here. That's still not resolved for us, but the guy mentioned some package that did help him.

imadzharov88 avatar Jan 03 '20 08:01 imadzharov88

I can reproduce this issue as well. In our case, we suspect an AWS network issue, although AWS denies it (of course). Once we get the "EOF on standby connection" message, logical replication on the slave no longer communicates with the primary.

Our Postgres and extension versions:

  • primary db: 9.6.21, pglogical 2.2.2
  • slave db: 13.3, pglogical 2.3.3

On the slave, only an "exited with exit code 1" message is logged, with no further details:

 LOG: background worker "pglogical apply xxx:xxx" (PID xxxx) exited with exit code 1

On the primary, the subscriber is automatically dropped from the pg_stat_replication view.

I feel like I am chasing a ghost.

hson-branch avatar Oct 08 '21 04:10 hson-branch

@hson-branch did you ever figure this out? I am seeing the same issue with RDS & AWS.

yannh avatar Oct 28 '22 11:10 yannh

+1, also running into the same issue. We're on Postgres 13.7 and pglogical 2.4.1.

We're in the initializing state for a large table (3TB of data). The connection stays alive for 1 hour and then fails, causing replication to go down.

linnerissa avatar Nov 12 '22 08:11 linnerissa

I am running into this as well. Both the master and the slave are on PostgreSQL 10, and we are on AWS RDS with pglogical 2.4.1. There is nothing in the slave logs except the following:

2023-01-28 01:28:42 UTC:host(31560):pguser@db:[2495]:LOG: could not receive data from client: Connection reset by peer
2023-01-28 01:28:42 UTC:UTC:host(31560):pguser@db:[2495]:LOG: unexpected EOF on client connection with an open transaction
2023-01-28 01:28:42 UTC::@:[367]:LOG: worker process: pglogical apply 367023:976906607 (PID 2449) exited with exit code 1
2023-01-28 01:28:52 UTC::@:[370]:LOG: checkpoint starting: time
2023-01-28 01:28:56 UTC::@:[370]:LOG: checkpoint complete: wrote 40 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=3.943 s, sync=0.003 s, total=3.959 s; sync files=26, longest=0.003 s, average=0.001 s; distance=62 kB, estimate=62 kB
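When the only clue is an apply-worker exit line like the one above, it can help to pull those lines out of a larger log. The following is a minimal sketch, assuming the log has been saved as plain text. The function name `find_apply_worker_exits` is hypothetical, and the regular expression only covers the two message shapes quoted in this thread ("worker process: pglogical apply ..." and "background worker "pglogical apply ..."").

```python
import re

# Matches the apply-worker exit messages quoted in this thread, e.g.
#   worker process: pglogical apply 367023:976906607 (PID 2449) exited with exit code 1
#   background worker "pglogical apply 1:2" (PID 33) exited with exit code 1
APPLY_EXIT = re.compile(
    r'pglogical apply ([^\s"]+)"? \(PID (\d+)\) exited with exit code (\d+)'
)


def find_apply_worker_exits(log_text):
    """Return a list of (worker_id, pid, exit_code) for each apply-worker exit line."""
    results = []
    for line in log_text.splitlines():
        match = APPLY_EXIT.search(line)
        if match:
            worker_id, pid, exit_code = match.groups()
            results.append((worker_id, int(pid), int(exit_code)))
    return results
```

This only locates the exits; it says nothing about the cause, which in this thread had to be inferred from the surrounding "Connection reset by peer" and "unexpected EOF" lines.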

nitinsh99 avatar Jan 28 '23 01:01 nitinsh99

Facing similar errors - created a new issue https://github.com/2ndQuadrant/pglogical/issues/416

lpossamai avatar Feb 28 '23 01:02 lpossamai

@hson-branch did you ever figure this out? I am seeing the same issue with RDS & AWS.

Although this is not a long-term solution, I was able to work around this issue for a few months. We took this route to migrate off PG 9.6 to PG 13. I hit the problem when syncing a big, wide table (approx. 40GB, up to 20 text or binary columns). I had set up replication with pglogical from PG 9.6 to PG 13.2 (I believe with pglogical 2.4.0).

  1. You must install the same version of pglogical (for example, 2.4.0): CREATE EXTENSION pglogical WITH VERSION '2.4.0'; For the extension versions that RDS supports, see: https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/postgresql-extensions.html

  2. Check the settings max_wal_senders, max_worker_processes, and max_replication_slots. The symptoms look like resource starvation on the source, the destination, or both, which ends up killing or crashing processes silently. Please refer to the GCloud SQL team's diagnosis of a replication issue HERE.

 max_worker_processes
  >= 1 + D + 8 (on the source instance)
  >= 1 + D + S + 8 (on the destination instance)
 max_wal_senders >= S + 2 (on the source instance)
 max_replication_slots >= S (on the source instance)

  3. On AWS RDS, changing these settings requires a dedicated (non-default) parameter group and a reboot.

  4. If only one or a few tables are stuck syncing, remove the table from the replication set, add it back, and sync again (https://github.com/2ndQuadrant/pglogical/issues/197):

 c=> select pglogical.replication_set_remove_table('default', 'table1');
 c=> select pglogical.replication_set_add_table('default', 'table1');

  5. If the previous step doesn't work, meaning replication is completely stuck, drop the subscription and start from scratch.
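The inequalities for max_worker_processes, max_wal_senders, and max_replication_slots above can be turned into a quick calculation. This is only a sketch: the post does not define D and S, so the code assumes D is the number of databases being replicated and S is the number of subscriptions. The names `MinSettings` and `minimum_settings` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinSettings:
    source_max_worker_processes: int
    destination_max_worker_processes: int
    source_max_wal_senders: int
    source_max_replication_slots: int


def minimum_settings(d, s):
    """Apply the inequalities quoted above.

    Assumption (not stated in the post): d = number of replicated databases,
    s = number of subscriptions.
    """
    if d < 0 or s < 0:
        raise ValueError("d and s must be non-negative")
    return MinSettings(
        source_max_worker_processes=1 + d + 8,
        destination_max_worker_processes=1 + d + s + 8,
        source_max_wal_senders=s + 2,
        source_max_replication_slots=s,
    )
```

For example, under that reading, `minimum_settings(1, 2)` gives 10 and 12 for max_worker_processes on the source and destination, 4 for max_wal_senders, and 2 for max_replication_slots. Compare these against the values currently configured on each instance before concluding that resources are the problem.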

Please feel free to ask me any questions

Thanks

hson-branch avatar Feb 28 '23 06:02 hson-branch