
Increasing sync speed

Open · JacobReynolds opened this issue · 3 comments

PGSync version: 2.1.10
Postgres version: 13
Elasticsearch version: 7.10.2
Redis version: 6.2.6
Python version: 3.9.8

Problem Description:

Bulk inserts into the database take a considerable amount of time to be processed by pgsync. If I insert 65k records into Postgres, pgsync takes ~1 minute to fully sync all those changes. I've set up a separate listener script and verified that Postgres pushes out all the NOTIFY events in <2 seconds.
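For reference, a standalone listener along these lines can time how quickly the NOTIFY events arrive (a minimal sketch, not the exact script used here; `CHANNEL` and the connection string are placeholders):

```python
# Minimal LISTEN/NOTIFY timing sketch (hypothetical channel name and DSN).
import select
import time

import psycopg2
import psycopg2.extensions

CHANNEL = "testdb"  # placeholder: the channel pgsync's triggers notify on
conn = psycopg2.connect("dbname=testdb user=postgres")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

with conn.cursor() as cur:
    cur.execute(f"LISTEN {CHANNEL};")

count, start = 0, None
while True:
    # Wait until the connection's socket is readable, then drain notifications.
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        conn.notifies.pop(0)
        if start is None:
            start = time.monotonic()
        count += 1
    if start is not None:
        print(f"{count} notifications after {time.monotonic() - start:.2f}s")
```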

Are there any ways to optimize this or increase the throughput of the sync? Tuning POLL_TIMEOUT didn't seem to increase speeds very much. From my research, I suspect it's the push to Redis here that could be slowing things down.
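That suspicion can be checked in isolation. A rough benchmark sketch (not pgsync's actual code) contrasting one Redis round trip per event with a single pipelined batch; round trips are usually the dominant cost when pushing tens of thousands of items:

```python
# Rough benchmark sketch: per-event RPUSH vs. a pipelined batch (hypothetical
# queue names; assumes a local Redis on the default port).
import json
import time

import redis

r = redis.Redis()
payloads = [json.dumps({"id": i}) for i in range(65_000)]

start = time.monotonic()
for p in payloads:
    r.rpush("queue:one-by-one", p)  # one network round trip per event
print(f"individual: {time.monotonic() - start:.2f}s")

start = time.monotonic()
pipe = r.pipeline(transaction=False)
for p in payloads:
    pipe.rpush("queue:pipelined", p)
pipe.execute()  # commands are buffered and sent in large batches
print(f"pipelined:  {time.monotonic() - start:.2f}s")
```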

I was able to verify that on the initial sync, when the data already exists in the database, pgsync seeds Elasticsearch in <5 seconds.

JacobReynolds · Jan 23 '22

I haven't seen the write to Redis being slow. Also, what is the structure of your database? Do you have lots of nested tables? Are your services on different networks? Can you run pgsync with the -a option?

pgsync -c path/to/schema -a

I see you have an MR to address this. I will also take a look at it.
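For context, "nested tables" here refers to child nodes in the schema document. A hypothetical example with one level of nesting follows; all names are invented, and the exact schema format may differ between pgsync versions, so check the docs for yours:

```json
[
    {
        "database": "mydb",
        "index": "myindex",
        "nodes": {
            "table": "parent",
            "columns": ["id", "name"],
            "children": [
                {
                    "table": "child",
                    "columns": ["id", "value"],
                    "relationship": {
                        "variant": "object",
                        "type": "one_to_many"
                    }
                }
            ]
        }
    }
]
```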

toluaina · Jan 25 '22

I've put together a simple repo to reproduce the results: https://github.com/JacobReynolds/PGSyncSpeedTest. You can ignore the containerized pgsync setup if you'd like to run it locally instead; you'll see better performance outside Docker.
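The kind of bulk insert being measured looks roughly like this (a sketch with hypothetical table and column names, not code from the reproduction repo):

```python
# Bulk-insert 65k rows in one batch; pgsync's row-level triggers then emit
# roughly one NOTIFY per row for the sync pipeline to drain, so the timing
# below covers only the insert itself.
import time

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=testdb user=postgres")  # placeholder DSN
rows = [(i, f"title-{i}") for i in range(65_000)]

start = time.monotonic()
with conn, conn.cursor() as cur:  # "with conn" commits on exit
    execute_values(cur, "INSERT INTO book (id, title) VALUES %s", rows)
print(f"inserted {len(rows)} rows in {time.monotonic() - start:.2f}s")
```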

JacobReynolds · Jan 25 '22

In case this is of interest, I have added partial async support on the master branch. You can enable it with an environment variable:

USE_ASYNC=True

PS: this feature is still experimental at the moment.
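For example, a hypothetical invocation combining this with the -c flag shown earlier in the thread:

```sh
USE_ASYNC=True pgsync -c path/to/schema
```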

toluaina · May 23 '22