Increasing sync speed
PGSync version: 2.1.10
Postgres version: 13
Elasticsearch version: 7.10.2
Redis version: 6.2.6
Python version: 3.9.8

Problem Description: Bulk inserts into the database take a considerable amount of time to be processed by pgsync. If I insert 65k records into Postgres, pgsync takes ~1 minute to fully sync all those changes. I've set up a separate listener script and was able to verify that Postgres pushes out all the NOTIFY events in <2 seconds.
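For reference, a minimal sketch of the kind of standalone listener used for that check, with psycopg2. The channel name `pgsync_channel` is a placeholder, not necessarily the channel pgsync actually registers:

```python
# Time how quickly NOTIFY events arrive on a channel.
# NOTE: "pgsync_channel" and the DSN are placeholders.
import select
import time

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb user=postgres")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN pgsync_channel;")

count = 0
start = None
while True:
    # Block until the connection's socket is readable, or stop
    # after 5 seconds with no new events.
    if select.select([conn], [], [], 5) == ([], [], []):
        break
    conn.poll()
    while conn.notifies:
        conn.notifies.pop(0)
        if start is None:
            start = time.monotonic()
        count += 1

if start is not None:
    print(f"received {count} notifications in {time.monotonic() - start:.2f}s")
```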
Are there any ways to optimize this or increase the throughput of the sync? Tuning POLL_TIMEOUT didn't seem to increase speed very much. From my research, I'm thinking it's the push to Redis here that could potentially be slowing things down.
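One rough way to sanity-check that theory is to time a comparable bulk push directly against Redis with redis-py. The key name and payload below are made up for the benchmark:

```python
# Rough benchmark: push 65k small payloads into a Redis list,
# first one RPUSH per call, then batched through a pipeline.
# The key and payload are arbitrary test data.
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379)
payload = json.dumps({"tg_op": "INSERT", "table": "book", "id": 1})

start = time.monotonic()
for _ in range(65_000):
    r.rpush("benchmark:queue", payload)
print(f"single RPUSH calls: {time.monotonic() - start:.2f}s")

start = time.monotonic()
pipe = r.pipeline(transaction=False)
for _ in range(65_000):
    pipe.rpush("benchmark:queue", payload)
pipe.execute()
print(f"pipelined RPUSH:    {time.monotonic() - start:.2f}s")

r.delete("benchmark:queue")
```

If the pipelined run finishes in well under a second while the per-call loop takes much longer, the round-trips, not Redis itself, are the cost.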
I was able to verify that on the initial sync, if the data already exists in the database, pgsync seeds Elasticsearch in <5 seconds.
I haven't seen the write to Redis being slow.
Also, what is the structure of your database?
Do you have lots of nested tables?
Are your services on different networks?
Can you run pgsync with the -a option?
pgsync -c path/to/schema -a
I see you have an MR to address this. I will also take a look at this.
I've put together a simple reproduction repo here: https://github.com/JacobReynolds/PGSyncSpeedTest. You can ignore the containerized pgsync stuff if you'd like to run it locally instead; you'll see greater performance outside Docker.
In case this is of interest, I have added partial async support in the master branch. You can enable it with the environment variable:

USE_ASYNC=True

PS: this feature is still experimental ATM.
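For example, combined with the command above (same placeholder schema path): `USE_ASYNC=True pgsync -c path/to/schema`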