beam-nuggets icon indicating copy to clipboard operation
beam-nuggets copied to clipboard

Performance issue with the write operations

Open angoenka opened this issue 4 years ago • 6 comments

The write operation creates a connection per bundle which can very small for streaming causing a flood of short lived connections and without connection pooling, this could become a performance bottleneck. Also, having a batch write option would significantly reduce the number of writes.

angoenka avatar Mar 05 '21 22:03 angoenka

+1

I have a suggested pull request I can add here in a little bit.

The idea is to emulate how we write in batches with the JdbcIO connector for Java.

We can call .write_record() without doing the .commit() every time. We then move the .commit() to the .finishBundle() and then a little bit of logic for committing batches for N amount of records to prevent the commit from getting too large.

chishankar-work avatar Mar 09 '21 19:03 chishankar-work

Additionally, opening and creating a new connection is a an expensive operation from the server side. This stackexchange post talks more about why. Moving the connections to a static pool allows the reuse of connections, which is best practice for SQL servers.

Added advantages of writing in batches, after a GBK, allows you to keep all transactions for the affected record in one place. This drastically reduces the amount of thread contention in the SQL server as threads (each handling a connection and a transaction) won't be waiting to access the same row. Lastly it also allows the users to take advantage of server optimizations since we submit it all as one transaction.

chishankar-work avatar Mar 12 '21 00:03 chishankar-work

Thanks @chishankar-work; I'll have a look soon at your PR.

mohaseeb avatar Mar 21 '21 20:03 mohaseeb

In which state is this improvement? I'm worried that all the single inserts will hammer very hard the PostgreSQL database that I have.

It's a bit tricky in my opinion when I want a custom UPSERT and, at the same time, doing it in batch mode.

@chishankar-work will your PR support this feature?

manuelnucci avatar Jun 06 '21 14:06 manuelnucci

@mohaseeb It seems that the original author dropped this feature. I'm happy to prepare a new PR.

medzin avatar Nov 03 '22 20:11 medzin

Did this enhancement ever get merged in?

StevieBurnsSky avatar Aug 02 '23 08:08 StevieBurnsSky