
Missing implementation of `maxBuffer` in Postgres

Open dennismphil opened this issue 7 years ago • 9 comments

@willfarrell @ZJONSSON

The documentation states

# etl.postgres.script(pool, schema, table [,options])

Collects data and builds up a Postgres statement to insert/update data until the buffer exceeds maxBuffer (customizable in options). When maxBuffer is reached, a full SQL statement is pushed downstream. When the input stream has ended, any remaining SQL statement buffer will be flushed as well.

However, the Postgres code is missing the `maxBuffer` implementation.

dennismphil avatar Apr 12 '18 06:04 dennismphil
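For illustration, the buffering behavior the documentation describes could be sketched roughly as follows. This is a hypothetical sketch, not the actual node-etl code: the `makeBatcher` helper, its parameters, and the naive single-quoted value formatting (no escaping) are all assumptions made purely to show the flush-on-`maxBuffer` idea.

```javascript
// Hypothetical sketch (NOT the actual node-etl implementation) of
// accumulating rows into one INSERT statement and flushing it once
// the serialized statement length exceeds maxBuffer.
function makeBatcher(table, columns, maxBuffer, flush) {
  let values = [];
  let size = 0;
  const prefix = `INSERT INTO ${table}(${columns.join(',')}) VALUES `;

  return {
    push(row) {
      // Naive value formatting for illustration only -- no escaping.
      const tuple = '(' + columns.map(c => `'${row[c]}'`).join(',') + ')';
      values.push(tuple);
      size += tuple.length;
      // Flush once the full statement would exceed maxBuffer.
      if (prefix.length + size > maxBuffer) this.end();
    },
    end() {
      // Flush any remaining buffered rows (e.g. when the stream ends).
      if (values.length) flush(prefix + values.join(','));
      values = [];
      size = 0;
    }
  };
}
```

With a small `maxBuffer`, many pushed rows collapse into a single multi-row `INSERT`, which is the calibration-by-bytes behavior the mysql implementation already provides.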

Thanks @dennismphil it would make sense to add this. We've had good experience with the mysql implementation as we can better calibrate the ETL load with maxBuffer (instead of number of records).

Do you want to take a stab at this in a PR?

ZJONSSON avatar Apr 13 '18 00:04 ZJONSSON

Yes, I will take a stab at this. Please assign it to me.

dennismphil avatar Apr 13 '18 01:04 dennismphil

@ZJONSSON Taking a look at the current code, I'm wondering if we should bring in a dependency like Squel.js for more readable code? Thoughts?

dennismphil avatar Apr 13 '18 03:04 dennismphil

I generally like to avoid adding dependencies unless there is a good reason. Do you have an example of how the code above would look with Squel.js?

ZJONSSON avatar Apr 15 '18 21:04 ZJONSSON

I agree that we should limit adding dependencies. If it cannot be avoided, I would recommend knex over squel, mostly because it supports more database engines and has a larger community.

willfarrell avatar Apr 16 '18 02:04 willfarrell

We could use knex. I'll rewrite the existing code using knex in one file and we'll see if you like the structure better. I'll let @ZJONSSON / @willfarrell veto it if it does not add clarity.

dennismphil avatar Apr 16 '18 16:04 dennismphil

I am not liking knex after working with it for a while, given all the workarounds needed to use it as a query builder. I will drop the library and use plain string concatenation instead. Will see how to organize it better.

dennismphil avatar Apr 26 '18 01:04 dennismphil

Hi, I was reading about this topic and would be interested to see this implemented for Postgres. How are things going?

As far as I understand it, can you confirm that the change would look like the following?

from:

INSERT INTO tmp(col_a,col_b) VALUES('a1','b1');
INSERT INTO tmp(col_a,col_b) VALUES('a2','b2');
INSERT INTO tmp(col_a,col_b) VALUES('a3','b3');

to (as long as `maxBuffer` is not exceeded):

INSERT INTO tmp(col_a,col_b) VALUES('a1','b1'),('a2','b2'),('a3','b3');

Pros: faster performance

rbreejen avatar Sep 19 '18 19:09 rbreejen
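As a side note, the batched statement above could also be built with parameterized placeholders rather than inlined string values, which avoids quoting/escaping concerns. The sketch below is an assumption for illustration, not node-etl's code; it only shows the general `$1, $2, ...` placeholder style that node-postgres queries accept, and `buildBatchInsert` is a hypothetical helper name.

```javascript
// Hypothetical sketch: build one multi-row INSERT with $n placeholders
// (the parameter style used by node-postgres). Not node-etl's actual code.
function buildBatchInsert(table, columns, rows) {
  const params = [];
  const tuples = rows.map(row => {
    const placeholders = columns.map(col => {
      params.push(row[col]);
      return '$' + params.length; // next positional placeholder
    });
    return '(' + placeholders.join(',') + ')';
  });
  return {
    text: `INSERT INTO ${table}(${columns.join(',')}) VALUES ${tuples.join(',')}`,
    values: params
  };
}
```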

Yes, that's basically what `maxBuffer` is for, i.e. to capture as many values as possible up to a certain maximum and then send off the query.

ZJONSSON avatar Sep 19 '18 20:09 ZJONSSON