
Missing implementation of `maxBuffer` in Postgres

Open dennismphil opened this issue 7 years ago • 9 comments

@willfarrell @ZJONSSON

The documentation states

# etl.postgres.script(pool, schema, table [,options])

Collects data and builds up a Postgres statement to insert/update data until the buffer exceeds maxBuffer (customizable in options). When maxBuffer is reached, a full SQL statement is pushed downstream. When the input stream has ended, any remaining SQL statement buffer will be flushed as well.

However, the Postgres code is missing the `maxBuffer` implementation.

dennismphil avatar Apr 12 '18 06:04 dennismphil
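For illustration, the buffering behavior the documentation describes could be sketched roughly as follows. This is a hypothetical sketch, not the actual node-etl code: the `makeBatcher` helper, its parameters, and the naive single-quoted value formatting (no escaping) are all assumptions made purely to show the flush-on-`maxBuffer` idea.

```javascript
// Hypothetical sketch (NOT the actual node-etl implementation) of
// accumulating rows into one INSERT statement and flushing it once
// the serialized statement length exceeds maxBuffer.
function makeBatcher(table, columns, maxBuffer, flush) {
  let values = [];
  let size = 0;
  const prefix = `INSERT INTO ${table}(${columns.join(',')}) VALUES `;

  return {
    push(row) {
      // Naive value formatting for illustration only -- no escaping.
      const tuple = '(' + columns.map(c => `'${row[c]}'`).join(',') + ')';
      values.push(tuple);
      size += tuple.length;
      // Flush once the full statement would exceed maxBuffer.
      if (prefix.length + size > maxBuffer) this.end();
    },
    end() {
      // Flush any remaining buffered rows (e.g. when the stream ends).
      if (values.length) flush(prefix + values.join(','));
      values = [];
      size = 0;
    }
  };
}
```

With a small `maxBuffer`, many pushed rows collapse into a single multi-row `INSERT`, which is the calibration-by-bytes behavior the mysql implementation already provides.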

Thanks @dennismphil it would make sense to add this. We've had good experience with the mysql implementation as we can better calibrate the ETL load with maxBuffer (instead of number of records).

Do you want to take a stab at this in a PR?

ZJONSSON avatar Apr 13 '18 00:04 ZJONSSON

Yes, I will take a stab at this. Please assign it to me.

dennismphil avatar Apr 13 '18 01:04 dennismphil

@ZJONSSON Taking a look at the current code, I'm wondering if we should bring in a dependency like Squel.js for more readable code? Thoughts?

dennismphil avatar Apr 13 '18 03:04 dennismphil

I generally like to avoid adding dependencies unless there is a good reason. Do you have an example of how the code above would look with Squel.js?

ZJONSSON avatar Apr 15 '18 21:04 ZJONSSON

I agree that we should limit adding dependencies. If it cannot be avoided, I would recommend knex over squel, mostly because it supports more database engines and has a larger community.

willfarrell avatar Apr 16 '18 02:04 willfarrell

We could use knex. I'll rewrite the existing code using knex in one file and we'll see if you like the structure better. I'll let @ZJONSSON / @willfarrell veto it if it does not add clarity.

dennismphil avatar Apr 16 '18 16:04 dennismphil

I am not liking knex after working with it for a while, given all the workarounds needed to use it as a query builder. I will drop the library and use plain string concatenation instead. Will see how to organize it better.

dennismphil avatar Apr 26 '18 01:04 dennismphil

Hi, I was reading about this topic and would be interested to see this implemented for Postgres. How are things going?

As far as I understand it, can you confirm that the change would look like the following?

from:

INSERT INTO tmp(col_a,col_b) VALUES('a1','b1');
INSERT INTO tmp(col_a,col_b) VALUES('a2','b2');
INSERT INTO tmp(col_a,col_b) VALUES('a3','b3');

to (as long as `maxBuffer` is not exceeded):

INSERT INTO tmp(col_a,col_b) VALUES('a1','b1'),('a2','b2'),('a3','b3');

Pros: faster performance

rbreejen avatar Sep 19 '18 19:09 rbreejen
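As a side note, the batched statement above could also be built with parameterized placeholders rather than inlined string values, which avoids quoting/escaping concerns. The sketch below is an assumption for illustration, not node-etl's code; it only shows the general `$1, $2, ...` placeholder style that node-postgres queries accept, and `buildBatchInsert` is a hypothetical helper name.

```javascript
// Hypothetical sketch: build one multi-row INSERT with $n placeholders
// (the parameter style used by node-postgres). Not node-etl's actual code.
function buildBatchInsert(table, columns, rows) {
  const params = [];
  const tuples = rows.map(row => {
    const placeholders = columns.map(col => {
      params.push(row[col]);
      return '$' + params.length; // next positional placeholder
    });
    return '(' + placeholders.join(',') + ')';
  });
  return {
    text: `INSERT INTO ${table}(${columns.join(',')}) VALUES ${tuples.join(',')}`,
    values: params
  };
}
```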

Yes, that's basically what `maxBuffer` is for, i.e. to capture as many values as possible up to a certain maximum and then send off the query.

ZJONSSON avatar Sep 19 '18 20:09 ZJONSSON