
Stream a large dataset from Crate

behrad opened this issue 6 years ago • 4 comments

Is there any method to stream a huge table into a Node.js process using node-crate? We also tried the pg module with no success!

behrad avatar Dec 04 '19 14:12 behrad

Hi, did you try COPY FROM in the Crate console? https://crate.io/docs/crate/guide/en/latest/best-practices/data-import.html#importing-data-using-copy-from
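
Since COPY FROM is a plain SQL statement, you could also run it through node-crate instead of the console. A rough sketch (the table name and file URI are placeholders, and the file must be readable on the Crate server's filesystem):

```js
// Sketch: run COPY FROM through node-crate. Table name and file URI
// are placeholders; the file must exist on the Crate node itself.
const crate = require('node-crate')

crate.connect('http://localhost:4200')

crate.execute("COPY mytable FROM 'file:///tmp/import/data.json'")
  .then(res => console.log('rows imported:', res.rowcount))
  .catch(err => console.error(err))
```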

What is the problem with your huge dataset? Did you try executeBulk? See the code example in the test: https://github.com/megastef/node-crate/blob/2874aceabae6c2faa982c29e5d2553b861f55c0f/test/test.js#L152-L178

I think executeBulk works well with a few hundred items per bulk request. Make sure you don't run too many concurrent bulk requests, as Java has a limited number of HTTP threads. It is better to wait until one bulk is finished before inserting the next set of records.
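
Something like this (a rough sketch assuming executeBulk(sql, bulkArgs) as used in the linked test; table, column names, and batch size are placeholders):

```js
const crate = require('node-crate')

crate.connect('http://localhost:4200')

// Insert a large array in sequential bulk requests of a few hundred rows.
async function bulkInsert (rows, batchSize = 500) {
  const sql = 'INSERT INTO mytable (id, name) VALUES (?, ?)'
  for (let i = 0; i < rows.length; i += batchSize) {
    const bulkArgs = rows.slice(i, i + batchSize).map(r => [r.id, r.name])
    // Await each bulk before sending the next one, so concurrent bulk
    // requests don't exhaust Crate's limited pool of HTTP threads.
    await crate.executeBulk(sql, bulkArgs)
  }
}
```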

megastef avatar Dec 04 '19 17:12 megastef

I want to SELECT from a 10-million-record table and move that data into multiple Node.js processes @megastef

behrad avatar Dec 04 '19 18:12 behrad

I don't see a stream option in the HTTP API endpoint. The SELECT statement has LIMIT and OFFSET options, so you could load the data in chunks, e.g. using LIMIT 500 and increasing the OFFSET by 500 on each query until you get fewer than 500 records. https://crate.io/docs/crate/reference/en/latest/sql/statements/select.html#offset

This logic would be useful in node-crate; I could imagine wrapping it in a readable stream interface.
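
A rough sketch of what that could look like (assuming execute() resolves with the Crate HTTP response including a rows array; the table name and page size are placeholders):

```js
const { Readable } = require('stream')
const crate = require('node-crate')

crate.connect('http://localhost:4200')

// Object-mode Readable that pages through a table with LIMIT/OFFSET.
function crateRowStream (table, pageSize = 500) {
  let offset = 0
  let done = false
  return new Readable({
    objectMode: true,
    async read () {
      if (done) { this.push(null); return }
      try {
        // pageSize and offset are numbers we control, so inlining is safe here.
        const res = await crate.execute(
          `SELECT * FROM ${table} LIMIT ${pageSize} OFFSET ${offset}`)
        offset += res.rows.length
        // Fewer rows than the page size means we reached the end of the table.
        if (res.rows.length < pageSize) done = true
        if (res.rows.length === 0) { this.push(null); return }
        for (const row of res.rows) this.push(row)
      } catch (err) {
        this.destroy(err)
      }
    }
  })
}
```

A consumer could then pipe this stream, or listen for 'data' events and fan the rows out to multiple worker processes.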

Note that fetching large result sets from Crate can produce high load and disk I/O. Your use case might be a good fit for Apache Kafka or another message queue.

megastef avatar Dec 05 '19 05:12 megastef

@behrad did you solve your problem? Could you contribute a function that wraps multiple SELECTs with LIMIT/OFFSET into a readable stream interface?

megastef avatar Dec 12 '19 10:12 megastef