Stream a large dataset from Crate
Is there any way to stream a huge table into a Node.js process using node-crate?
We also tried the pg module, with no success.
Hi, did you try COPY FROM in the Crate console?
https://crate.io/docs/crate/guide/en/latest/best-practices/data-import.html#importing-data-using-copy-from
What exactly is the problem you ran into with the huge dataset?
Did you try executeBulk? See the code example in the test:
https://github.com/megastef/node-crate/blob/2874aceabae6c2faa982c29e5d2553b861f55c0f/test/test.js#L152-L178
I think executeBulk works well with a few hundred items per bulk request. Make sure you don't run too many concurrent bulk requests, as the Java server has a limited number of HTTP threads; it is better to wait until one bulk has finished before inserting the next set of records.
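Roughly like this minimal sketch, assuming node-crate's promise-based `executeBulk(sql, bulkArgs)`; the table and columns (`mytable`, `id`, `name`) are just placeholders:

```js
const crate = require('node-crate');
crate.connect('localhost', 4200); // adjust host/port for your cluster

// Insert rows in sequential bulks of a few hundred items, waiting for each
// bulk to finish before sending the next one, so we don't exhaust the
// server's HTTP threads with concurrent bulk requests.
async function insertInBulks(rows, bulkSize = 500) {
  for (let i = 0; i < rows.length; i += bulkSize) {
    // map each row object to the positional args of the INSERT statement
    const bulkArgs = rows.slice(i, i + bulkSize).map((r) => [r.id, r.name]);
    await crate.executeBulk('INSERT INTO mytable (id, name) VALUES (?, ?)', bulkArgs);
  }
}
```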
I want to SELECT from a 10-million-record table and move that data into multiple Node.js processes, @megastef.
I don't see a streaming option in the HTTP API endpoint. The SELECT statement supports LIMIT and OFFSET, so you could load the data in chunks, e.g. using LIMIT 500 and increasing the OFFSET by 500 in each query until you get fewer than 500 records. https://crate.io/docs/crate/reference/en/latest/sql/statements/select.html#offset
This logic would be useful in node-crate; I could imagine wrapping it in a readable stream interface, roughly like the sketch below.
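A minimal sketch of that idea, assuming `execute()` resolves to a result whose `json` property holds the row objects (as in the node-crate README); `mytable` and the chunk size are placeholders, and you would want a deterministic ORDER BY for stable paging:

```js
const { Readable } = require('stream');
const crate = require('node-crate');
crate.connect('localhost', 4200); // adjust host/port for your cluster

// Object-mode Readable that pages through a table with LIMIT/OFFSET.
function createTableStream(table, chunkSize = 500) {
  let offset = 0;
  let done = false;
  let fetching = false;
  return new Readable({
    objectMode: true,
    read() {
      if (fetching) return;            // a query is already in flight
      if (done) { this.push(null); return; }
      fetching = true;
      crate
        .execute(`SELECT * FROM ${table} LIMIT ? OFFSET ?`, [chunkSize, offset])
        .then((res) => {
          fetching = false;
          const rows = res.json || [];
          offset += rows.length;
          if (rows.length < chunkSize) done = true;   // last chunk reached
          if (rows.length === 0) { this.push(null); return; }
          rows.forEach((row) => this.push(row));
        })
        .catch((err) => this.destroy(err));
    },
  });
}

// Usage:
// createTableStream('mytable').on('data', (row) => console.log(row));
```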
Note that fetching large result sets from Crate can produce high load and disk I/O. Your use case might be a good fit for Apache Kafka or another message queue.
@behrad did you solve your problem? Could you contribute a function that wraps multiple SELECTs with LIMIT/OFFSET into a readable stream interface?