
Streaming interface of mysql2 does not work as expected

Open arupadhy opened this issue 4 years ago • 4 comments

Not sure if this is linked with https://github.com/sidorares/node-mysql2/issues/1168

I have noticed that backpressure on the streaming interface in mysql2 isn't being handled correctly (as you would expect from pipeline). Ideally, the source stream should adjust to the speed of the destination without any custom handling.

Here is some sample code:

```js
const { createWriteStream } = require('fs');
const { pipeline, Transform } = require('stream');

const target = createWriteStream('./result-list.csv');

pipeline(
  pool.query({ sql: GET_USERS }).stream({ objectMode: true }),
  new Transform({
    objectMode: true,
    transform: (row, enc, cb) => {
      // async work here which can be slow/fast
      getActiveDevices(row.username_hash)
        .then((data) => {
          // formatting of data
          cb(null, [row.username_hash, data.count || 'None', data.activePlatformTypes].join(',') + '\r\n');
        })
        .catch((err) => {
          cb(err);
        });
    },
  }),
  target,
  (err) => {
    if (err) {
      console.log('pipeline failed', err);
    } else {
      console.log('pipeline succeeded');
    }
  }
);
```

I tried this with a really large data set (about a million rows) and I see nondeterministic results with each run. I replaced 'mysql2' with 'mysql', ran the exact same code, and now backpressure is handled properly, with deterministic results on every run.

arupadhy avatar Mar 16 '21 17:03 arupadhy

Hi @arupadhy, streaming mode was added years ago and uses a very basic streams API, so there may well be a bug in backpressure handling.

Would you be able to create a simple repro repo? The example code is pretty good but is still missing a few bits. Could you also add the code you use to measure pressure? And what do you mean by 'nondeterministic': is it crashing from time to time, or does memory consumption vary a lot between runs?

sidorares avatar Mar 16 '21 21:03 sidorares

@sidorares I will try to set up a repo for you to be able to see the problem. PS: not promising a timeline, but soonish.

arupadhy avatar Mar 17 '21 14:03 arupadhy

@sidorares give this a try: https://github.com/arupadhy/large-sql-streaming. I added some strawman steps in the README. Thanks for taking the time to look into it.

arupadhy avatar Mar 22 '21 16:03 arupadhy

I hate to revisit such an old issue, but I believe I am seeing the same problem and wanted to confirm my thoughts.

When there is backpressure on the stream returned from Query.stream(), the stream is "paused". Per BaseConnection.js, when a stream is paused, packets from the underlying socket are queued, and the code relies on TCP flow control to slow the flow of packets from the server. I assume there are TCP-level settings that can be tuned to reduce the flow, and thus the memory used by queueing packets, but those will be case-specific. Is this a correct assessment of what I am seeing? Are there any other knobs that can be turned within the library, beyond the OS-specific TCP-level settings?

jej2003 avatar Mar 21 '25 18:03 jej2003