snowflake-connector-nodejs icon indicating copy to clipboard operation
snowflake-connector-nodejs copied to clipboard

SNOW-853951: Configurable `highWaterMark` for streaming

Open APTy opened this issue 2 years ago • 9 comments
trafficstars

Would there be a way to set the highWaterMark for streams? The improvements in #43 are great, but the default of 10 feels a bit low, and would be nice to be able to tweak that value a bit to find something that works for each use case to get optimal throughput.

APTy avatar Jun 29 '23 15:06 APTy

hi and thank you for raising this issue! rowStreamHighWaterMark seems to be there for this, could you please try setting it in Connection and see if it works for you ? example:

var connection = snowflake.createConnection({
    account: account,
    username: user,
..
    rowStreamHighWaterMark = 20 // default=10
  });

sfc-gh-dszmolka avatar Jun 30 '23 05:06 sfc-gh-dszmolka

Thanks for the quick response! In my experience, it doesn't seem that this parameter is actually configurable. This is confirmed by @sfc-gh-ext-simba-lf here in #505.

I'd be happy to open a PR to add external: true to this param so that users can configure it, if that's something the team wants to expose to users.

APTy avatar Jun 30 '23 13:06 APTy

ah, missed that - thank you for pointing it out. definitely something to discuss, i don't see a reason why we would not want to expose it, but i'm checking with the team.

edit: of course the PR is welcome but first I want to make sure we can/want actually expose it. missing the part why we didn't do so in the first place, maybe there was a reason I'm not aware.

sfc-gh-dszmolka avatar Jun 30 '23 13:06 sfc-gh-dszmolka

Appreciate it! If we indeed want to make it configurable, I guess there's a question of:

  • Should it be configured on the connection (createConnection()) or the session (conn.execute()) or the result stream (stmt.streamRows())?
  • Or something different entirely, like the CLIENT_RESULT_CHUNK_SIZE param?

(Yes - we may not want to actually expose it, but then it'd be good to know whether 10 is indeed an optimal value, or whether some performance is being left on the table because of it)

APTy avatar Jun 30 '23 13:06 APTy

Hey @sfc-gh-dszmolka thanks for the help so far, just checking whether there's been any movement internally on this one yet 😄

APTy avatar Jul 12 '23 11:07 APTy

hi @APTy at this moment nothing to share but will keep this updated once i hear any interesting back. We're also considering a more systematic approach of reviewing all the currently internal parameters to see if more of them could be made external, as some of them seem useful to have control over for the user. Thank you for bearing with us !

sfc-gh-dszmolka avatar Jul 12 '23 11:07 sfc-gh-dszmolka

Any update on this?

The current lack of configuration on streams makes simpler code suffer in performance - and we end up doing manual batching using the start + end params instead (so that everything comes into memory at once).

Getting more control over the streaming parameters would allows to write much simpler code such as:

let batch = [];
await (for row of snowflake.streamRows()) {
   batch.push(row);
   if (batch.length === batchSize) {
       // do something
       batch = [];
   }
}

if (batch.length > 0) {
   //doSomething
}

But right now we can't use this pattern because the performance is too slow.

doom-weaver avatar Feb 10 '24 15:02 doom-weaver

At this moment I don't have any updates on this, but will keep this thread posted as soon as there's any. Thank you for bearing with us !

sfc-gh-dszmolka avatar Feb 12 '24 04:02 sfc-gh-dszmolka

After syncing internally with the team, it looks like now this is planned to be finished by end of Q1 (April 2024) but likely sooner than that.

sfc-gh-dszmolka avatar Feb 13 '24 08:02 sfc-gh-dszmolka