snowflake-connector-nodejs
snowflake-connector-nodejs copied to clipboard
SNOW-853951: Configurable `highWaterMark` for streaming
Would there be a way to set the highWaterMark for streams? The improvements in #43 are great, but the default of 10 feels a bit low, and would be nice to be able to tweak that value a bit to find something that works for each use case to get optimal throughput.
hi and thank you for raising this issue! rowStreamHighWaterMark seems to be there for this, could you please try setting it in Connection and see if it works for you ? example:
var connection = snowflake.createConnection({
account: account,
username: user,
..
rowStreamHighWaterMark = 20 // default=10
});
Thanks for the quick response! In my experience, it doesn't seem that this parameter is actually configurable. This is confirmed by @sfc-gh-ext-simba-lf here in #505.
I'd be happy to open a PR to add external: true to this param so that users can configure it, if that's something the team wants to expose to users.
ah, missed that - thank you for pointing it out. definitely something to discuss, i don't see a reason why we would not want to expose it, but i'm checking with the team.
edit: of course the PR is welcome but first I want to make sure we can/want actually expose it. missing the part why we didn't do so in the first place, maybe there was a reason I'm not aware.
Appreciate it! If we indeed want to make it configurable, I guess there's a question of:
- Should it be configured on the connection (
createConnection()) or the session (conn.execute()) or the result stream (stmt.streamRows())? - Or something different entirely, like the
CLIENT_RESULT_CHUNK_SIZEparam?
(Yes - we may not want to actually expose it, but then it'd be good to know whether 10 is indeed an optimal value, or whether some performance is being left on the table because of it)
Hey @sfc-gh-dszmolka thanks for the help so far, just checking whether there's been any movement internally on this one yet 😄
hi @APTy at this moment nothing to share but will keep this updated once i hear any interesting back. We're also considering a more systematic approach of reviewing all the currently internal parameters to see if more of them could be made external, as some of them seem useful to have control over for the user. Thank you for bearing with us !
Any update on this?
The current lack of configuration on streams makes simpler code suffer in performance - and we end up doing manual batching using the start + end params instead (so that everything comes into memory at once).
Getting more control over the streaming parameters would allows to write much simpler code such as:
let batch = [];
await (for row of snowflake.streamRows()) {
batch.push(row);
if (batch.length === batchSize) {
// do something
batch = [];
}
}
if (batch.length > 0) {
//doSomething
}
But right now we can't use this pattern because the performance is too slow.
At this moment I don't have any updates on this, but will keep this thread posted as soon as there's any. Thank you for bearing with us !
After syncing internally with the team, it looks like now this is planned to be finished by end of Q1 (April 2024) but likely sooner than that.