scale the socket receive buffer on the fly #39092
As an experiment, I tried scaling the receive buffer on the fly depending on how full it was on the last iteration. It has the advantage of using less memory for slow streams and being more efficient for faster streams.
It does not allow for manual control (but this can probably be added), and it effectively short-circuits the back-pressure mechanism, as the first read always gets through, which is not very well suited for streams originating from the network.
Sorry, I meant to refer to #39092
Can't the `onread` feature of `net.Socket` already achieve the same goal?
Not if you are piping (unless this is implemented in the piping too).
I don't follow. If you had `socket.push(data)` inside the `onread` callback and had the `buffer` as a function returning `Buffer` instances of varying sizes to provide the scaling, would that not allow the socket to be piped like normal?
The way I understand this part of the code is that this buffer is static and it is reused for every write? Which is a very good idea performance-wise, but I don't think you can make it work with the current piping implementation?
> The way I understand this part of the code is that this buffer is static and it is reused for every write?
Not necessarily, the `buffer` property can also be a function that returns any `Buffer` that should be used. So it could return a new `Buffer` every time.
Yes, I agree; I had missed the part in `stream_base_commons.js` which reallocates the buffer on every read. Would this leave it to the user to scale the buffer? Or do you propose that the same mechanism be implemented in the JS part of the library?
> Would this leave it to the user to scale the buffer? Or do you propose that the same mechanism be implemented in the JS part of the library?
I would lean more towards the former, unless there is strong evidence, with data to back it up, that this scaling behavior should exist in Node core by default.
This is a subject of discussion, of course.
My rationale is that, currently, a 20 MB/s input stream (my example) requires about 320 event loop iterations per second to process. An application might have only 10% overall CPU load, but if that load comes in chunks of 5 ms of processing time each, it won't be able to keep up. And the end result won't be a 10 MB/s input stream; it will be a constantly stalling input stream, one that sets its TCP window to zero 10 times per second. In my case the 20 MB/s stream was going at about 10 KB/s.
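As a sanity check of those numbers (64 KB being the current fixed receive buffer size, so one read per buffer fill):

```javascript
// Back-of-the-envelope math from the example above.
const streamRate = 20 * 1024 * 1024; // 20 MB/s incoming
const readSize = 64 * 1024;          // current fixed 64 KB receive buffer
const readsPerSecond = streamRate / readSize;
console.log(readsPerSecond); // 320 event loop iterations per second just for reads
```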
It is an absolutely horrible behavior.
But in fact, you don't need to change anything in Node to avoid it: setting the `highWaterMark` of the socket solves the problem. It is just that it is not a very well documented parameter (in fact, if you are not aware of Node's internals, there is no way you can find it yourself).
- Making an explicit note in the documentation is one solution
- Exposing this parameter in `http.request` is another
- Asking the user to manually scale the buffer works too
- But this solution is automatic, and it also has better performance for all high-bandwidth incoming streams, even if the receiving application is not doing anything but waiting for the download to complete
- And there is the simple fact that the backpressure mechanics work very well for pipes where Node controls both ends, but are simply ill-suited for streams coming from the network where Node has no control over the sender
The `net` benchmarks look fine to me. I wonder if this will have a noticeable impact on memory usage one way or the other in practice?
It uses less memory for slow connections. And when it comes to fast connections, don't forget that the kernel also allocates up to 256 KB per successful TCP connection (i.e. after the SYN exchange), so normally, if it worked before, it should continue to work.
I will be on solar power for some time, so I can't build Node from here to test it.
I made a draft PR because there are a few points that need discussing:
- Is scaling back really needed?
- I had horrible yoyo effects when I tried scaling back at 1/2
- Should we break bad applications that are used to receiving 64 KB per read?
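The yoyo concern can be illustrated with a toy sizing policy (all names and thresholds here are hypothetical, not the PR's actual code): growing immediately on a full read, but shrinking only after several consecutive mostly-empty reads rather than halving right away.

```javascript
const MIN = 4 * 1024;   // hypothetical lower bound
const MAX = 64 * 1024;  // the current fixed buffer size

// Toy adaptive policy: grow eagerly, shrink reluctantly. Halving as soon
// as one read comes in small is what produces grow/shrink oscillation.
function nextSize(current, bytesRead, state) {
  if (bytesRead === current) {
    state.idleReads = 0;
    return Math.min(current * 2, MAX); // buffer was full: double
  }
  if (bytesRead < current / 4) {
    // shrink only after 8 consecutive mostly-empty reads
    if (++state.idleReads >= 8) {
      state.idleReads = 0;
      return Math.max(current / 2, MIN);
    }
  } else {
    state.idleReads = 0;
  }
  return current;
}
```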
Seeing as how we're dealing with streams here, nobody should be relying on chunk sizes.
What's the status here? If we still want to land this, it should be rebased on top of main.