grpc-node
streaming flow control
Forgive me if I've overlooked documentation on this subject, but I'm trying to understand how to properly configure flow control on streaming interfaces. Usually with Node streams I believe you would set the highWaterMark option; however, it appears that gRPC does not accept that option.
I thought maybe there was a grpc specific configuration that could be set, but after looking through the available options, it wasn't very clear to me if there was and how to use it correctly.
Essentially, the issue I have is that I want to temporarily stop buffering results. Typically in Node you would use the pause() and resume() methods to do this; however, even when a stream is paused its internal buffer continues to fill up, and since I'm unable to configure the size of that buffer, I start to see increased memory usage.
Ping @murgatroid99
We do not currently expose an API to change the highWaterMark option, and we don't change it ourselves; I agree that we should. I didn't realize this until just now, but the default value is actually very high for object mode streams, so it would probably be good for us to choose a different default, probably something very small, like 1 or 5.
Actually, I'm looking at the documentation again and I see that the default high water mark for object mode is only 16, which seems pretty reasonable to me. There might be a different explanation for the high memory usage.
I think regardless of whether or not that's the case, flow control is definitely something I would like to see implemented in gRPC. It's a pretty basic feature of Node streams in general, so I would think it's something users would expect to have.
@murgatroid99 after doing some testing, I'm at a bit of a loss on how to proceed or even describe my issue. I can replicate what users are seeing with the increased memory usage, however for me it only occurs when I have a large volume of data waiting to be sent to a worker. I thought applying the highWaterMark option would resolve said issue, but testing locally has me thinking otherwise.
It seems like there needs to be some kind of mechanism that tells gRPC to stop making requests altogether, but I don't believe one exists (not in the Node version, at least?), and I'm not really sure how something like that would fit in with a Node streaming interface. Part of me wonders if this is even the correct place to try to resolve such an issue.
I'm not sure what you mean by a "mechanism that tells gRPC to stop making requests." You have to explicitly initiate each request, so that's a decision that would be made in a layer above gRPC.
You mention having "a large volume of data waiting to be sent to the worker." Does that mean that you have a lot of objects loaded into memory, and if so, is that not enough to account for the high memory usage?
> I'm not sure what you mean by a "mechanism that tells gRPC to stop making requests." You have to explicitly initiate each request, so that's a decision that would be made in a layer above gRPC.

Sorry, I'm talking about a bidi stream. In the case of PubSub, it's possible that it will send a continuous stream of large amounts of data to us, but as far as I can tell there is no way to throttle that.
> You mention having "a large volume of data waiting to be sent to the worker." Does that mean that you have a lot of objects loaded into memory, and if so, is that not enough to account for the high memory usage?
Yeah, that's definitely what's happening, but as I said before, I think my inability to throttle the data is the real issue.
Oh, you're talking about being on the receiving side of a stream. The stream has a pause() method, which should stop it from emitting 'data' events, and if everything works properly, once the buffer fills up it should stop receiving messages. Then you can resume with the resume() method.
Sure, but that doesn't help me in regards to memory usage, because the stream will continue to receive data from the server.
What I'm saying is that the stream will stop receiving data from the server. Once the buffer fills up, it stops requesting messages from the gRPC C core, which will in turn stop refilling the HTTP2 flow control window, which will eventually result in the server not sending any more messages until the receiving side calls resume() and the buffers start draining. The memory usage there is limited to the total size of the messages that can fit in the read buffer plus the size of the flow control window (I don't remember the default for that, but I think it is configurable).
I see. Were there any plans to make the internal buffer size configurable? Also, is there any documentation on adjusting the flow control window?
The flow control window might actually not be configurable. I'm not sure how to do it. But the buffer I'm talking about is just the stream buffer that is configured by the highWatermark.
Has that been made configurable? When you mentioned it 2 weeks ago, it was not.
No, we haven't changed that yet. I'm just pointing out that we're still talking about that same buffer.
Hello. 👊 Bumping this. @murgatroid99 flow control configuration would be nice to have 🙏.