Documentation for server.Policy does not agree with implementation
The documentation for `server.Policy` indicates that when the limits are exceeded, calls will immediately return exceptions. However, when pairing to debug #189, @lthibault and I discovered this is not actually what happens. Instead, the receive loop just blocks until an item in the queue clears.
However, I do not actually think what the documentation describes is what we want; this would mean lost messages if the sender is sending too fast, and it seems like it would be challenging to program around. We really want some kind of backpressure like that provided by the C++ implementation's streaming support.
I propose the following roadmap for dealing with memory/backpressure instead:
- Short term, fix the docs to describe what the code actually does.
- Slightly longer term, get rid of the bounded queue and replace it with an unbounded one.
- Remove the `server.Policy` type entirely.
- Implement something like the C++ implementation's per-connection flow limit, for some soft backpressure at the connection level. Probably as a field on `rpc.Options`.
- Implement per-object flow control as is done in the C++ implementation for streaming specifically. I think we can do this for all clients irrespective of the streaming annotation, without changing any APIs; we just need to block the caller in appropriate places.
- In `rpc.Options`, add a field for a hard limit on memory consumed by outstanding RPC messages (of all types, including returns), after which the connection will just be dropped.
Thoughts?
@zombiezen, particularly since I'm proposing removing something you added for v3, I would appreciate your thoughts.
> The documentation for `server.Policy` indicates that when the limits are exceeded calls will immediately return exceptions. [...] Instead, the receive loop just blocks until an item in the queue clears.
Yeah, looks like I introduced that in 27748fa54354994b866f4fb57f050e2bad34df8f without updating docs. Sorry.
> However, I do not actually think what the documentation describes is what we want; this would mean lost messages if the sender is sending too fast, and it seems like it would be challenging to program around. We really want some kind of backpressure like that provided by the C++ implementation's streaming support.
I agree. `server.Policy` seems like a kludge, and it would be a better developer experience to have backpressure than to have a capability reject method calls. My memory is hazy on the details, but it seems like I might have realized the mistake and was trying to move more toward backpressure.
I did spill a lot of words at the time I was crunching through this that might help explain: https://groups.google.com/g/capnproto/c/A4LiXXd1t94/m/GkFH1AK-AgAJ
The fundamental insight I have is that Cap'n Proto RPC is a work plan stream across all capabilities. You necessarily need to queue under the assumption that traffic will be bursty, but there are situations that can deadlock if not enough work resolves in time.
IIUC, if there's protocol-level support for applying backpressure to individual objects while preserving E-order, that would be better than the state of the world when I was trying to hack in `server.Policy`.
An unbounded queue sounds potentially dangerous, though, but I think you're addressing that with the `rpc.Options` memory limit? That does match with my insight above.
> An unbounded queue sounds potentially dangerous, though, but I think you're addressing that with the `rpc.Options` memory limit? That does match with my insight above.
Right, the idea is that rather than relying on a queue bound for memory limiting, we'd track it manually. But under normal circumstances we would never expect to reach the memory limit; it's not for backpressure, it's a stopgap for abuse, and we just drop the connection if it is breached. We rely on cooperative mechanisms for "normal" backpressure. See also #198, where I started working on per-object backpressure.
I vaguely remember the mailing list thread. See also https://groups.google.com/g/capnproto/c/wbGvhHaBan4, which is from a couple of weeks ago.
Cool. Sounds like the right direction; let me know if there's anything else I can help with.
Everything on my roadmap above is done except for (1) the connection-level limits and (2) adaptive flow control (which, to be fair, the C++ implementation doesn't really have anyway; we already have a non-adaptive version). (2) is in progress as #294. I'm going to open a separate issue for the connection limits and close this one.