
RSocket should not prioritize 0 frames

Open mostroverkhov opened this issue 4 years ago • 5 comments

When the outgoing message queue grows, responses are likely to time out before the respective request frame is even sent, causing unnecessary work on the responder side. New requests have no chance to progress once RSocket falls into this state. Before https://github.com/rsocket/rsocket-java/pull/718 this was detected by missing keep-alives, since they were sent on the same queue, in order. Now keep-alives are prioritized and ignore the outgoing queue size on both client and server, leaving the RSocket in this odd state until the connection is closed.

The same applies to Lease frames: prioritizing them undermines the whole idea of concurrency limiting, because the peer is granted new request permits even though the receiver is overwhelmed with existing outgoing messages. It does not make sense to send a lease before the queue is drained.

mostroverkhov avatar Mar 01 '20 21:03 mostroverkhov

This issue was initially discussed here -> https://github.com/rsocket/rsocket/pull/280

Apart from that, I am of two minds: on the one hand, saying yes makes sense given the current implementation; on the other hand, every logical stream can time out its requests, so in theory all associated messages should be dropped (e.g. FluxFlatMap).

In any case, we have proper non-prioritized message delivery at the layer-4 network level, so if there is any issue with message consumption, the timeout will fire immediately.

Also, the existence of cases where the queue can be overwhelmed was the main reason to initiate changes to the leasing spec, since leasing has to prevent exactly such cases.

I would rather say that right now, the possible issues with the cases you mentioned are downsides of the current implementation. However, the main problem many users experience with the current implementation is undelivered keep-alives, which leads them to disable keep-alives altogether. (Imagine that elements are arriving and the stream is working, but then, surprisingly, you get a keep-alive timeout, which should not happen if the elements are being consumed. The expected keep-alive timeout is the absence of all messages.)

The conclusion was that we have to send keep-alives while the connection is available. Once we hit an issue delivering frames at the connection-messaging level, we have to terminate execution.

The implementation would be different if we had a proper processor to control backpressure at the connection level, but for now there is no way to control backpressure with regard to network demand, so the best option is to deliver stream-0 messages with priority.
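To make the prioritization in question concrete, here is a minimal hypothetical sketch (class and method names are illustrative, not rsocket-java's actual connection code): stream-0 frames such as KEEPALIVE and LEASE go into a separate queue that is always drained before the regular outgoing queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a prioritized frame sender; names and structure are
// illustrative only, not rsocket-java's actual implementation.
final class PrioritizedFrameSender {
    private final Queue<Object> zeroStreamFrames = new ArrayDeque<>(); // KEEPALIVE, LEASE, ...
    private final Queue<Object> dataFrames = new ArrayDeque<>();       // REQUEST_*, PAYLOAD, ...

    void offer(int streamId, Object frame) {
        if (streamId == 0) {
            zeroStreamFrames.offer(frame);
        } else {
            dataFrames.offer(frame);
        }
    }

    // Drain order: stream-0 frames always jump ahead of queued data frames,
    // which is exactly the behaviour the issue argues against for LEASE frames.
    Object poll() {
        Object frame = zeroStreamFrames.poll();
        return frame != null ? frame : dataFrames.poll();
    }
}
```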

OlegDokuka avatar Mar 01 '20 22:03 OlegDokuka

Some related discussion is in https://github.com/rsocket/rsocket-java/pull/718#issuecomment-593626600

every logical stream can time out its requests, so in theory all associated messages should be dropped

Messages are added to the sendProcessor, so they hit the network and are never dropped.

When the queue grows on the Requester, what you get is a Request frame followed by a Cancel frame (due to timeout) before the request even hits the network. The Responder receives and processes the request, but the Requester is no longer interested in the response.

When the queue grows on the Responder, the response will be dropped on the Requester side (the stream is removed because of the timeout).

In this state new requests can't progress, yet the RSocket is not closed because the keep-alive RTT is 1 ms (while the request RTT is 1 second).

main reason to initiate changes to the leasing spec since leasing has to prevent such cases

It does prevent such cases, as long as Lease frames are sent in order: once the peer Requester's current lease expires, no new requests will hit this Responder until the frames ahead of the Lease frame have been sent.

mostroverkhov avatar Mar 03 '20 00:03 mostroverkhov

Messages are added to the sendProcessor, so they hit the network and are never dropped.

As I said earlier, that is a downside of the existing implementation and will be revised in future versions. The problem is going to be solved by replacing the UnboundedQueue used as the sendProcessor with a FlatMapping processor that keeps a tiny queue per request-stream / request-channel.
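For illustration only, here is a rough Reactor sketch of that idea (assumed names, not the actual FlatMappingProcessor): per-stream publishers are flat-mapped with a small prefetch, so each stream buffers only a handful of frames and demand from the connection writer propagates back to every stream.

```java
import reactor.core.publisher.Flux;

public class PerStreamPrefetchSketch {
    public static void main(String[] args) {
        // Hypothetical per-stream frame sources standing in for request-stream /
        // request-channel outbound frames.
        Flux<String> streamA = Flux.range(1, 1000).map(i -> "A-frame-" + i);
        Flux<String> streamB = Flux.range(1, 1000).map(i -> "B-frame-" + i);

        // flatMap with bounded concurrency and a tiny prefetch per inner stream:
        // each stream keeps at most 'prefetch' frames queued, and demand from the
        // connection writer propagates back to the individual streams.
        Flux<String> outgoing = Flux.just(streamA, streamB)
                .flatMap(stream -> stream, /* concurrency */ 64, /* prefetch */ 4);

        outgoing.subscribe(frame -> {
            // here the frame would be written to the connection
        });
    }
}
```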

When the queue grows on the Requester, what you get is a Request frame followed by a Cancel frame (due to timeout) before the request even hits the network. The Responder receives and processes the request, but the Requester is no longer interested in the response.

This is an application-level concern: how to manage the request queue. Facebook solves it by turning the queue into a stack (@yschimke, correct me if I'm wrong).

In this state new requests can't progress, yet the RSocket is not closed because the keep-alive RTT is 1 ms (while the request RTT is 1 second).

It makes zero sense to close the connection because of that. This is an application concern; RSocket is just a mechanism for transmitting Layer 5 / Layer 6 messages. Application developers should decide how they want to handle the case where many messages are enqueued. Usually, that is solved by proper rate-limiting.
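As a rough illustration of such application-level rate-limiting (plain Reactor, assumed helper names, not an rsocket-java feature), bounding the number of in-flight requests keeps the outgoing queue from growing without bound:

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

import java.time.Duration;

public class BoundedInFlightSketch {
    // Hypothetical call that performs one RSocket request and yields the response.
    static Mono<String> doRequest(int i) {
        return Mono.delay(Duration.ofMillis(50)).thenReturn("response-" + i);
    }

    public static void main(String[] args) {
        Flux.range(1, 10_000)
            // at most 16 requests in flight; further upstream demand is held back
            .flatMap(BoundedInFlightSketch::doRequest, 16)
            .blockLast();
    }
}
```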

Application engineers can easily perform a request-response in order to measure queuing and dequeuing latency, and use that information to tune the data flow. If keep-alive worked as an application-level mechanism to detect a backed-up queue, it would be challenging to tell at which level the problem lies (imagine your application fails with a keep-alive timeout; to find the root cause you would need to inspect the connection layer + the RSocket layer + the application layer).
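For example, a small sketch (assuming an already connected io.rsocket.RSocket instance; the probe method itself is hypothetical) of measuring that latency with a plain request-response ping:

```java
import io.rsocket.RSocket;
import io.rsocket.util.DefaultPayload;
import reactor.core.publisher.Mono;

public class LatencyProbe {
    // Measures the round-trip time of a single request-response "ping"; the elapsed
    // time includes time spent in the outgoing queue, which is the signal the
    // application can use to tune its data flow.
    static Mono<Long> probe(RSocket rSocket) {
        return rSocket.requestResponse(DefaultPayload.create("ping"))
                .elapsed()                      // Tuple2<elapsed millis, payload>
                .map(tuple -> tuple.getT1());
    }
}
```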

cc/ @rstoyanchev @stevegury @linux-china @nebhale

OlegDokuka avatar Mar 03 '20 11:03 OlegDokuka

"Messages are added to sendProcessor so they are hitting network, never dropped." This existing implementation has a serious OOM issue as described https://github.com/reactor/reactor-netty/issues/881

There needs to be some way to drop messages from the requester's queue when the responder is overwhelmed. Please consider this in future versions.

For the RSocket version we use internally, I've applied a patch to drop new messages when Netty's buffer (set to 16 MB) is full. It's not ideal, but at least we don't run the risk of an OOM on the requester's side due to slow responders.
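Roughly, such a guard could look like the following sketch (illustrative only, not the actual patch): configure Netty's write-buffer water marks and refuse to enqueue new frames while the channel is not writable.

```java
import io.netty.channel.Channel;
import io.netty.channel.WriteBufferWaterMark;

public class DropWhenUnwritable {
    static void configure(Channel channel) {
        // ~16 MB high water mark: channel.isWritable() turns false once the
        // pending outbound bytes exceed this limit.
        channel.config().setWriteBufferWaterMark(
                new WriteBufferWaterMark(8 * 1024 * 1024, 16 * 1024 * 1024));
    }

    static boolean offer(Channel channel, Object frame) {
        if (!channel.isWritable()) {
            // drop (or reject) the frame instead of queueing it and risking OOM
            return false;
        }
        channel.writeAndFlush(frame);
        return true;
    }
}
```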

xiazuojie avatar Mar 11 '20 04:03 xiazuojie

@xiazuojie

There needs to be some way to drop messages from the requester's queue when the responder is overwhelmed. Please consider this in future versions.

I'm working on a FlatMappingProcessor which will have proper backpressure support, and every long-running stream will have its own small queue for prefetch. That should solve the problem (in theory).

OlegDokuka avatar Mar 11 '20 08:03 OlegDokuka