
Buffer's queue does not remove canceled requests

Open seanmonstar opened this issue 7 years ago • 5 comments

Because Buffer uses a futures::sync::mpsc channel, any ResponseFutures that have been dropped continue to consume space in the queue until the underlying service has progressed through the requests in front of them. A Buffer could be wrapped in Timeout, which would cancel requests that wait too long. But since each request's oneshot::Sender remains somewhere in the queue, the buffer's capacity can fill up with canceled requests.

There's the additional issue that a wrapped Reconnect may wish to only retry a failed connect if there are still response futures waiting, but because the waiters are buried in the queue, it's impossible to determine that.

This is kind of the "other" half of a pool. hyper does have a queue of waiters internally, and can check when they are canceled, since they are actually in a VecDeque. To allow new requests to enter this queue, it's wrapped in an Arc<Mutex>. While perhaps not the best thing in the world, it does work, and people still get excellent performance from hyper's client, so we could consider that as a first pass.

Related Conduit issue: https://github.com/runconduit/conduit/issues/899

seanmonstar avatar May 10 '18 01:05 seanmonstar

I believe that this describes two separate issues.

  • Only reconnecting when there is a request to send.
  • The inability to purge canceled requests from the queue.

Reconnect

Thinking more about it, I am leaning towards refining Service::poll_ready as follows:

  • An error is terminal. The service will never work again.
  • The fn should only be called when there is a request to send. This means that if the request "goes away", poll_ready should not be called anymore.

This means that, if there are no requests in the buffer, poll_ready would not be called and the connection would not be established.

The reason I like this is that it models what would happen if you did a select over the request being canceled and the service's ready future. It also encapsulates the poll_ready concept and avoids leaking it out of the abstraction.

The problem is that the connection logic might end up halfway there (socket connected and TLS handshake in progress) when poll_ready stops being called, so the handshake never finishes. Either this is OK (the remote will time out), or a task could be spawned to drive the connection to completion regardless of whether poll_ready is called. I would also note that this problem exists today w/ the current Service API.

Buffer

The second problem is the queue used by Buffer does not eagerly release resources for canceled requests. I'm not exactly sure what you are proposing.

There are ways to deal with it w/o going w/ a Mutex<Vec<_>>, but of course, we shouldn't over-optimize for perf w/o numbers. Is the queue holding on to memory a real issue today? I would think it isn't a ton of memory. It is also worth noting that the mpsc queue algorithm can support iteration on the consumer side too w/o a ton of trouble.

carllerche avatar May 10 '18 18:05 carllerche

This morning after thinking about it some more, I started to lean more in this direction:

  • Reconnect: In the proxy, we can treat the error from Reconnect::poll_ready as terminal, just as you mentioned. We would log that it happened, and then create a fresh Reconnect service.
  • Buffer: The memory waste itself isn't the larger issue, just something I noticed while reading through the source. However, as long as we can't iterate the waiters, we don't know if we should keep polling the inner Service::poll_ready.

Trying to combine the two needs a bit of a dance. So that the proxy doesn't keep creating new Reconnect services, polling them, and thrashing on error over and over, we would only want to poll the new Reconnect if there is still a request waiting. However, we don't really know if there is, so what do we return from poll_ready?

If Buffer were to iterate its waiters in poll_ready, and pop any canceled, then we could potentially just task::current().notify() in the inner service, and let the Buffer only poll it again if there are still requests waiting.

seanmonstar avatar May 10 '18 19:05 seanmonstar

Ah, sorry, there was another bit that I forgot to say in my previous comment.

The Buffer task implementation would always pop one request from the queue, then start calling poll_ready on the inner service. It would also call poll_cancel (or whatever it is) on the response oneshot. This way, it knows if the request is canceled. If the request is canceled, it pops another request from the queue. If all requests are canceled, this will effectively drain the queue.

carllerche avatar May 10 '18 19:05 carllerche

Yes, that would help if a general timeout is applied to all requests, since it's unlikely that a request further back in the queue has timed out while the one at the front has not.

However, it doesn't account for a request being canceled for some other reason: in the proxy, the server connection could be closed (since we coalesce requests to the same target from different connections), or we could have gotten a RST_STREAM frame for it.

seanmonstar avatar May 10 '18 19:05 seanmonstar

@seanmonstar has this since been fixed? If not, could you reiterate the remaining issues now that #72 has landed?

jonhoo avatar Mar 31 '20 22:03 jonhoo