okhttp icon indicating copy to clipboard operation
okhttp copied to clipboard

Duplex documentation is terribly wrong

Open martinandersson opened this issue 2 years ago • 6 comments

On this page, I read:

With regular HTTP calls the request always completes sending before the response may begin receiving.

Wrong. Where in any HTTP-related specification will you find a single word on timing? Or the definition of a "regular" request for that matter. The idea that HTTP is a synchronous one-to-one protocol is a complete fallacy.

RFC 7230 §5.6:

[HTTP] relies on the order of response arrival to correspond exactly to the order in which requests are made on the same connection.

Your page continues:

With duplex the request and response may be interleaved! That is, request body bytes may be sent after response headers or body bytes have been received.

Erm, there's one read/input stream and one write/output stream. They're like, separate. So the word "interleaved" is quite misleading.

only web servers that are specially designed for this nonstandard interaction will use [duplex-thing]

"Specially?" "Nonstandard interaction"? Gosh, to be real honest, I have a feeling the author of this page smoked crack or something. The client can pretty much count the number of requests made on a connection, the n:th response is the response to the n:th request - or, sync up and wait for both request/response to complete before initiating the next exchange over the same connection. It's that simple. And like I said, the request and response are transmitted on completely different streams.

The server can start writing a response back before receiving the full request and there's absolutely nothing "nonstandard" or weird about it. In fact, quite the expected case when it comes to many error codes, like 413 (Payload Too Large), or do you think the server should consume a 999 terabyte large request first before responding in a decade from now telling the client "yeah I was waiting for you to complete before letting you know I was consuming all bytes but didn't really care about them"?

I know a lot of servers out there are pretty shitty, hence why I rolled my own actually (called NoMagicHTTP). The server is fully asynchronous and what some kids would call "reactive". It exposes request bytes to application code immediately on arrival and will start writing response bytes back as soon as possible. There's like zero "waiting times" or hiccups with this server. Point is that it's well tested and I've never had any issues at all.

I don't even know what you are talking about. And how exactly did you plan on telling the server that the client is using/accepting "non-regular duplex-thing supported exchanges"? lol the whole idea is quite preposterous to be honest.

And so with that said, I can bet the implementation probably isn't working much better lol. I think that you should probably completely remove the idea of "duplex" from all documentation and likewise from the codebase, and write a correct implementation instead. Because it sounds like you had a bug of some sort but instead of reading specifications you stumbled into a black hole of misinformation?

EDIT: Here, I found a good SO answer that is probably way more pedagogic than what I am =) And, he uses your favorite "duplex" word lol.

martinandersson avatar Jan 30 '22 11:01 martinandersson

Separate from the technical merits, the tone of your message isn't really helpful here.

I have a feeling the author of this page smoked crack or something.

I can bet the implementation probably isn't working much better lol

Because it sounds like you had a bug of some sort but instead of reading specifications you stumbled into a black hole of misinformation?

Please tone it down and treat the author of this code with some respect.

yschimke avatar Jan 30 '22 13:01 yschimke

For the technical discussions, I think the code documentation you reference, generally describes the OkHttp implementation logic.

Having implemented a similar duplex implementation in another networking framework, similar to the gRPC usecase, my experience was that in almost all situations clients, server libraries and proxies (reverse proxies etc) fail to handle the duplex case cleanly. It needed specific client, server and knowledge of the path between.

I don't personally believe this is a priority right now because it's an uncommon use case. OkHttp doesn't seek to change the HTTP landscape by implementing an all powerful client for any and all use cases. The all powerful toolkit is much better served by frameworks like Netty that are awesome, and have all the parts for you to build a client that behaves exactly as you need. We can't beat Netty at it's own game. My own view is that OkHttp does best when it implements real world standards, simply and compatibly.

The OkHttpClient API and semantics mean that we do send the request before reading the body in most cases. This hasn't been a problem in practice for many years. But given https://github.com/square/okhttp/pull/6295, we already read the error response if they come because of the server terminating the request.

Is there a specific failure case you can show us to help us understand what is not working for you, and if there is a suitable fix (not likely an entire rewrite).

yschimke avatar Jan 30 '22 13:01 yschimke

Yeah I think your message is good but your delivery is needlessly condescending. It means my first instinct is to close & block.

Point taken that this isn't a limitation of the protocol; merely a limitation/feature of this particular client.

The on-the-wire behavior of isDuplex() is the same as without. By using this feature the caller is now responsible for processing two streams concurrently and not in serial.

It’s a common implementation strategy, required because HTTP insists servers send the status code before sending any response body bytes. Confirming that a request was processed successfully usually requires reading it to completion!

We're in good company with the terminology. gRPC docs and implementations use the word duplex to indicate that using the streams in serial is insufficient. https://grpc.io/docs/what-is-grpc/faq/#why-is-grpc-betterworse-than-rest

Next Steps

What we should describe is that in its regular operation, OkHttp doesn't start reading the response until it finishes writing the request.

We can improve the docs to emphasize the caller’s concurrency is what changes, not the protocol.

swankjesse avatar Jan 30 '22 13:01 swankjesse

Ah, I see what you are saying.

Firstly, thank you so much for your response. "Being on crack" is a humorous metaphor for what otherwise best can be described as slur:

Because the encoding of interleaved data is not well-defined for HTTP/1, duplex request bodies may only be used with HTTP/2.

Like, what! I mean I was having my morning coffee and almost choke lol. It's just that I was expecting so much more from a public HTTP library, and I am overall pretty much against spreading misinformation. A whole page not just with plain incorrect things but was also pretty slurry (I write slurry too!!, feel free to criticize and I will thank you for the time and energy spent). Anyways, ... I am really not the best at expressing myself, I am so sorry!

Really appreciate the response. I understand better now what you are trying to do. Although again, it really should not matter at all. Well, "should". You wrote:

in almost all situations clients, server libraries and proxies (reverse proxies etc) fail to handle the duplex case cleanly

Sounds like this had nothing to do with HTTP? I am very eager to hear if you have a real HTTP-related example of when and how concurrent writing on two different streams broke the HTTP chain.

Me, I was trying to understand how to do a HTTP/1.1 chunked request and stumbled onto the "Duplex" docs hahaha. I have understood I can probably hack the RequestBody.contentLength() to return -1, but there appears to be no public API for streaming the request body. This is off-topic of course, I will keep looking a little bit. Just for your info, the background.

martinandersson avatar Jan 30 '22 14:01 martinandersson

Me, I was trying to understand how to do a HTTP/1.1 chunked request and stumbled onto the "Duplex" docs hahaha. I have understood I can probably hack the RequestBody.contentLength() to return -1, but there appears to be no public API for streaming the request body. This is off-topic of course, I will keep looking a little bit. Just for your info, the background.

Look into BridgeInterceptor

      val contentLength = body.contentLength()
      if (contentLength != -1L) {
        requestBuilder.header("Content-Length", contentLength.toString())
        requestBuilder.removeHeader("Transfer-Encoding")
      } else {
        requestBuilder.header("Transfer-Encoding", "chunked")
        requestBuilder.removeHeader("Content-Length")
      }
    }

And Http1ExchangeCodec

  private val Request.isChunked: Boolean
      get() = "chunked".equals(header("Transfer-Encoding"), ignoreCase = true)

yschimke avatar Jan 30 '22 14:01 yschimke

Just saw this old doc on Reddit that also talks about duplex as a term of art here. https://datatracker.ietf.org/doc/html/draft-zhu-http-fullduplex

swankjesse avatar Nov 09 '23 10:11 swankjesse