ureq icon indicating copy to clipboard operation
ureq copied to clipboard

Redirect 307 / 308 correctly

Open oberien opened this issue 5 years ago • 11 comments

This feature was already tried to be implemented. However, when sending the request, the body is consumed (due to being dyn Read), so it can't be sent again for the 307 or 308 redirect, which requires the same request to be sent again.

https://github.com/algesten/ureq/blob/5b75deccef13f1ceb2b8d546a735c0d3bc3bfc83/src/unit.rs#L237-L238

From what I can tell so far after looking at the source, we can reset any Payload back to the beginning, except the generic Reader(Box<dyn Read + 'static>). If this reader also implemented Seek, we would be able to get its position before starting the body-write, and reset it to that position on a 307 / 308 to read from it again. However, this would be a breaking API change to Request::send(&mut self, reader: impl Read + 'static) -> Response as the argument now also needs to implement Seek. (Additionally, io::Empty currently doesn't implement Seek, for which I opened https://github.com/rust-lang/rust/issues/78029.)
Another option would be to require the reader to be Clone, however I'd see that as a strictly worse version of Seek.

oberien avatar Oct 16 '20 22:10 oberien

Thanks for filing, and welcome to the repo! I've been wanting to get this done for a while. It sounds like your proposal has two parts:

  • For all input types other than Reader(Box<dyn Read + 'static>), just reset in the straightforward way.
  • Add another input type like Reader(Box<dyn Read + Seek + 'static>) so that certain Read inputs (like files) can be seek'ed to reset.

Is that right?

jsha avatar Oct 16 '20 23:10 jsha

Can we solve this with a buffer of configurable size? (This is an idea @jsha started)

The default buffer size would be some multiple of a probable initial TCP window size, meaning the server would have a chance to send the 307/308 response while the client sending more than the buffer size is "held back" by TCP flow control.

Obviously probable initial TCP window size is a bit moot, so the user should be able to configure this for Agent and/or Request.

In addition, such a buffer could means for small Read bodies, we can pre-buffer the entire contents into memory, avoiding transfer-encoding chunked.

algesten avatar Oct 17 '20 08:10 algesten

For all input types other than Reader(Box<dyn Read + 'static>), just reset in the straightforward way.

A straightforward way could be added. However, I think that all creations of the dyn Read type passed to the SizedReader already implement Seek, so the bound could just be added to SizedReader and work for all non-Reader payload variants (once stdlib implements Seek for io::Empty; if the fix should be done before, it might make sense to introduce an additional trait and implement that for io::Empty).

Add another input type like Reader(Box<dyn Read + Seek + 'static>) so that certain Read inputs (like files) can be seek'ed to reset.

I would have replaced the current Reader type by adding the Seek bound. This ensures that all variants of Payload are resettable. Thus the Seek bound can also be added to SizedReader, which allows the unit::connect function to internally reset the SizedReader it got as argument when redirecting.

the server would have a chance to send the 307/308 response while the client sending more than the buffer size is "held back" by TCP flow control.

This might be a very interesting idea. I'm not too sure about the intricacies of HTTP and if the server must respond with 307 / 308 when only receiving a partial payload, or if it might be allowed to hold back the response until the whole payload is received. I could imagine a reverse proxy like nginx buffering several chunks before forwarding them as a single chunk to the original server. Otherwise I think this could be an interesting idea.
Another problem with this approach would be latency. How do you make sure to stop sending until you receive the response? What about long fat pipes with a delay of several seconds but high bandwidth? You'd need to wait for a whole round-trip without necessarily knowing how long one such round-trip would be.
Or maybe I'm already overthinking this and it's an already solved problem ¯\_(ツ)_/¯

oberien avatar Oct 17 '20 08:10 oberien

I'm not too sure about the intricacies of HTTP and if the server must respond with 307 / 308 when only receiving a partial payload, or if it might be allowed to hold back the response until the whole payload is received.

I suspect that's entirely up to the HTTP server, but it would be kinda awkward to support 307/308 without also considering the case of aborting uploads of large bodies.

A related technique that we might want to consider is expect 100-continue. Curl has a default 1 second delay before sending the body, in which the server has a chance to respond with some other code aborting the body send.

That's another way to do it, which could be done for 307/308 in addition to buffering.

  1. Send headers,
  2. flush
  3. wait 1 sec
  4. send buffered body
  5. send unbuffered body

During steps 3-4 the server can successfully respond with 307/308.

algesten avatar Oct 17 '20 09:10 algesten

I would have replaced the current Reader type by adding the Seek bound.

Part of our user-facing API is:

pub fn send(&mut self, reader: impl Read + 'static) -> Response

I'd prefer not to add a Seek bound to that, because we want to support sending a streaming body, where the user doesn't know ahead of time how big it will be.

I'm not too sure about the intricacies of HTTP and if the server must respond with 307 / 308 when only receiving a partial payload, or if it might be allowed to hold back the response until the whole payload is received.

Personally I've never seen any HTTP server do something like this. It would require some fairly custom code in most HTTP server implementations, but more importantly it would require client cooperation. Specifically, the client would have to send some of the body, then do a non-blocking read from the response stream to see if the server responded early. That's pretty unusual - though perhaps more common in the HTTP/2 world?

I think algesten's on the right track, that this is the sort of problem Expect: 100-continue is designed to solve.

One thing that would help us reach a good design: Can you give examples of real-world APIs or websites that return 307 or 308? If we design around making those real use cases work well, I think we'll wind up with something good.

jsha avatar Oct 17 '20 17:10 jsha

Expect: 100-continue looks like the correct fix here. And if curl does something, it should be perfect do copy its behaviour.

I noticed the missing 308 redirection when making a GET request to the gitlab.com API. If you include a double-slash in the request, e.g. try to request https://gitlab.com/api/v4//groups, it returns a 308 redirecting to the URL without the double-slash.

oberien avatar Oct 17 '20 17:10 oberien

That's pretty unusual - though perhaps more common in the HTTP/2 world?

Lol.. And here I was thinking that would be the norm, cause how would you otherwise detect "early" response headers? … Any early 3xx, 4xx or 5xx is an opportunity to abort sending a big body. Maybe I'm tainted by h2 :D

algesten avatar Oct 18 '20 10:10 algesten

On the flip side (a little off-topic but it's entertaining IMO), there are some servers out there that will send you a response as soon as you connect, without even waiting for a request! This gives up any opportunity to serve different responses for different requests, but the hosts I've seen it on seem to be serving analytics JS at very large scale. It's probably a significant performance win, and they're always only serving the same JS anyhow.

jsha avatar Oct 18 '20 16:10 jsha

I'm not sure the current status of this, but deadlinks would like this to avoid having to deal with redirects itself: https://github.com/deadlinks/cargo-deadlinks/issues/110. It would be fine by me to treat it exactly the same as 301/302.

jyn514 avatar Nov 23 '20 23:11 jyn514

Note that since https://github.com/algesten/ureq/pull/288 307/308 are followed for GET/HEAD, which should help the majority of use cases. For PUT/POST/DELETE you still have to handle the error yourself.

jyn514 avatar Jan 10 '21 22:01 jyn514