ureq icon indicating copy to clipboard operation
ureq copied to clipboard

Get content from reference to response

Open mladedav opened this issue 3 years ago • 13 comments

This possibly ties into #126.

I am using errors where I put ureq::Error as a source into other errors. When the error is being processed I would like to print the content but I am unable to do so because I only have a reference to the error.

From what I gathered I need to own the response to make it either into a string or an object (through JSON).

Another way to fix this I can tell is to make Response cloneable.

mladedav avatar Apr 27 '22 13:04 mladedav

This is a good idea. One thing gets in the way: a Response represents the HTTP response headers, plus a TcpStream that can be used to read the response body. We don't read the response body automatically because it might be big, or slow.

That means when an Error contains a Response, that Response doesn't actually contain the body bytes. You can read the body bytes using Response::into_reader, Response::into_string, etc, but those require an owned Response (not a reference, or even a &mut) because they take ownership of the inner TcpStream to do the reading.

There are a few possibilities. For instance, we could have a (configurable) minimum number of bytes that a Response would read automatically and hold as part of itself. Then there could be an accessor on &Response that allows reading those bytes. This would be handy to solve another problems - returning connections to the pool early when the response body is small.

For now, if you need a workaround, I think you might need to map ureq::Error::Status into a stringified error by calling response.into_string().

jsha avatar Apr 28 '22 03:04 jsha

I think that reading part of the response would be good enough for most scenarios. This would be good enough for me since I don't expect that the errors would be large. And if they were I wouldn't usually want to print them anyway, so it actually feels like a perfect fit for my scenaio, although I appreciate there are definitely other use cases.

Another option I just thought of would be to provide a possibly unsafe API to access the TCP stream either by having it in a option which could be taken (this would actually not be enough for me since I don't have mut&) or it could even be behind a Mutex (that would probably come with too large performance hit to be a good idea) or RefCell. The first access would invalidate the object for everyone though, hence having the API be unsafe. But it feels a bit hacky so I would not be surprised if you did not want this to be implemented.

mladedav avatar Apr 28 '22 10:04 mladedav

Yeah, we quite pride ourselves on the #![forbid(unsafe_code)] at the top of the crate.

algesten avatar Apr 30 '22 12:04 algesten

Fair enough. And what do you think with the possibility of prefetching e.g. first 1 kB of the response?

I would be happy to try to make a PR, I just don't want to waste effort if this is not a way this repo wants to go.

mladedav avatar May 01 '22 16:05 mladedav

Yeah, we definitely want to do something like that. There has been some discussion about it before. https://github.com/algesten/ureq/issues/264

algesten avatar May 01 '22 17:05 algesten

Reopening. #531 doesn't actually resolve this, it is just a prerequisite. Now that small responses have their body stored in the Response, it's practical to implement this. We will probably need to use Any::downcast.

Thinking about the API, I think it should be fn buffered_body(&self) -> Option<&[u8]>. And the Option should only be Some if the entire body was buffered (i.e. the body was sent using Content-Length and the body + headers were small enough to fit in our buffer).

jsha avatar Jul 10 '22 15:07 jsha

and the body + headers were small enough to fit in our buffer

this part i think we can improve. right now it's up to wheter the content was read in one syscall into the BufReader.

i think the user might want more control than that. we could allow a length be set on agent config.

None => default behavior Some(0) => no buffering Some(x) => buffer to x

algesten avatar Jul 10 '22 18:07 algesten

right now it's up to whether the content was read in one syscall into the BufReader.

Yes, I'm reconsidering my previous suggestion that we shouldn't call fill_buf repeatedly. For predictability, it would be good to have the property that any response small enough to fit in the buffer will definitely be buffered. Note that calling fill_buf a second time isn't sufficient to achieve this, since it's not guaranteed to maximally fill the buffer; AFAICT it makes one read attempt on the other reader.

we could allow a length be set on agent config.

I think there's no need to allow buffering to be turned off entirely, but setting a smaller or bigger buffer is reasonable.

jsha avatar Jul 11 '22 03:07 jsha

Woudn't it be also possible to have a method that buffers the whole response no matter the size (or with the size constraint specified on the call, not the agent) in case I'm interested in some larger responses but I don't want to have a large buffer for all responses because of that?

Something like fn buffer_body(&mut self, limit: usize) after which buffered_body would always return Some if the content length (I think headers shouldn't play a part here) is shorter than the limit provided. Internally the Stream would be replaced by a Cursor I guess, so this would have to be &mut self to call the first function (but it has to be called just once and it would allow all subsequent references to provide the body).

mladedav avatar Jul 11 '22 08:07 mladedav

but I don't want to have a large buffer for all responses because of that?

If we have the setting on agent, wouldn't it be quite easy to have two agents depending on the behavior you want?

lolgesten avatar Jul 11 '22 09:07 lolgesten

Not necessarily, e.g. it can depend on whether the call was successful or not. I opened this issue originally because of error handling where I had just a reference to the Response. If the error would be too large the issue would persist.

mladedav avatar Jul 11 '22 09:07 mladedav

Hi, I'll try to summarize.

The agent can be configured with Option<usize> specifying how large buffer should be available for each response. If the buffer is larger than the response length (including headers) then the response would be available with fn buffered_body(&self) -> Option<&[u8]> which would be guaranteed to return Some, None would indicate that the response was longer.

fill_buff can be called repeatedly when the buffering happens. It would be called until the buffer is full or deadline is exceeded.

I think my proposed change is still an open question:

  • Having a method to force the response to use a different (larger) buffer. This can be used when the code needs to dynamically decide whether to use larger buffer after the request, e.g. based on status code or headers. This can be worked around by having large enough buffer by default, so in the worst case this would waste a bit of memory. I would like to know your stance on this but looking at it now, it's not blocking the development itself.

I probably won't have any time for a month, but I just want to check that I can create a pull request (if no one else picks this up) sometime later based on what's written here.

mladedav avatar Aug 23 '22 08:08 mladedav

I'm ok with this addition. PR welcome.

algesten avatar Aug 28 '22 10:08 algesten