
Question: can frame.into_data() be incomplete?

Open kristof-mattei opened this issue 1 year ago • 4 comments

I use http-body to parse the body of an endless Transfer-Encoding: Chunked stream.

let frame = response.frame().await.expect("Stream ended").expect("Failed to read frame");

let Ok(data) = frame.into_data() else {
    // frame is trailers, ignored
    continue;
};

let decoded = serde_json::from_slice(&data)?;

// ...

But as I've discovered, under certain conditions data is incomplete. When complete it ends in \n.

To work around it, I keep a buffer and only parse out the [0..(index of first b'\n')] slice, removing it from the buffer afterwards.
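That workaround might look like the sketch below (my own illustration, not code from the issue; drain_line and the plain Vec<u8> buffer are hypothetical names, and real code would append each data frame's bytes to the buffer before calling it):

```rust
// Append each incoming frame's bytes to `buf`, then repeatedly split off
// everything up to and including the first b'\n' and parse that slice.
fn drain_line(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    // No complete record yet if there is no delimiter in the buffer.
    let pos = buf.iter().position(|&b| b == b'\n')?;
    // Remove bytes [0..=pos] from the buffer, leaving the remainder in place.
    let mut line: Vec<u8> = buf.drain(..=pos).collect();
    line.pop(); // drop the trailing b'\n' before parsing
    Some(line)
}
```

Each call yields one complete newline-terminated record, or None if the buffer does not yet hold a full line; leftover bytes stay buffered for the next frame.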

This leaves me with the following questions:

  • Is this expected behavior from Frame? Having a partial piece in there?
  • Is the \n a left-over from the Chunked separator \r\n?

kristof-mattei avatar Nov 28 '23 04:11 kristof-mattei

Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:

body.collect().await?.to_bytes()

davidpdrsn avatar Nov 28 '23 06:11 davidpdrsn

It's not that it's incomplete, but this is a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can make a write get cut up into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.

You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.

Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.
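The buffering described above can be sketched without any http-body types (a minimal illustration under my own assumptions; LineAssembler is a hypothetical name, and in real code you would feed it the bytes of each data frame as it arrives):

```rust
/// Accumulates arbitrarily-split byte chunks and hands back complete
/// b'\n'-terminated records, no matter how the transport cut them up.
struct LineAssembler {
    buf: Vec<u8>,
}

impl LineAssembler {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Feed the bytes of one incoming frame and collect every record
    /// that is now complete; partial data stays buffered for later.
    fn push(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.buf.extend_from_slice(chunk);
        let mut out = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.buf.drain(..=pos).collect();
            line.pop(); // strip the delimiter
            out.push(line);
        }
        out
    }
}
```

The point of the design is that push() never assumes a frame boundary lines up with a record boundary: a record split across three frames is returned only once the frame carrying its b'\n' arrives.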

seanmonstar avatar Nov 28 '23 13:11 seanmonstar

> Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:
>
> body.collect().await?.to_bytes()

No, the body is endless.

> It's not that it's incomplete, but this is a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can make a write get cut up into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.
>
> You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.
>
> Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.

Okay, so a Frame is at a lower level than the CRLF-separated chunk.

Looking at the spec a little more (https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Encoded_data), it seems my \n detection is probably not correct and I need to do something a little smarter that takes the chunk size into account.

kristof-mattei avatar Nov 28 '23 14:11 kristof-mattei

@seanmonstar reading more I think I found where I got confused.

HTTP/2 doesn't have chunked encoding, but it does have Frames: https://httpwg.org/specs/rfc7540.html#FrameTypes

So the name Frame in HTTP/1 short-circuited my brain.

kristof-mattei avatar Nov 30 '23 13:11 kristof-mattei