mpart-async Field boundary finding logic causes a busy exhausted loop

I found an issue in the field boundary finding logic. The current logic is such that, if buffer in MultipartParser::poll_next does not contain the entire field boundary but happens to contain a carriage return (0x0d), MultipartParser could end up in a busy exhausted loop that won't stop until more data becomes available from the upstream (which might not happen causing an infinite loop).

This issue manifests itself when the following conditions are true:

Upstream produces a buffer that contains field data which for one reason or another contains a carriage return (0x0d), but not the entire field boundary. This is pretty common when uploading binary files where 0x0d might have an entirely different meaning.
Upstream is not fast enough to produce another buffer by the time MultipartParser tries again.
User code calls continue when field.next().await.unwrap().unwrap().is_empty() is true.

Example of user code including a comment:

let mut mpart = MultipartStream::new("abc", stream);
while let Some(result) = mpart.next().await {
    let mut field = result.unwrap();

    while let Some(result) = field.next().await {
        let field_data_chunk = result.unwrap();

        if field_data_chunk.is_empty() {
            // Calling `continue` here causes `field.next` to be called again which in turn returns
            // an empty data chunk endlessly until `stream` has more data available.
            continue;
        }
    }
}

One would think that calling break instead of continue in the user code is a better idea. However, that causes a ShouldPollField error upon calling mpart.next().await.unwrap().unwrap() for the next field because MultipartParser is not done with the previous field.

I created a pull request where I added four unit tests which demonstrate the problem I described in this issue. Field boundary finding logic issues.

I took a look at the field boundary finding logic in MultipartParser::poll_next. In my opinion the logic should be reworked.

I would personally implement the logic so that MultipartParser has an internal buffer matching the length of the expected field boundary. Let's call the internal buffer boundary_buf and its length n. MultipartParser receives data from the upstream into buffer. Upon receiving data MultipartParser would check if boundary_buf + buffer contains the field boundary. If it does not contain the field boundary, the last n bytes from buffer would be copied into boundary_buf and then another loop cycle could be performed once more data is available from the upstream. To keep this explanation shorter, I won't delve into more details. What I am essentially describing here is a shift register.

Mar 09 '24 21:03 joelposti

Can you please provide a minimal repro for this

May 31 '24 02:05 cetra3

Hello, absolutely. A repro is available in the form of tests in the pull request https://github.com/cetra3/mpart-async/pull/28. Check especially test field_boundary_finding_should_not_cause_busy_exhausted_loop_2.

May 31 '24 06:05 joelposti

mpart-async mpart-async copied to clipboard

Field boundary finding logic causes a busy exhausted loop

mpart-async
mpart-async copied to clipboard