chomp icon indicating copy to clipboard operation
chomp copied to clipboard

string parser (and possibly others internally using consume_while) force unnecessary stream reads

Open dario23 opened this issue 6 years ago • 1 comments

problem

the chomp::parsers::string parser (and possibly others internally using consume_while) might force unnecessary stream reads. example code:

#[macro_use]
extern crate chomp;

use chomp::prelude::*;
use chomp::buffer::{Source, Stream};

use std::net::TcpStream;


fn main() {
    let tcp = TcpStream::connect("faumail.fau.de:143").unwrap();
    let mut src = Source::new(tcp);

    // IMAP lines end in b"\r\n", so the real text is everything up to b'\r',
    // but we have to read the line ending nonetheless before reading any future stuff
    let p = src.parse(parser!{take_till(|c| c == b'\r') <* string(b"\r\n")});
    println!("{:?}", p);
}

expected output: Ok(<some bytes from the imap server welcome line>)

actual output: Err(Retry)

cause

the string parser (src/parsers.rs:378) uses consume_while(f), which first reads the next token from the input stream, and only after that inspects it (using f) for whether to consume it or not. note this is not a bug in consume_while, but its perfectly fine expected behaviour. the problem with using it the way it currently is for string(s) is that after len(s) tokens have been consumed, we could return successfully, but consume_while waits for the next token to call its decider function on (which then determines that it has read len(s) tokens already and tells consume_while to quit), which in some cases can force a read on the underlying stream when actually the answer would be clear.

solution

i wrote a (very hackish) fix for the string parser at https://github.com/dario23/chomp/tree/fix_string but (without having checked in depth) i'm expecting more parsers to be affected. probably a more exhaustive fix would include adding consume_while_max_n(f, usize).

i'd be happy to propose changes and submit a PR, but only after hearing your opinion on the matter :-)

dario23 avatar Jul 16 '17 23:07 dario23

afterthought: with some (minor?) api change and test breakage i think this specific instance could also be solved by making string(s) do something roughly like (not actual code, just illustration)

fn string(i, s) {
    let bytes = i.take(len(s))
    if s == bytes {
        i.ret(bytes)
    } else {
        i.err(expected(something))
    }
}

where the question would be what the something should be and whether the different amount of tokens read from the source (currently until mismatch token, here none if mismatch anywhere) make any difference. with some luck it's mostly restored anyway, so (even though it's still an api-breaking change) it would not need a lot of actual code changes.

dario23 avatar Jul 16 '17 23:07 dario23