combine icon indicating copy to clipboard operation
combine copied to clipboard

Obtaining source positions

Open mwaldrich opened this issue 4 years ago • 6 comments

How do you obtain source positions in combine?

I found the position combinator, combine::combinator::position, but that yields a position of type <I as combine::StreamOnce>::Position. For example:

parser! {
    fn function['a, 'b, I](lang: &'b Lang<'a,I>)(I) -> (Id, Function)
    where [I: Stream<Item = char>]
    {
        lang.reserved("function")
        .with(position()) // <--------- Here
        .and(id(lang))
...

It seems like this is the best way to get position information in the case of a successful parse (e.g. not an error case). How do you use this type? How do you extract a line number from it? A column number?

mwaldrich avatar Dec 03 '20 20:12 mwaldrich

I::Position can be whatever you need it to be. By default for &str and &[u8] it is just an offset into the slice (as that is the cheapest), but if you want line + column information for a &str you can wrap it via the position stream https://docs.rs/combine/4.4.0/combine/stream/position/index.html.

let input = position::Stream::new(input_str);
parser.parse(input)

Marwes avatar Dec 04 '20 10:12 Marwes

@Marwes I found the PointerOffset positioner for str to be difficult to use. I wanted to be able to immediately get the byte position in the source string when constructing the object I'm parsing (it needs to remember the byte range in the string because the next step after parsing will be to replace those placeholders with generated bind parameters for the given database): https://github.com/launchbadge/sqlx/pull/1076/files#diff-c300d87b76b58c3089d028475ece7eef0f7ad9c200a18a73a79828765f84e4e3R176

However, because you need to call .translate_position() with the source string to get a meaningful value, that would necessitate passing the string down through parser functions to where it's needed, which is annoying and also, I feel, kind of an anti-pattern.

IndexPositioner is almost what I want given that its RangePositioner impl ends up calling str::len(), however it doesn't track byte positions correctly for individual chars since it'll invariably count each char, even multibyte ones, as just 1. The docs should probably discourage using it for streams based on str because that could be a significant footgun for users that forget to test non-ASCII strings.

What I ended up doing was just writing my own Positioner impl that's specialized for str that uses char::len_utf8(): https://github.com/launchbadge/sqlx/pull/1076/files#diff-c300d87b76b58c3089d028475ece7eef0f7ad9c200a18a73a79828765f84e4e3R108-R138

This seems like it'd be good to add to the library, I'd gladly open a PR if you're interested.

abonander avatar May 04 '21 04:05 abonander

@Marwes I'm also having some problems with the source positions. I'm making a parser which returns the line &column number but Rust is complaining about mismatched types. It says: expected associated type <Input as StreamOnce>::Position found struct SourcePosition . Here is the code in question: fn jumpable<Input>() -> impl Parser<Input, Output = (u32, u32)> where Input: Stream<Token = char>, Input::Error: ParseError<Input::Token, Input::Range, Input::Position>, { position() .map(| SourcePosition { line: l, column: c } | { (c,l) }) } How can I fix this? I'm very confused. Sorry I couldn't get it to format nicely so here's a screenshot. image

Gallagator avatar May 10 '21 21:05 Gallagator

This seems like it'd be good to add to the library, I'd gladly open a PR if you're interested.

Yes, that seems like a good addition. Never thought about IndexPositioner not being very useful for strings as I have always done the translation at top/end of the parse (no need to translate things until you know the parser as a whole has failed).

@abonander

Marwes avatar May 12 '21 15:05 Marwes

@Gallagator If you want to use a specific type in the parser you need to specific it in the type signature (same thing that you did with char.

Input: Stream<Token = char, Position = SourcePosition>

Marwes avatar May 12 '21 15:05 Marwes

@Marwes Thanks a lot! I was very confused!

Gallagator avatar May 12 '21 23:05 Gallagator