combine
combine copied to clipboard
Obtaining source positions
How do you obtain source positions in combine?
I found the position combinator, combine::combinator::position
, but that yields a position of type <I as combine::StreamOnce>::Position
. For example:
parser! {
fn function['a, 'b, I](lang: &'b Lang<'a,I>)(I) -> (Id, Function)
where [I: Stream<Item = char>]
{
lang.reserved("function")
.with(position()) // <--------- Here
.and(id(lang))
...
It seems like this is the best way to get position information in the case of a successful parse (e.g. not an error case). How do you use this type? How do you extract a line number from it? A column number?
I::Position
can be whatever you need it to be. By default for &str
and &[u8]
it is just an offset into the slice (as that is the cheapest), but if you want line + column information for a &str
you can wrap it via the position stream https://docs.rs/combine/4.4.0/combine/stream/position/index.html.
let input = position::Stream::new(input_str);
parser.parse(input)
@Marwes I found the PointerOffset
positioner for str
to be difficult to use. I wanted to be able to immediately get the byte position in the source string when constructing the object I'm parsing (it needs to remember the byte range in the string because the next step after parsing will be to replace those placeholders with generated bind parameters for the given database): https://github.com/launchbadge/sqlx/pull/1076/files#diff-c300d87b76b58c3089d028475ece7eef0f7ad9c200a18a73a79828765f84e4e3R176
However, because you need to call .translate_position()
with the source string to get a meaningful value, that would necessitate passing the string down through parser functions to where it's needed, which is annoying and also, I feel, kind of an anti-pattern.
IndexPositioner
is almost what I want given that its RangePositioner
impl ends up calling str::len()
, however it doesn't track byte positions correctly for individual char
s since it'll invariably count each char
, even multibyte ones, as just 1. The docs should probably discourage using it for streams based on str
because that could be a significant footgun for users that forget to test non-ASCII strings.
What I ended up doing was just writing my own Positioner
impl that's specialized for str
that uses char::len_utf8()
: https://github.com/launchbadge/sqlx/pull/1076/files#diff-c300d87b76b58c3089d028475ece7eef0f7ad9c200a18a73a79828765f84e4e3R108-R138
This seems like it'd be good to add to the library, I'd gladly open a PR if you're interested.
@Marwes
I'm also having some problems with the source positions. I'm making a parser which returns the line &column number but Rust is complaining about mismatched types. It says: expected associated type <Input as StreamOnce>::Position
found struct SourcePosition
.
Here is the code in question:
fn jumpable<Input>() -> impl Parser<Input, Output = (u32, u32)> where Input: Stream<Token = char>, Input::Error: ParseError<Input::Token, Input::Range, Input::Position>, { position() .map(| SourcePosition { line: l, column: c } | { (c,l) }) }
How can I fix this? I'm very confused.
Sorry I couldn't get it to format nicely so here's a screenshot.
This seems like it'd be good to add to the library, I'd gladly open a PR if you're interested.
Yes, that seems like a good addition. Never thought about IndexPositioner
not being very useful for strings as I have always done the translation at top/end of the parse (no need to translate things until you know the parser as a whole has failed).
@abonander
@Gallagator If you want to use a specific type in the parser you need to specific it in the type signature (same thing that you did with char
.
Input: Stream<Token = char, Position = SourcePosition>
@Marwes Thanks a lot! I was very confused!