chumsky
Option for `.padded()` to not include padding in resulting span
The following code prints out 0..3, which means that .padded() includes the padding in the span of the output.
let parser = just::<char, _, Simple<char>>("5").padded().map_with_span(|_, s: Span| s);
let result = parser.parse(" 5 ");
println!("result = {result:#?}");
Including padding in the span means I need to do a bunch of work to get the spans correct for error reporting, since I don't want to include surrounding whitespace. In the example above this isn't too bad, since I could swap the order of .padded() and .map_with_span().
In practice though, I've created a helper called just_with_padding() that's defined as:
pub fn just_with_padding(input: &str) -> BoxedParser<'_, char, &str, Simple<char>> {
    just(input).padded().boxed()
}
This helper is used everywhere in my parser because the language I'm parsing doesn't care about whitespace. Unfortunately, this approach makes it difficult to exclude the whitespace from the spans for the various nodes.
The solution is surprisingly simple! Instead of doing the following:
just('5').padded().map_with_span(|_, s: Span| s)
You can do this:
just('5').map_with_span(|_, s: Span| s).padded()
map_with_span works on any parser and will always select the span of that parser's pattern, and not any surrounding tokens.
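To make the difference between the two orderings concrete, here is a minimal sketch that models it with the standard library only. It is not chumsky code and does not reflect chumsky's implementation: `padded_token_spans` is a hypothetical helper, and it reports byte offsets, which coincide with character positions for ASCII input like the example in this thread.

```rust
use std::ops::Range;

/// Hypothetical std-only model of matching `token` with optional whitespace
/// on both sides. Returns `(outer, inner)` byte-offset spans:
/// `outer` covers the padding as well, `inner` covers the token only.
pub fn padded_token_spans(input: &str, token: &str) -> Option<(Range<usize>, Range<usize>)> {
    // Length of the leading whitespace run.
    let lead = input.len() - input.trim_start().len();
    // The token must directly follow the leading whitespace.
    if !input[lead..].starts_with(token) {
        return None;
    }
    let inner = lead..lead + token.len();
    // Length of the whitespace run right after the token.
    let rest = &input[inner.end..];
    let trail = rest.len() - rest.trim_start().len();
    Some((0..inner.end + trail, inner))
}
```

For the input " 5 " and the token "5", the outer span is 0..3 (what the padded-then-spanned order reports) and the inner span is 1..2 (what the spanned-then-padded order reports).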
In general, I'd suggest moving away from writing a 'larser' (lexer + parser combined as one). It's much easier to have lexing and parsing be distinct steps (see the nano_rust example), because it means that your parser doesn't need to worry about trivialities like whitespace or accidentally misinterpreting keywords, which makes its design much simpler.
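As a rough illustration of what a separate lexing step buys you, the sketch below is a hand-rolled, std-only lexer that drops whitespace and attaches a byte-offset span to every token. It is not taken from the nano_rust example; the `Token` type, `lex`, and `scan` are hypothetical names, and the token categories are deliberately simplistic.

```rust
use std::ops::Range;

/// Hypothetical token type for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Number(String),
    Punct(char),
}

/// Returns the byte offset where the run of chars satisfying `pred` ends.
fn scan(src: &str, start: usize, pred: impl Fn(char) -> bool) -> usize {
    src[start..]
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(src.len(), |(i, _)| start + i)
}

/// Splits `src` into tokens with spans. Whitespace is skipped here, so a
/// parser running over the token stream never sees it.
pub fn lex(src: &str) -> Vec<(Token, Range<usize>)> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while let Some(c) = src[pos..].chars().next() {
        if c.is_whitespace() {
            pos = scan(src, pos, char::is_whitespace);
            continue;
        }
        let (token, end) = if c.is_alphabetic() || c == '_' {
            // Identifiers are consumed whole, so `letter` is never read as `let`.
            let end = scan(src, pos, |c| c.is_alphanumeric() || c == '_');
            (Token::Ident(src[pos..end].to_string()), end)
        } else if c.is_ascii_digit() {
            let end = scan(src, pos, |c| c.is_ascii_digit());
            (Token::Number(src[pos..end].to_string()), end)
        } else {
            (Token::Punct(c), pos + c.len_utf8())
        };
        tokens.push((token, pos..end));
        pos = end;
    }
    tokens
}
```

Because each span is recorded when the token is produced, later stages can combine token spans into node spans without ever accounting for padding.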
Unfortunately, the language I'm parsing is a variation on TypeScript and supports JSX (and template literal strings). Both of these things (JSX probably more so) make it really difficult to do lexing first as a separate step without having some parsing sneak in. 😞
I was looking at the docs for .map_with_span() again and saw they make use of a wrapper called Spanned to group the identifier with its span data before calling .padded(). I'm going to try applying that approach throughout my parser/larser and see how it goes. 🤞
Good luck, let me know if you run into issues!
Using Spanned worked for some things, but I wasn't able to use it with delimited_by() and ended up having to replace that with then() instead.
Oh? Remember that you can use map_with_span both before and after delimited_by to change whether you include the delimiters in the span or not.
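The two placements correspond to two different spans. The sketch below models that with the standard library only; it is not chumsky code, `delimited_spans` is a hypothetical helper, and it assumes a single, non-nested pair of delimiters with byte-offset spans.

```rust
use std::ops::Range;

/// Hypothetical std-only model of a delimited item. Returns `(outer, inner)`:
/// `outer` includes both delimiters, `inner` covers only what lies between them.
/// Nesting is not handled: the first `close` after `open` ends the item.
pub fn delimited_spans(
    input: &str,
    open: char,
    close: char,
) -> Option<(Range<usize>, Range<usize>)> {
    let open_at = input.find(open)?;
    let inner_start = open_at + open.len_utf8();
    let inner_end = inner_start + input[inner_start..].find(close)?;
    Some((open_at..inner_end + close.len_utf8(), inner_start..inner_end))
}
```

Taking the span before the delimiters are applied corresponds to `inner`; taking it afterwards corresponds to `outer`. For "(5)" these are 1..2 and 0..3 respectively.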