Option for `.padded()` to not include padding in resulting span

Open kevinbarabash opened this issue 1 year ago • 5 comments

The following code prints `0..3`, which means that `.padded()` includes the padding in the span of the output.

let parser = just::<char, _, Simple<char>>("5").padded().map_with_span(|_, s: Span| s);
let result = parser.parse(" 5 ");
println!("result = {result:#?}");

Including padding in the span means I need to do a bunch of work to get the spans correct for error reporting since I don't want to include surrounding whitespace. In the example above this isn't too bad since I could swap the order of .padded() and .map_with_span().

In practice though, I've created a helper called just_with_padding() that's defined as:

pub fn just_with_padding(input: &str) -> BoxedParser<'_, char, &str, Simple<char>> {
    just(input).padded().boxed()
}

This helper is used everywhere in my parser because the language I'm parsing doesn't care about whitespace. Unfortunately, this approach makes it difficult to exclude the whitespace from the spans for the various nodes.

kevinbarabash, Aug 05 '22 03:08

The solution is surprisingly simple! Instead of doing the following:

just('5').padded().map_with_span(|_, s: Span| s)

You can do this:

just('5').map_with_span(|_, s: Span| s).padded()

map_with_span works on any parser and will always select the span of that parser's pattern, and not any surrounding tokens.
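To make the ordering point concrete, here is a rough, standard-library-only model of the two arrangements. It is not chumsky's implementation, and every function name in it (`skip_ws`, `just_at`, `span_outside_padding`, `span_inside_padding`) is made up for illustration. It only shows where the span boundaries end up depending on whether the span is recorded around the padding or around the token alone.

```rust
use std::ops::Range;

/// Skip whitespace starting at `pos`, returning the first non-whitespace position.
fn skip_ws(input: &[char], mut pos: usize) -> usize {
    while pos < input.len() && input[pos].is_whitespace() {
        pos += 1;
    }
    pos
}

/// Match `token` at `pos`, returning the position just past it on success.
fn just_at(input: &[char], pos: usize, token: char) -> Option<usize> {
    if input.get(pos) == Some(&token) {
        Some(pos + 1)
    } else {
        None
    }
}

/// Models `just(..).padded().map_with_span(..)`:
/// the span wraps the padding as well as the token.
fn span_outside_padding(src: &str, token: char) -> Option<Range<usize>> {
    let input: Vec<char> = src.chars().collect();
    let start = 0;
    let tok_start = skip_ws(&input, start);
    let tok_end = just_at(&input, tok_start, token)?;
    let end = skip_ws(&input, tok_end);
    Some(start..end)
}

/// Models `just(..).map_with_span(..).padded()`:
/// the span is recorded for the token alone, then padding is skipped.
fn span_inside_padding(src: &str, token: char) -> Option<Range<usize>> {
    let input: Vec<char> = src.chars().collect();
    let tok_start = skip_ws(&input, 0);
    let tok_end = just_at(&input, tok_start, token)?;
    let _after_padding = skip_ws(&input, tok_end); // consumed, but not part of the span
    Some(tok_start..tok_end)
}

fn main() {
    println!("{:?}", span_outside_padding(" 5 ", '5')); // Some(0..3)
    println!("{:?}", span_inside_padding(" 5 ", '5')); // Some(1..2)
}
```

The first result matches the `0..3` reported in the original post; the second is the span of the `5` on its own.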

In general, I'd suggest moving away from writing a 'larser' (lexer + parser combined as one). It's much easier to have lexing and parsing be distinct steps (see the nano_rust example), because it means that your parser doesn't need to worry about trivialities like whitespace, accidentally misinterpreting keywords, etc. which makes its design much simpler.

zesterer, Aug 05 '22 08:08

Unfortunately, the language I'm parsing is a variation on TypeScript and supports JSX (and template literal strings). Both of these things (JSX probably more so) make it really difficult to do lexing first as a separate step without having some parsing sneak in. 😞

I was looking at the docs for .map_with_span() again and saw they make use of a wrapper called Spanned to group the identifier with its span data before calling .padded(). I'm going to try applying that approach throughout my parser/larser and see how it goes. 🤞

kevinbarabash, Aug 06 '22 00:08

Good luck, let me know if you run into issues!

zesterer, Aug 06 '22 12:08

Using Spanned worked for some things, but I wasn't able to use it with delimited_by() and ended up having to replace that with then() instead.

kevinbarabash, Aug 06 '22 18:08

Oh? Remember that you can use map_with_span both before and after delimited_by to change whether you include the delimiters in the span or not.
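As a rough illustration of the two placements, the standard-library-only sketch below parses a delimited run of digits and returns both candidate spans. It is a simplified model rather than chumsky code, and `delimited_spans` is a hypothetical name. The first range corresponds to recording the span before the delimiters are applied (inner pattern only), the second to recording it afterwards (delimiters included).

```rust
use std::ops::Range;

/// Parse `open digits close` from the start of `src`.
/// Returns (span of the inner digits only, span including both delimiters).
fn delimited_spans(
    src: &str,
    open: char,
    close: char,
) -> Option<(Range<usize>, Range<usize>)> {
    let input: Vec<char> = src.chars().collect();
    let start = 0;

    // Opening delimiter.
    if input.get(start) != Some(&open) {
        return None;
    }

    // Inner pattern: one or more ASCII digits.
    let inner_start = start + 1;
    let mut pos = inner_start;
    while pos < input.len() && input[pos].is_ascii_digit() {
        pos += 1;
    }
    if pos == inner_start {
        return None;
    }
    let inner_end = pos;

    // Closing delimiter.
    if input.get(pos) != Some(&close) {
        return None;
    }
    let outer_end = pos + 1;

    Some((inner_start..inner_end, start..outer_end))
}

fn main() {
    // Inner span excludes the parentheses; outer span includes them.
    println!("{:?}", delimited_spans("(42)", '(', ')')); // Some((1..3, 0..4))
}
```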

zesterer, Aug 07 '22 10:08