chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Unclear how to use `chumsky::primitive::custom()` perhaps not possible

Open robojeb opened this issue 2 years ago • 6 comments

The documentation for custom() states that you shouldn't need to use it, but doesn't provide any indication about how you would if you needed it.

There are a few issues I see:

First, the custom() function takes an unrestricted F generic type but the Parser trait is only implemented for a Custom<F: Fn(&mut StreamOf<I, E>) -> PResult<I, O, E>, ...>

This means that as a user, I don't know what type of closure to put in my custom parser, and also that it pushes the error to the call site rather than the definition of my custom parser. For example in the following the error appears in the .or(cust) location rather than at the definition of cust which would make debugging easier.

    let cust = chumsky::primitive::custom(|| 3); // Totally valid

    let parser = chumsky::primitive::just('3').or(cust).parse("3").expect(); // :(

The second, and perhaps larger issue is that if you do have the correct Fn(&mut StreamOf<I, E>) -> PResult<I, O, E> closure type, I am not actually sure you can do anything.

All of the useful parsing methods on Stream are marked pub(crate) which means you can't actually manipulate the stream within your custom parser.

robojeb avatar Oct 31 '21 18:10 robojeb

Hi, thanks for opening this issue.

...takes an unrestricted F generic type but...

This is a general problem I want to address in the codebase: adding constraints to call sites where I can. Sadly, this sometimes doesn't generalise well and requires extra redundant phantom type parameters in each of the combinators so working out where and how I can add these constraints is a bit of an involved process. For example, Custom would require 2 extra type parameters, I and O, for this to work. This might be worth doing, but it's still a tradeoff I want to investigate.

...All of the useful parsing methods on Stream are marked pub(crate)

Currently I want to discourage users implementing their own parsers directly because I think that the API is very likely to change (for example, I'm planning to improve the way the library considers alternative failure paths to avoid dropping errors unnecessarily which would require changing the signature of Parser::parse_inner and possibly the API of `Stream).

One possible solution here might be to expose a more limited but also more stable set of methods on Stream that allow users to still do some useful things like pulling tokens from a stream and back-tracking. I'd like to know your thoughts on this!

zesterer avatar Nov 01 '21 09:11 zesterer

I understand the library is still in flux and it isn't recommended. Though my expectation as a developer using a pre-1.0 library is that I may have to put in a lot of work to upgrade to the next point release.

I did run into a case where it would have been useful to build a custom parser.

While parsing I am doing some basic AST validation, this consists mostly of checking that integer literals don't overflow and certain other properties are upheld. I usually accomplish this with a try_map() so I can return an error or warning in the stream, but then I have to implement a recovery which basically re-does the parse but uses map() instead.

A nice contaminator would be try_map_or_default(F, V) and try_map_or_default_with(F, F2) which let you return an error, but then return a value as "recovery" based on the same parse without having to back-track. In use it would look like:

// For default
intparser().try_map_or_default(|digits| u32::from_str_radix(&digits, 16).map_err(...), 0);
// For default with
mytypeparser().try_map_or_default(|contents| MyType::new(contents).map_err(...), |contents| MyType::new_unchecked(contents));

// My current solution
intparser().try_map(...).recoverwith(ReparseUnchecked(intparser().to(0));

robojeb avatar Nov 02 '21 15:11 robojeb

Okay right, so you're looking for a way to generate an error as a side-effect of a parse? That's an interesting use-case... I think your current solution isn't actually too bad, although it would be nice to be able to skip the second parse. I can probably add a combinator that allows validating an output and emitting errors at the same time.

zesterer avatar Nov 02 '21 16:11 zesterer

I actually found this use-case after I filed this issue when I was trying to implement a custom() or a new structure with a the Parser type that I could use as new combinator both paths are blocked by not having access to the Stream parsing information.

I can open a separate issue about a side-effect parser combinator if you would prefer that be split out.

I want to say I am really enjoying your library.

robojeb avatar Nov 02 '21 17:11 robojeb

I can likely get to implementing this (the side-effect parser) tonight. When it comes to custom, I'm also going to try to pin down a more stable API subset that I can expose to make writing more complex parsers possible.

zesterer avatar Nov 02 '21 18:11 zesterer

I've added a Parser::validate function that should fit your use-case. It includes an example and will be included in the next version. Until then, you can point your project at this repository to make use of it. Let me know whether you encounter any issues when using it.

zesterer avatar Nov 03 '21 09:11 zesterer

zero-copy now has a useful custom combinator that has access to the parser input and several ways to manipulate it.

zesterer avatar Feb 20 '23 22:02 zesterer