chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Add not_ref()

Open tesujimath opened this issue 7 months ago • 9 comments

Works just like not() but for BorrowInput rather than ValueInput.

This is just like I did for any_ref, and needed for the same reason, that is, because not can't be used for BorrowInput, only ValueInput

tesujimath avatar Nov 17 '23 02:11 tesujimath

Hmm, I worry that we're going to end up duplicating a lot of combinators to make this work across the crate. I think there probably needs to be a better way of generating errors without relying on being able to pull values out by-value. I need to revisit exactly how error generation works with MaybeRef, but my intuition is that there should be a better way of solving this and relaxing the trait bound.

zesterer avatar Nov 23 '23 19:11 zesterer

Yeah I do agree, relaxing the trait bound from ValueInput to just Input would be better, but I couldn't see how to do that!

tesujimath avatar Nov 23 '23 19:11 tesujimath

@zesterer I'm thinking I'll have to publish my own variant of this crate as an interim measure, so I can publish (an update to) my beancount-parser-lima crate which depends on it.

Best practice in the community seems to be I publish a crate called tesujimath-chumsky with a clear description as to it's temporary nature, so publishing my own crate isn't blocked this.

As soon as there's a published solution to what was blocking me here and in #569, I'll delete my crate and switch back.

Ok?

tesujimath avatar Dec 08 '23 00:12 tesujimath

Apologies, I've had a really busy few weeks (new job, among other things!). This is on my todo list (along with a series of other things), I'll try to get a decent response up here by the end of the weekend.

zesterer avatar Dec 08 '23 19:12 zesterer

Great, and thank you! Sorry if it seems like I'm hassling you about this, that's not my intention. Really appreciate your work on Chumsky!

tesujimath avatar Dec 08 '23 19:12 tesujimath

@zesterer I had another look at what would be required to support not() et al for BorrowInput.

Relaxing the trait constraints from BorrowInput and ValueInput to just Input seems to come unstuck with the calls to next_ref() or next(), which seem fundamentally incompatible. I'm not even sure if this will be possible.

For now I have only two ideas:

  1. Implement the whole suite of not_ref() and the others for BorrowInput, as parallels to the existing ValueInput ones (which is what I started to do here).
  2. Give up on using BorrowInput in my application and simply use ValueInput everywhere.

Since my tokens come from Logos and contain &str, using ValueInput does not seem like an enormous loss. Mostly. But I do have a couple of more painful tokens to pass by value. Is this what others are doing? Is BorrowInput able to become a first-class citizen somehow?

tesujimath avatar Jan 31 '24 01:01 tesujimath

Relaxing the trait constraints ... seem fundamentally incompatible.

One existing issue with not is that the way it reports errors is... strange. If you have something like just('a').and_is(just("avocado").not()).repeated() and run it on an input like aaaaavocado, you'll end up getting an error like

aaaaavocado
    ^
    Unexpected 'a'

This seems... counterintuitive, to say the least given that an a in that location is a legitimate pattern: provided it's not followed by vocado.

The wider issue here is that we don't have a good way to fall back on a non-token pattern. Ideally, the error would be more like Unexpected 'avocado'. This is actually a wider design problem with chumsky (see #538 and related issues) that I want to fix at some point, and it affects other combinators like filter (there's no good way to state, in the positive, what the filter function was expecting).

If this wider problem were solved, then I suspect that the not issue would also be solved: since the reported error could contain some fallback label instead of a long-lived reference to whatever token was found.

It's a tricky one: while I definitely see the value in this combinator, I'm also keenly aware that the reason for its existence is a product of poor design elsewhere in the crate. I think I'd be happy to accept it, but perhaps only as a temporary measure: perhaps it could be put behind the unstable feature?

Is BorrowInput able to become a first-class citizen somehow?

The difficulty here is that the inputs supported by chumsky have mutually-exclusive feature sets.

Input type Token type Supports borrowing tokens? Supports taking tokens by-value?
&'a str char false (str is UTF-8, does not contain chars) true (char is trivially copyable)
&'a [u8] u8 true ([u8] is a slice, contains u8s) true (u8 is trivially copyable)
&'a [T] T true ([T] is a slice, contains Ts) false (T's copyability is unknown)

The only commonality between these inputs is that they must be able to produce a MaybeRef<'a, Self::Token> (i.e: either an owned Self::Token or a borrowed &'a Self::Token), which is the base requirement placed upon the Input trait: making BorrowInput first-class would necessitate making at minimum &'a str an invalid input, and potentially many more (it would be nice, for example, to have non-memory data structures like impl Read fit into this abstraction too, see #590 ).

zesterer avatar Jan 31 '24 08:01 zesterer

I wasn't able to transition to ValueInput for various reasons, so sticking with BorrowInput, which is surely the right thing.

But by restructuring my parser, I was able to avoid needing both not_ref() and one_of_ref(), so neither this PR nor #569 are blocking my own progress.

Notwithstanding the not weirdness, I do think it should be supported for BorrowInput, or removed from ValueInput. But I don't think I can help with anything here, sorry.

Thanks for continuing to make Chumsky more awesome! :heart:

tesujimath avatar Feb 01 '24 01:02 tesujimath

Good to hear that you've been able to work around it. I'm going to keep this PR open for now, I if only as a reminder to myself to come up with a good solution to this wider problem.

zesterer avatar Feb 01 '24 09:02 zesterer