chumsky
chumsky copied to clipboard
Add not_ref()
Works just like not()
but for BorrowInput
rather than ValueInput
.
This is just like I did for any_ref
, and needed for the same reason, that is, because not
can't be used for BorrowInput
, only ValueInput
Hmm, I worry that we're going to end up duplicating a lot of combinators to make this work across the crate. I think there probably needs to be a better way of generating errors without relying on being able to pull values out by-value. I need to revisit exactly how error generation works with MaybeRef
, but my intuition is that there should be a better way of solving this and relaxing the trait bound.
Yeah I do agree, relaxing the trait bound from ValueInput
to just Input
would be better, but I couldn't see how to do that!
@zesterer I'm thinking I'll have to publish my own variant of this crate as an interim measure, so I can publish (an update to) my beancount-parser-lima crate which depends on it.
Best practice in the community seems to be I publish a crate called tesujimath-chumsky with a clear description as to it's temporary nature, so publishing my own crate isn't blocked this.
As soon as there's a published solution to what was blocking me here and in #569, I'll delete my crate and switch back.
Ok?
Apologies, I've had a really busy few weeks (new job, among other things!). This is on my todo list (along with a series of other things), I'll try to get a decent response up here by the end of the weekend.
Great, and thank you! Sorry if it seems like I'm hassling you about this, that's not my intention. Really appreciate your work on Chumsky!
@zesterer I had another look at what would be required to support not()
et al for BorrowInput
.
Relaxing the trait constraints from BorrowInput
and ValueInput
to just Input
seems to come unstuck with the calls to next_ref()
or next()
, which seem fundamentally incompatible. I'm not even sure if this will be possible.
For now I have only two ideas:
- Implement the whole suite of
not_ref()
and the others forBorrowInput
, as parallels to the existingValueInput
ones (which is what I started to do here). - Give up on using
BorrowInput
in my application and simply useValueInput
everywhere.
Since my tokens come from Logos and contain &str
, using ValueInput
does not seem like an enormous loss. Mostly. But I do have a couple of more painful tokens to pass by value. Is this what others are doing? Is BorrowInput
able to become a first-class citizen somehow?
Relaxing the trait constraints ... seem fundamentally incompatible.
One existing issue with not
is that the way it reports errors is... strange. If you have something like just('a').and_is(just("avocado").not()).repeated()
and run it on an input like aaaaavocado
, you'll end up getting an error like
aaaaavocado
^
Unexpected 'a'
This seems... counterintuitive, to say the least given that an a
in that location is a legitimate pattern: provided it's not followed by vocado
.
The wider issue here is that we don't have a good way to fall back on a non-token pattern. Ideally, the error would be more like Unexpected 'avocado'
. This is actually a wider design problem with chumsky (see #538 and related issues) that I want to fix at some point, and it affects other combinators like filter
(there's no good way to state, in the positive, what the filter function was expecting).
If this wider problem were solved, then I suspect that the not
issue would also be solved: since the reported error could contain some fallback label instead of a long-lived reference to whatever token was found.
It's a tricky one: while I definitely see the value in this combinator, I'm also keenly aware that the reason for its existence is a product of poor design elsewhere in the crate. I think I'd be happy to accept it, but perhaps only as a temporary measure: perhaps it could be put behind the unstable
feature?
Is BorrowInput able to become a first-class citizen somehow?
The difficulty here is that the inputs supported by chumsky have mutually-exclusive feature sets.
Input type | Token type | Supports borrowing tokens? | Supports taking tokens by-value? |
---|---|---|---|
&'a str |
char |
false (str is UTF-8, does not contain char s) |
true (char is trivially copyable) |
&'a [u8] |
u8 |
true ([u8] is a slice, contains u8 s) |
true (u8 is trivially copyable) |
&'a [T] |
T |
true ([T] is a slice, contains T s) |
false (T 's copyability is unknown) |
The only commonality between these inputs is that they must be able to produce a MaybeRef<'a, Self::Token>
(i.e: either an owned Self::Token
or a borrowed &'a Self::Token
), which is the base requirement placed upon the Input
trait: making BorrowInput
first-class would necessitate making at minimum &'a str
an invalid input, and potentially many more (it would be nice, for example, to have non-memory data structures like impl Read
fit into this abstraction too, see #590 ).
I wasn't able to transition to ValueInput
for various reasons, so sticking with BorrowInput
, which is surely the right thing.
But by restructuring my parser, I was able to avoid needing both not_ref()
and one_of_ref()
, so neither this PR nor #569 are blocking my own progress.
Notwithstanding the not
weirdness, I do think it should be supported for BorrowInput
, or removed from ValueInput
. But I don't think I can help with anything here, sorry.
Thanks for continuing to make Chumsky more awesome! :heart:
Good to hear that you've been able to work around it. I'm going to keep this PR open for now, I if only as a reminder to myself to come up with a good solution to this wider problem.