moo
Handle parsing errors in moo.states()
Hi,
I'm looking for a way to get the offending token in a moo states object. The following doesn't seem to work; a regular error is still thrown:
moo.states({
main: {
// throws the error instead of tokenizing it
myError: moo.error
},
// This throws a moo configuration error
myError: moo.error,
});
What would be the correct way to get the error or offending token in a stateful lexer?
The following works fine for me:
moo.states({
main: {
// throws the error instead of tokenizing it
myError: moo.error
},
});
I don't think it would make sense to allow configuring an error at the toplevel? Tokens must always be defined inside a state.
The following works fine for me:
That only works for me if I have another token type in the list; otherwise it generates the regex /(?:)/my and then fails when it can't find the group that matched. If there are no tokens that match anything, instead of generating /(?:)/ (an irrefutable match), we should generate /(?!)/ (an impossible match).
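To make the difference between the two patterns concrete, here is a small sketch in plain JavaScript that does not use moo at all. The variable names are illustrative only; the `m` and `y` flags are taken from the regex quoted above.

```javascript
// An empty non-capturing group matches the empty string at any position,
// so a sticky match "succeeds" without any capture group being set.
const emptyAlternation = new RegExp('(?:)', 'my');

// A negative lookahead for the empty string can never succeed,
// so this pattern fails at every position.
const neverMatches = new RegExp('(?!)', 'my');

const input = 'hello!';

emptyAlternation.lastIndex = 0;
const irrefutable = emptyAlternation.exec(input);
// irrefutable is a match array holding only the empty string:
// there is no group that could tell a lexer which token type matched.

neverMatches.lastIndex = 0;
const impossible = neverMatches.exec(input);
// impossible is null: the pattern reports "no token matched",
// which is the case an error rule is meant to handle.
```

With the impossible pattern, a lexer that has only an error rule would presumably fall through to that rule rather than fail while looking for a matched group.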
I don't think it would make sense to allow configuring an error at the toplevel?
I think it might. Usually lexer states are opaque to the parser and it just sees a stream of tokens, so you very rarely want a) only certain states to have error tokens or b) different states to have different names for the error token. But I don't think the syntax @moranje provided makes sense. If we go this route, we should probably have a more general notion of state inheritance and/or a special state from which other states automatically inherit; then a global error token would be as simple as a { myError: moo.error } prototype state.
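Until something like that exists in the library, the same effect can be approximated in user code by copying a shared set of rules into every state before handing the object to moo.states(). The sketch below is only an illustration of that idea: `withSharedRules` is a hypothetical helper, not part of moo, and `ERROR_RULE` is a stand-in for `moo.error` so the snippet runs without the library.

```javascript
// Hypothetical helper (not a moo API): returns a new states object in which
// every state also contains the shared rules. A state's own rules win on
// name conflicts, and shared rules are appended after them.
function withSharedRules(states, sharedRules) {
  const merged = {};
  for (const [stateName, rules] of Object.entries(states)) {
    const combined = { ...rules };
    for (const [type, rule] of Object.entries(sharedRules)) {
      if (!(type in combined)) combined[type] = rule;
    }
    merged[stateName] = combined;
  }
  return merged;
}

// Stand-in for moo.error, used here only to keep the example self-contained.
const ERROR_RULE = Symbol('error rule placeholder');

const spec = withSharedRules(
  {
    main: { id: /\w+/ },
    string: { chars: /[^"]+/ },
  },
  { myError: ERROR_RULE }
);
// With the real library this would be used as:
//   moo.states(withSharedRules({ ... }, { myError: moo.error }))
```

Every state ends up with the same error token name, which matches the point above that the parser normally should not care which state produced the error.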
I agree on both counts. Since a parsing error is a 'global' failure, it would make more sense to handle it in a single location rather than redoing it over and over again. Preferably there would be a way to access the offset, col and line parameters of the offending token. That, and the syntax above is nonsensical.
@moranje
Preferably there would be a way to access the offset, col and line parameters of the offending token.
The moo.error notation already gives you that information:
const moo = require('moo')
const lexer = moo.states({
main: {
id: /\w+/,
err: moo.error,
},
})
lexer.reset('hello!')
lexer.next() // { type: 'id', value: 'hello', text: 'hello', offset: 0, lineBreaks: 0, line: 1, col: 1 }
lexer.next() // { type: 'err', value: '!', text: '!', offset: 5, lineBreaks: 0, line: 1, col: 6 }
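As a rough sketch of how a caller might use those fields, the snippet below scans a token stream for the error token and builds a message from its position. `findErrorToken` and `describeError` are hypothetical helpers, not part of moo, and the sample tokens are copied from the output above so the example runs without the library.

```javascript
// Hypothetical helper: returns the first token whose type equals errorType,
// or null if the stream contains no such token.
function findErrorToken(tokens, errorType) {
  for (const token of tokens) {
    if (token.type === errorType) return token;
  }
  return null;
}

// Hypothetical helper: formats the position fields carried by the token.
function describeError(token) {
  return `Unexpected ${JSON.stringify(token.value)} at line ${token.line}, ` +
    `col ${token.col} (offset ${token.offset})`;
}

// Tokens as shown in the lexer.next() output above.
const sampleTokens = [
  { type: 'id', value: 'hello', text: 'hello', offset: 0, lineBreaks: 0, line: 1, col: 1 },
  { type: 'err', value: '!', text: '!', offset: 5, lineBreaks: 0, line: 1, col: 6 },
];

const offending = findErrorToken(sampleTokens, 'err');
const message = describeError(offending);
```

Assuming the lexer object is iterable, as the moo README describes, the lexer itself could be passed as `tokens` after calling lexer.reset(input).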
The moo.error notation already gives you that information
Thanks! Here's an update to the README to reflect that: #95.
Nathan added support for including states in other states, and support for $all, in #93.
It still needs documentation and some tests :slightly_smiling_face:
@moranje If you're interested in trying out the latest master and seeing how it works for you, that would be really useful feedback! :blush:
Great! I have limited time to spare at the moment, but I'm excited to try out these additions. I'll try to implement the changes sometime this week and get back to you. Great work!