
Handle parsing errors in moo.states()

Open moranje opened this issue 6 years ago • 8 comments

Hi,

I'm looking for a way to get the offending token in a moo states object. The following doesn't seem to work; a regular error is still thrown:

  moo.states({
    main: {
      // throws the error instead of tokenizing it
      myError: moo.error
    },
    // This throws a moo configuration error
    myError: moo.error,
  });

What would be the correct way to get the error or offending token in a stateful lexer?

moranje avatar Aug 16 '18 11:08 moranje

The following works fine for me:

  moo.states({
    main: {
      // throws the error instead of tokenizing it
      myError: moo.error
    },
  });

I don't think it would make sense to allow configuring an error at the toplevel? Tokens must always be defined inside a state.

tjvr avatar Aug 18 '18 12:08 tjvr

The following works fine for me:

That only works for me if I have another token type in the list; otherwise it generates the regex /(?:)/my and then fails when it can't find the group that matched. If there are no tokens that match anything, instead of generating /(?:)/ (an irrefutable match), we should generate /(?!)/ (an impossible match).
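
To illustrate the difference between the two patterns, here is a minimal sketch in plain JavaScript. It does not use moo; the helper name `matchAt` is made up for this example, and the flags are taken from the regex quoted above.

```javascript
// Empty alternation: what you get when no rule contributes a pattern.
// It matches the empty string at every position.
const irrefutable = /(?:)/my;

// Negative lookahead on the empty pattern: it can never match.
const impossible = /(?!)/my;

// Hypothetical helper: try to match `regex` at exactly `index`.
function matchAt(regex, input, index) {
  regex.lastIndex = index; // sticky (y) regexes only match at lastIndex
  const m = regex.exec(input);
  return m === null ? null : m[0];
}

const emptyMatch = matchAt(irrefutable, 'hello!', 0); // zero-length match
const noMatch = matchAt(impossible, 'hello!', 0);     // no match at all

console.log(JSON.stringify(emptyMatch), noMatch); // prints: "" null
```

With the irrefutable pattern a lexer sees a "successful" match that consumed nothing and belongs to no capture group, which is consistent with the failure described above; with the impossible pattern it gets a clean "no match" and can fall through to its error handling.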

I don't think it would make sense to allow configuring an error at the toplevel?

I think it might. Usually lexer states are opaque to the parser and it just sees a stream of tokens, so you very rarely want a) only certain states to have error tokens or b) different states to have different names for the error token. But I don't think the syntax @moranje provided makes sense—if we go this route, we should probably have a more general notion of state inheritance and/or a special state from which other states automatically inherit; then a global error token would be as simple as a { myError: moo.error } prototype state.
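
As a rough sketch of what a "prototype state" could mean, the rules shared by every state can be merged into each state before the object is passed to `moo.states()`. Everything here is an assumption made for illustration: `withSharedRules` is a hypothetical helper, not part of moo, and `ERROR_RULE` is a stand-in for `moo.error` so the snippet runs without moo installed.

```javascript
// Hypothetical helper (not part of moo): copy a set of shared rules
// into every state of a states object.
function withSharedRules(shared, states) {
  const merged = {};
  for (const [name, rules] of Object.entries(states)) {
    // State-specific rules are spread first so they stay ahead of the shared ones.
    merged[name] = { ...rules, ...shared };
  }
  return merged;
}

// Stand-in for moo.error; in real use you would pass moo.error itself.
const ERROR_RULE = { error: true };

const states = withSharedRules(
  { myError: ERROR_RULE },
  {
    main: { id: /\w+/, lparen: { match: '(', push: 'inner' } },
    inner: { num: /\d+/, rparen: { match: ')', pop: 1 } },
  }
);

console.log(Object.keys(states.main).join(', '));  // prints: id, lparen, myError
console.log(Object.keys(states.inner).join(', ')); // prints: num, rparen, myError
```

The result would then be handed to `moo.states(states)`. This only emulates inheritance in user code; it says nothing about how a built-in mechanism would be spelled.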

nathan avatar Aug 18 '18 14:08 nathan

I think it might. Usually lexer states are opaque to the parser and it just sees a stream of tokens, so you very rarely want a) only certain states to have error tokens or b) different states to have different names for the error token. But I don't think the syntax @moranje provided makes sense—if we go this route, we should probably have a more general notion of state inheritance and/or a special state from which other states automatically inherit; then a global error token would be as simple as a { myError: moo.error } prototype state.

I agree on both counts. Since a parsing error is a 'global' failure, it would make more sense to handle it in a single location rather than redoing it over and over again. Preferably there would be a way to access the offset, col and line properties of the offending token. And yes, the syntax above is nonsensical.

moranje avatar Aug 19 '18 21:08 moranje

@moranje

Preferably there would be a way to access the offset, col and line properties of the offending token.

The moo.error notation already gives you that information:

const moo = require('moo')

const lexer = moo.states({
  main: {
    id: /\w+/,
    err: moo.error,
  },
})

lexer.reset('hello!')
lexer.next() // { type: 'id', value: 'hello', text: 'hello', offset: 0, lineBreaks: 0, line: 1, col: 1 }
lexer.next() // { type: 'err', value: '!', text: '!', offset: 5, lineBreaks: 0, line: 1, col: 6 }
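
Building on the output above, one possible way to handle errors in a single place is to separate error tokens from the rest after lexing. This is only a sketch: `partitionTokens` and `describeError` are hypothetical helpers, and the token objects are copied from the comments above so the snippet runs without moo.

```javascript
// Hypothetical helper: split a token stream into regular tokens and
// error tokens. `errorType` is the name given to the moo.error rule.
function partitionTokens(tokens, errorType) {
  const ok = [];
  const errors = [];
  for (const tok of tokens) {
    (tok.type === errorType ? errors : ok).push(tok);
  }
  return { ok, errors };
}

// Hypothetical helper: build a message from the position fields.
function describeError(tok) {
  return `Unexpected input ${JSON.stringify(tok.value)} ` +
    `at line ${tok.line}, col ${tok.col} (offset ${tok.offset})`;
}

// Token objects copied from the lexer output shown above.
const tokens = [
  { type: 'id', value: 'hello', text: 'hello', offset: 0, lineBreaks: 0, line: 1, col: 1 },
  { type: 'err', value: '!', text: '!', offset: 5, lineBreaks: 0, line: 1, col: 6 },
];

const { ok, errors } = partitionTokens(tokens, 'err');
const messages = errors.map(describeError);
console.log(messages[0]);
// prints: Unexpected input "!" at line 1, col 6 (offset 5)
```

Since `partitionTokens` accepts any iterable, it should also work on the lexer itself after `lexer.reset(...)`, assuming the lexer is iterated with `for...of` as in moo's README.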

nathan avatar Aug 19 '18 23:08 nathan

The moo.error notation already gives you that information

Thanks! Here's an update to the README to reflect that: #95.

moranje avatar Aug 20 '18 04:08 moranje

Nathan added support for including states in other states, and support for $all, in #93.

It still needs documentation and some tests :slightly_smiling_face:

tjvr avatar Sep 19 '18 16:09 tjvr

@moranje If you're interested in trying out the latest master and seeing how it works for you, that would be really useful feedback! :blush:

tjvr avatar Sep 20 '18 11:09 tjvr

Great! I have limited time to spare at the moment, but am excited to try out these additions. I'll try to implement the changes sometime this week and get back to you on this. Great work!

moranje avatar Sep 20 '18 15:09 moranje