
Unicode support for keywords

Open stuchl4n3k opened this issue 5 years ago • 5 comments

Since /u is supported now, is there a convenient way to define a rule using an array of keywords with Unicode enabled? Something like:

const keywords = ['foo', 'bar'];
moo.compile({
   KEY: {
      match: keywords, 
      type: moo.keywords({KEY: keywords}), 
      unicode: true,
   },
});

As I understand it, moo.keywords in the Unicode scenario only works if the "match" is a pattern with a /u flag.

stuchl4n3k, Sep 25 '19 10:09

moo.keywords only works properly when you use it on a matcher that matches anything that could be a word, not just the keywords themselves. For example, this lexer doesn't work the way you seem to expect it to:

const moo = require('moo')

const KW = ['ban', 'this']
const lexer = moo.compile({
  kw: {match: KW, type: moo.keywords({kw: KW})},
  w: /[A-Za-z_][\w]*/,
  ws: / +/,
})
lexer.reset('banana ban')
lexer.next() // {type: 'kw', value: 'ban'}
lexer.next() // {type: 'w', value: 'ana'}

The normal use case for moo.keywords looks like this:

const moo = require('moo')

const KW = ['ban', 'this']
const lexer = moo.compile({
  w: {match: /[A-Za-z_][\w]*/, type: moo.keywords({kw: KW})},
  ws: / +/,
})
lexer.reset('banana ban')
lexer.next() // {type: 'w', value: 'banana'}
lexer.next() // {type: 'ws', value: ' '}
lexer.next() // {type: 'kw', value: 'ban'}
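
The idea behind this pattern can be sketched without moo: match the whole word first, then decide whether it is a keyword. The snippet below is only an illustration of that principle, not moo's actual implementation; `tokenize` and its token shape are hypothetical names chosen for this example.

```javascript
// Hypothetical helper (not part of moo): match whole words first,
// then reclassify any word that appears in the keyword list.
function tokenize(input, keywords) {
  const keywordSet = new Set(keywords)
  // Sticky regex: group 1 is a word, group 2 is a run of spaces.
  const re = /([A-Za-z_]\w*)|( +)/y
  const tokens = []
  let pos = 0
  while (pos < input.length) {
    re.lastIndex = pos
    const m = re.exec(input)
    if (m === null) throw new Error('unexpected character at index ' + pos)
    if (m[1] !== undefined) {
      // The whole word is already matched, so 'banana' can never be split into 'ban' + 'ana'.
      tokens.push({type: keywordSet.has(m[1]) ? 'kw' : 'w', value: m[1]})
    } else {
      tokens.push({type: 'ws', value: m[2]})
    }
    pos = re.lastIndex
  }
  return tokens
}

const tokens = tokenize('banana ban', ['ban', 'this'])
console.log(tokens.map(t => t.type + ':' + t.value))
// [ 'w:banana', 'ws: ', 'kw:ban' ]
```

Because the keyword check happens after the greedy word match, a keyword is only recognized when it is the entire word.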

It actually works fine with Unicode as-is:

const moo = require('moo')

const KW = ['η', 'ο', 'το', 'οι', 'τα']
const lexer = moo.compile({
  w: {match: /\p{XIDS}\p{XIDC}*/u, type: moo.keywords({kw: KW})},
  ws: {match: /\p{White_Space}+/u, lineBreaks: true}, // JavaScript accepts White_Space (or space), not the WSpace alias
})
lexer.reset('η ηθική')
lexer.next() // {type: 'kw', value: 'η'}
lexer.next() // {type: 'ws', value: ' '}
lexer.next() // {type: 'w', value: 'ηθική'}
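
As a side note on why the word pattern needs /u with property escapes: in JavaScript, `\w` and `[A-Za-z_]` only cover ASCII, so they never match Greek words. The sketch below uses plain regular expressions rather than moo; `firstWord` is a hypothetical helper introduced just for this illustration.

```javascript
// ASCII-only word pattern, as in the earlier examples.
const asciiWord = /^[A-Za-z_]\w*/
// Unicode-aware word pattern; \p{...} requires the /u flag.
const unicodeWord = /^\p{XIDS}\p{XIDC}*/u

// Hypothetical helper: return the word at the start of `text`, or null.
function firstWord(re, text) {
  const m = re.exec(text)
  return m === null ? null : m[0]
}

console.log(firstWord(asciiWord, 'banana'))   // 'banana'
console.log(firstWord(asciiWord, 'ηθική'))    // null
console.log(firstWord(unicodeWord, 'ηθική'))  // 'ηθική'
console.log(firstWord(unicodeWord, 'η ηθική')) // 'η'
```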

We also already allow string literal and array matches to be combined with /u regular expressions, so I'm not sure what you're asking for here.

(Some of these changes haven't been published to npm yet [@tjvr]; maybe that's where the confusion is coming from?)

nathan, Sep 25 '19 19:09

Thanks, nathan. After seeing the first two examples, it became much clearer.

Regarding the array match combined with /u - I haven't found that in the doc nor in the tests.

stuchl4n3k, Sep 25 '19 20:09

I haven't found that in the doc nor in the tests.

We should probably have a test for that. The /u tests are a bit sparse at the moment.

nathan, Sep 26 '19 01:09

When’s the next npm publish planned?

agorischek, Sep 26 '19 16:09

I've published 0.5.1. :+1:

tjvr, Sep 29 '19 18:09