Nathan
@tjvr The simplest fix is to escape `-` as `\x2d` (which works everywhere) instead of `\-` (which, in Unicode mode, is only valid inside `[]`, even under Annex B). It's incomprehensible why...
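A quick sketch of the difference:

```js
// `\x2d` is a valid hex escape for `-` everywhere: in and out of
// character classes, with or without the `u` flag.
console.log(/a\x2db/u.test('a-b')) // true
console.log(/[\x2d]/u.test('-'))   // true

// `\-` outside a character class is only an Annex B identity
// escape; with the `u` flag it is a SyntaxError.
try {
  new RegExp('a\\-b', 'u')
} catch (e) {
  console.log(e instanceof SyntaxError) // true
}
```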
Perhaps we could use a similar solution to that of https://github.com/no-context/moo/issues/116, combined with some indicator that strings should be case-insensitive as well?

```js
moo.compile({
  STRING: /"(?:[^\\]|\\.)*?"/i,
  NUMBER: /(?:\.\d+|\d+\.?\d*)/i,
  ADD: { match: /* … */ },
  // …
})
```
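In the meantime, a user-land stopgap is possible (hypothetical helper, not part of moo's API): expand each letter into a two-case character class so the combined lexer regex stays flag-free.

```js
// Naïve case-insensitivity transform: rewrite each ASCII letter as
// a two-case character class. Caveat: this would mangle escapes
// like \d or \w, so it only suits plain literal patterns.
function caseless(re) {
  const src = re.source.replace(/[a-zA-Z]/g,
    c => `[${c.toLowerCase()}${c.toUpperCase()}]`)
  return new RegExp(src)
}

console.log(caseless(/select/).source)         // "[sS][eE][lL][eE][cC][tT]"
console.log(caseless(/select/).test('SeLeCt')) // true
```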
> This doesn't do anything about token boundaries, does it?

Correct. It is up to the user to push data at token boundaries.
> Probably, yes.

Done.
It is quite useful to be able to do this:

```js
fs.createReadStream(wherever)
  .pipe(split(/(\n)/))
  .pipe(lexer.clone())
  .pipe(new Parser())
  .on('data', console.log)
```

(modulo disgusting `.on('error', …)`s everywhere because the stream API is broken)...
@tjvr Another option for the stream API is to buffer input until we get a regex match that doesn't extend to the end of the buffer—with an optional maximum buffer...
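A rough sketch of that buffering strategy, with hypothetical names (`tokenize` stands in for any lexer that yields tokens carrying `offset` and `text`):

```js
// Sketch of the proposal: hold input in a buffer and only emit
// tokens that stop short of the buffer's end, since a token that
// touches the end might be extended by the next chunk.
function makeFeeder(tokenize, onToken, maxBuffer = 64 * 1024) {
  let buffer = ''
  return {
    feed(chunk) {
      buffer += chunk
      let consumed = 0
      for (const tok of tokenize(buffer)) {
        // Stop before a token that reaches the end of the buffer.
        if (tok.offset + tok.text.length === buffer.length) break
        onToken(tok)
        consumed = tok.offset + tok.text.length
      }
      buffer = buffer.slice(consumed)
      if (buffer.length > maxBuffer) {
        throw new RangeError('token exceeds maximum buffer size')
      }
    },
    end() {
      // End of input: everything left is final, so flush it all.
      for (const tok of tokenize(buffer)) onToken(tok)
      buffer = ''
    },
  }
}

// Toy tokenizer standing in for a real lexer: words and runs of
// whitespace, tagged with their offsets.
function* words(s) {
  const re = /\w+|\s+/g
  let m
  while ((m = re.exec(s)) !== null) yield { text: m[0], offset: m.index }
}

const out = []
const feeder = makeFeeder(words, t => out.push(t.text))
feeder.feed('foo b')  // "foo" and " " are safe; "b" touches the end
feeder.feed('ar baz') // now "bar" and " " are safe
feeder.end()          // flush the final "baz"
console.log(out)      // [ 'foo', ' ', 'bar', ' ', 'baz' ]
```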
> I thought you said that still isn't correct?

It's not correct, though it *would* give the correct result in the example I gave. Where it wouldn't give the correct...
> Doesn't that mean that given the language `foo | foobar`, we wouldn't correctly lex `foob`, `ar`?

Correct.
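To illustrate why the heuristic commits too early in that case:

```js
// With the language foo | foobar, suppose chunks arrive as "foob"
// then "ar". In the buffer "foob", the match "foo" ends at offset
// 3, short of the buffer end (4), so the heuristic would commit
// to "foo"; but the full input "foobar" should have lexed as the
// single token "foobar".
console.log(/foobar|foo/y.exec('foob')[0])  // "foo": committed too early
console.log(/foobar|foo/.exec('foobar')[0]) // "foobar": the right answer
```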
Here ya go. [Copied to a gist too.](https://gist.github.com/nathan/d8d1adea38a1ef3a6d6a06552da641aa)

```js
const moo = require('moo')

const lexer = moo.compile({
  ws: /[ \t]+/,
  nl: { match: /(?:\r\n?|\n)+/, lineBreaks: true },
  id: /\w+/,
})
// …
```