
Re-entrant ometa/js?

Open keeyip opened this issue 9 years ago • 4 comments

I am maintaining a fork of ometa/js, https://github.com/keeyip/nodemeta, which aims to make tweaking ometa/js easier. I want to use Ometa as a common language for creating syntax highlighters for the CodeMirror text editor. Because a highlighter in CodeMirror is fed input line by line, I was hoping there would be a straightforward way of altering the Ometa/JS implementation to allow pausing and continuing a parse. The best I can think of is to wrap all the rules with a call/cc. Do you think this is a reasonable approach?

Example of CodeMirror's syntax highlighting api:

function parseIt(code) {
  return OmetaParser.matchAll(code, 'start')
}

// Note: The real api names are not important
CodeMirror.registerSyntaxHighlighter('mylang', function myHighlighter(singleLineOfCode) {
  // CodeMirror asynchronously dispatches to myHighlighter
  parseIt(singleLineOfCode)
})

keeyip avatar Nov 21 '14 18:11 keeyip

I think to get good (=fast) results you'll have to rethink things on the CodeMirror side. That is if you want to use a full language grammar for the highlighting. You could of course just replace the highlighting rules (the existing regexes) with Ometa rules, but then using Ometa kinda loses its point. I am tackling the same problem at the moment (with a custom language, not Ometa/Metacoffee, and Ace), would be interested in what you come up with.

For Ometa, I think the main challenge is that you'd have to remember the state of the parser at each line, as any line can be changed, triggering syntax highlighting starting from that point (which could lead to a totally different path in the execution tree of the parser).
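
The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not OMeta or CodeMirror code: `tokenizeLine` and `LineStateCache` are hypothetical names, and the only state tracked is whether a line starts inside a `/* ... */` comment. A real grammar would carry far more state per line.

```javascript
// Hypothetical line tokenizer: takes one line plus the state at its start,
// returns the tokens for that line and the state to carry to the next line.
function tokenizeLine(line, state) {
  const tokens = [];
  let inComment = state.inComment;
  let i = 0;
  while (i < line.length) {
    if (inComment) {
      // Continuing a comment opened on an earlier line.
      const end = line.indexOf("*/", i);
      const stop = end === -1 ? line.length : end + 2;
      tokens.push({ from: i, to: stop, type: "comment" });
      inComment = end === -1;
      i = stop;
    } else {
      const start = line.indexOf("/*", i);
      if (start === -1) {
        tokens.push({ from: i, to: line.length, type: "code" });
        i = line.length;
      } else {
        if (start > i) tokens.push({ from: i, to: start, type: "code" });
        const end = line.indexOf("*/", start + 2);
        const stop = end === -1 ? line.length : end + 2;
        tokens.push({ from: start, to: stop, type: "comment" });
        inComment = end === -1;
        i = stop;
      }
    }
  }
  return { tokens, state: { inComment } };
}

// Remembers the state at the START of every line, so an edit on line n
// re-tokenizes from line n and stops once the state matches the cache again.
class LineStateCache {
  constructor(lines) {
    this.lines = lines.slice();
    this.startStates = []; // startStates[n] = state before line n
    this.tokens = [];
    this.retokenizeFrom(0);
  }

  // Returns how many lines had to be re-tokenized.
  retokenizeFrom(n) {
    let state = n === 0 ? { inComment: false } : this.startStates[n];
    let count = 0;
    for (let i = n; i < this.lines.length; i++) {
      this.startStates[i] = state;
      const result = tokenizeLine(this.lines[i], state);
      this.tokens[i] = result.tokens;
      state = result.state;
      count += 1;
      const cachedNext = this.startStates[i + 1];
      if (cachedNext && cachedNext.inComment === state.inComment) break;
    }
    return count;
  }

  replaceLine(n, text) {
    this.lines[n] = text;
    return this.retokenizeFrom(n);
  }
}

const cache = new LineStateCache(["a /* b", "c", "d */ e", "f"]);
const relexed = cache.replaceLine(0, "a b"); // removing "/*" affects lines 0-2
```

The early exit is only valid because a single edit leaves every later line unchanged; once the start state of the next line matches what was cached, the rest of the cache is still correct.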

xixixao avatar Nov 28 '14 21:11 xixixao

I always thought the problem was dealing with errors. A source code file is constantly in flux. Most of the time it will be close to valid but not completely valid, by the language’s syntax rules. An OMeta parser for that language would immediately fail as soon as you add one curly brace or parenthesis in your editor.

  • Josh

joshmarinacci avatar Nov 29 '14 23:11 joshmarinacci

I ended up taking a slightly different approach by essentially sidestepping CodeMirror's syntax highlighter:

Whenever the code changes, I kick off a full parse on a web worker. Once the background task is finished, I use CodeMirror's doc.markText feature to mark ranges of text with the appropriate CSS className. This works acceptably fast; there is a slight lag between the user entering text and when the correct highlights are refreshed, but it is not distracting.
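
A rough sketch of that flow, for anyone trying the same thing. It assumes CodeMirror 5 (`doc.markText`, `doc.posFromIndex`, `cm.operation`) and a worker that replies with `{ version, spans }`, where each span is `{ start, end, className }` in character offsets. The message shape, the `version` counter, and the function names are assumptions made for this example, not something from my actual code.

```javascript
// Replace the previous highlight marks with marks for the new spans.
// `spans` uses character offsets into the whole document (assumed format).
function applyHighlights(doc, spans, oldMarkers) {
  oldMarkers.forEach((marker) => marker.clear());
  return spans.map((span) =>
    doc.markText(doc.posFromIndex(span.start), doc.posFromIndex(span.end), {
      className: span.className,
    })
  );
}

// Wire an editor to a parsing worker. The worker script itself is not shown;
// it is assumed to echo back the version number it was given.
function attachHighlighter(cm, worker) {
  let version = 0;
  let markers = [];

  cm.on("change", () => {
    version += 1;
    worker.postMessage({ version, code: cm.getValue() });
  });

  worker.onmessage = (event) => {
    // Drop results for text that was edited again while the parse was running.
    if (event.data.version !== version) return;
    // Batch the marker updates so the editor redraws once.
    cm.operation(() => {
      markers = applyHighlights(cm.getDoc(), event.data.spans, markers);
    });
  };
}
```

In a page this would be called as `attachHighlighter(editor, new Worker("parse-worker.js"))`, where the worker file name is again hypothetical.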

To account for incomplete code, I relaxed my rules' cardinality constraints, i.e. MultilineComment = "/*" commentBody? "*/"? instead of MultilineComment = "/*" commentBody "*/". That way a comment that has been opened but not yet closed still matches.
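
To show the effect outside of OMeta, here is a hand-written JavaScript analogue of the two rules. It is only an illustration: `matchMultilineComment` is a made-up function, and it assumes commentBody may be empty.

```javascript
// Strict:  "/*" commentBody "*/"    fails while the comment is still open.
// Relaxed: "/*" commentBody? "*/"?  accepts an unterminated comment.
function matchMultilineComment(input, pos, options) {
  if (!input.startsWith("/*", pos)) return null;
  const close = input.indexOf("*/", pos + 2);
  if (close !== -1) return { from: pos, to: close + 2, closed: true };
  // No terminator yet: the strict rule fails, while the relaxed rule consumes
  // the rest of the input so it can still be highlighted as a comment.
  return options.relaxed ? { from: pos, to: input.length, closed: false } : null;
}

const halfTyped = "x = 1 /* todo";
const strict = matchMultilineComment(halfTyped, 6, { relaxed: false });
const relaxed = matchMultilineComment(halfTyped, 6, { relaxed: true });
```

With the half-typed input, the strict version returns no match, so nothing gets highlighted, while the relaxed version returns a span running to the end of the text.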

keeyip avatar Nov 30 '14 22:11 keeyip

@keeyip Have you made any progress on this?

I have thought about this too. The problem is that highlighting big files requires parsing the whole file if you're using a PEG parser. That's why the most common approach is to use regular expressions, which highlight at least something without hanging the IDE (though this fails if there is something like an open quote in the unread part of the file).

That's why the real solution is to keep the AST close to the B-tree of the original file and do iterative parsing. This is in no way a simple thing, especially in arbitrary-lookahead grammars like PEGs.

These days, re-entering routines is easily done with ES6 generators. I don't think it would take much time to replace every function with function* in OMeta's implementation and add several yields in strategically chosen positions. (Well, maybe you'll need something like the async-q library to tackle some loops.) At least that's what I've done in an experimental PEG.js-like parser generator.
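
To give an idea of the shape, here is a small self-contained sketch. It is not OMeta's implementation and not my parser generator, just a hand-written recursive-descent parser for nested lists where every rule is a `function*`, rules call each other with `yield*`, and the single `yield` sits in the input reader. `makeParser`, `NEED_INPUT`, and the toy grammar are names invented for this example.

```javascript
const NEED_INPUT = Symbol("need input");

function makeParser() {
  let buffer = "";
  let pos = 0;

  // The only place that yields: suspends the whole parse when input runs dry.
  function* peek() {
    while (pos >= buffer.length) {
      const more = yield NEED_INPUT; // resumed by parser.next(moreText)
      buffer += more;
    }
    return buffer[pos];
  }

  // atom = a run of characters other than whitespace and parentheses
  function* atom() {
    let text = "";
    while (true) {
      const ch = yield* peek();
      if (ch === "(" || ch === ")" || /\s/.test(ch)) return text;
      text += ch;
      pos += 1;
    }
  }

  // list = "(" (list | atom)* ")"
  function* list() {
    pos += 1; // consume "("
    const items = [];
    while (true) {
      const ch = yield* peek();
      if (ch === ")") {
        pos += 1;
        return items;
      }
      if (ch === "(") items.push(yield* list());
      else if (/\s/.test(ch)) pos += 1;
      else items.push(yield* atom());
    }
  }

  function* start() {
    const ch = yield* peek();
    if (ch !== "(") throw new SyntaxError("expected '('");
    return yield* list();
  }

  return start();
}

// Feed the input one line at a time, as an editor would.
const parser = makeParser();
let step = parser.next();          // runs until the parser first asks for input
step = parser.next("(a (b\n");     // parse pauses inside the nested list
const pausedMidParse = step.done === false && step.value === NEED_INPUT;
step = parser.next("c) d)\n");     // parse resumes where it stopped and finishes
```

When the generator finishes, `step.done` is true and `step.value` holds the parsed result. Note that this does nothing about backtracking or memoization, which is where a real PEG implementation would need more care.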

reverofevil avatar Oct 13 '15 20:10 reverofevil