tree-sitter-markdown

MDX Parser

Status: Open · barrett-ruth opened this issue 1 year ago • 5 comments

Hello,

(And no, I'm not here to complain about the lack of an MDX parser.)

Rather, I'd like to start building an MDX parser over the next few months. Consulting the creators of the Markdown parser seems like a good place to start, so I'd like to ask: do you think this is feasible? If so, which projects should I start looking at (e.g. simple TS parsers, existing MDX parsers, etc.)?

Looking at this markdown codebase, it seems like MDX would "just" be a superset of this parser, including:

  • frontmatter
  • component syntax
  • top-level import and export statements
  • JavaScript expressions inside curly braces

I (and others) would even be happy with just a subset of these being implemented.
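To make the block-level part of that list concrete, here is a minimal Python sketch that splits an MDX file into frontmatter, top-level import/export lines, and ordinary Markdown. It is only an illustration, not how tree-sitter-markdown works internally: `classify_blocks` is a hypothetical helper, and it assumes frontmatter is a leading block fenced by lines that are exactly `---` and that every import/export statement fits on a single line (real MDX allows multi-line ESM).

```python
import re
from typing import List, Tuple

# Hypothetical rule: a line is ESM if it begins with the keyword `import` or `export`.
ESM_LINE = re.compile(r"^(import|export)\b")


def classify_blocks(source: str) -> List[Tuple[str, str]]:
    """Split MDX source into (kind, text) chunks.

    kind is "frontmatter", "esm", or "markdown". Assumptions: frontmatter
    only appears at the very top, fenced by lines that are exactly '---',
    and each import/export statement occupies exactly one line.
    """
    lines = source.split("\n")
    chunks: List[Tuple[str, str]] = []
    i = 0
    # Frontmatter is recognized only if the first line opens it and a closing fence exists.
    if lines and lines[0] == "---":
        for j in range(1, len(lines)):
            if lines[j] == "---":
                chunks.append(("frontmatter", "\n".join(lines[1:j])))
                i = j + 1
                break
    markdown: List[str] = []
    for line in lines[i:]:
        if ESM_LINE.match(line):
            # Flush any pending Markdown before emitting the ESM line.
            if markdown:
                chunks.append(("markdown", "\n".join(markdown)))
                markdown = []
            chunks.append(("esm", line))
        else:
            markdown.append(line)
    if markdown:
        chunks.append(("markdown", "\n".join(markdown)))
    return chunks
```

For example, `classify_blocks("---\ntitle: Hi\n---\nimport { Chart } from './chart'\n\n# Hello")` yields a frontmatter chunk, one ESM chunk, and a Markdown chunk. An unclosed `---` fence is simply treated as Markdown.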

Thanks for your response.

barrett-ruth · Feb 24 '24 14:02

This has been discussed before; see specifically this issue and comment: https://github.com/tree-sitter-grammars/tree-sitter-markdown/issues/81#issuecomment-1448052124

I don't think you need to look at parsers for TS or MDX, since the JavaScript in these files would best be parsed using language injection. Basically, you don't need to worry about parsing JavaScript; you just need to be able to properly mark the places where there is JavaScript, e.g. export statements and curly braces.
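As a rough illustration of "marking places where there is JavaScript", the Python sketch below finds the text inside each top-level `{...}` expression and returns its offsets, which is the span an injection would hand to the JavaScript parser. `find_expressions` is a hypothetical helper, not part of tree-sitter-markdown. It assumes a plain depth counter is enough, so braces inside JS strings, template literals, comments, or Markdown code spans are not handled; a real external scanner would need to be smarter.

```python
from typing import List, Tuple


def find_expressions(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of the JavaScript inside each top-level {...}.

    The offsets exclude the braces themselves, so text[start:end] is the
    span that would be injected as JavaScript. Unclosed '{' and stray '}'
    are ignored. Assumption: no braces appear inside string literals,
    comments, or code spans.
    """
    spans: List[Tuple[int, int]] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1  # JS content begins just after the outermost '{'
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i))  # end is exclusive: the closing '}'
    return spans
```

Nested braces, such as the object literal in `{items.map(i => ({ n: i }))}`, stay inside one span. In an actual grammar these spans would become nodes that an `injections.scm` query captures as `@injection.content` with the injection language set to `javascript`.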

You're right that MDX would probably best be implemented as a superset or fork of this parser rather than from the ground up, since most of the syntax is Markdown anyway.

The main difficulty is then understanding the Markdown parser, since Markdown is a very messy language. It's probably helpful to read through the entirety of this page first if you haven't done so yet.

MDeiml · Feb 25 '24 11:02

> You don't need to look at parsers for TS or MDX I think, since the javascript in these files would best be parsed using language injection. Basically you don't need to worry about parsing javascript, you just need to be able to properly mark places where there is javascript, i.e. export statements, curly braces.

OK, great, that's what I thought. I see someone attempted it themselves but apparently never finished.

In that thread, another person said they'd prefer MDX as a separate parser, and I happen to agree. MDX seems to be a different language altogether, and I don't think it should be implemented on top of ts-markdown (at least not until ts-markdown has been stable for a long time). Once I start in a few months, I think forking will be the best option, although I'm not sure how I'll keep the fork in sync with upstream.

Also, thanks for your patience in constantly dealing with people who don't know much about this, when you seem to be an expert on the matter. Hopefully I can contribute something meaningful in a month or two (or three).

barrett-ruth · Feb 25 '24 14:02

Haha, I'm not an expert by far, but I appreciate the comment. Feel free to ask me any questions that come up.

MDeiml · Mar 02 '24 06:03

Although highlighting inside fenced code blocks does work when they're tagged as JS, it's just not enough. There is very little support for good syntax highlighting in MDX files; a lot of things, like top-level imports, currently aren't highlighted at all.

dan-myles · Apr 29 '24 20:04