tree-sitter-markdown
MDX Parser
Hello,
(And no, I am not coming to complain about a lack of MDX Parser).
Rather, I'd like to get started on making an MDX parser over the next few months. I think consulting the creators of the markdown parser would be a good place to start, and I'd just like to ask: do you think this is feasible? If so, what projects should I start looking at (e.g. simple TS parsers, MDX parsers themselves, etc.)?
Looking at this markdown codebase, it seems like MDX would "just" be a superset of this parser, including:
- frontmatter
- component syntax
- top-level export, import
- JS inside braces
I (and others) would even be happy with just a subset of these being implemented.
Thanks for your response.
This has been discussed earlier, specifically see this issue and comment: https://github.com/tree-sitter-grammars/tree-sitter-markdown/issues/81#issuecomment-1448052124
You don't need to look at parsers for TS or MDX I think, since the JavaScript in these files would best be parsed using language injection. Basically you don't need to worry about parsing JavaScript, you just need to be able to properly mark the places where there is JavaScript, i.e. export statements and curly braces.
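To make the "just mark where the JavaScript is" idea concrete, here is a rough sketch in TypeScript. It is not how tree-sitter-markdown works internally, and `JsRegion` and `markJsRegions` are names made up for illustration. It only records the byte ranges of top-level `import`/`export` lines and of `{...}` expressions, without looking at what is inside them. It assumes single-line constructs and ignores fenced code blocks and braces inside strings, which a real grammar would have to handle.

```typescript
// Hypothetical sketch: JsRegion and markJsRegions are illustrative names,
// not part of tree-sitter or tree-sitter-markdown.
type JsRegion = { kind: "esm" | "expression"; start: number; end: number };

// Mark ranges that contain JavaScript without parsing the JavaScript itself.
// Assumptions: constructs fit on one line; code fences and string literals
// containing braces are not handled.
function markJsRegions(source: string): JsRegion[] {
  const regions: JsRegion[] = [];
  let offset = 0; // index of the current line's first character in `source`

  for (const line of source.split("\n")) {
    if (/^(import|export)\s/.test(line)) {
      // Top-level import/export: the whole line is treated as JavaScript.
      regions.push({ kind: "esm", start: offset, end: offset + line.length });
    } else {
      // Track brace depth so nested braces stay inside one region.
      let depth = 0;
      let open = -1;
      for (let i = 0; i < line.length; i++) {
        const ch = line[i];
        if (ch === "{") {
          if (depth === 0) open = i;
          depth++;
        } else if (ch === "}" && depth > 0) {
          depth--;
          if (depth === 0) {
            // Region covers the text between the outermost braces.
            regions.push({
              kind: "expression",
              start: offset + open + 1,
              end: offset + i,
            });
          }
        }
      }
    }
    offset += line.length + 1; // +1 for the newline removed by split
  }
  return regions;
}

const sample = 'import Chart from "./chart.js"\n\n# Total: {1 + 2}\n';
for (const r of markJsRegions(sample)) {
  console.log(r.kind, JSON.stringify(sample.slice(r.start, r.end)));
}
```

In an actual grammar, these ranges would be nodes in the syntax tree, and an injection query would tell the editor to parse their contents with the JavaScript grammar.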
You're right that MDX would probably best be implemented as a superset / fork of this parser rather than from the ground up, since most of the syntax is markdown anyways.
The main difficulty in implementing this is then understanding the markdown parser, since markdown is a very messy language. It's probably helpful to read through the entirety of this page first if you haven't done so yet.
> You don't need to look at parsers for TS or MDX I think, since the JavaScript in these files would best be parsed using language injection. Basically you don't need to worry about parsing JavaScript, you just need to be able to properly mark the places where there is JavaScript, i.e. export statements and curly braces.
Ok great, that's what I thought. I see someone attempted to do it themselves but obviously didn't complete the task.
In that thread, another individual said they'd prefer MDX as a separate parser -- and I happen to agree. MDX seems to be a completely different language and I don't think it should be implemented on top of ts-markdown (unless it becomes stable in a long time). Once I start this in a few months I think forking will be the best option, although I'm not sure how keeping it updated will work.
Also, thanks for your patience constantly dealing with people who don't know a lot when you seem to be an expert on the matter. Hopefully I can contribute something meaningful in a month or two (or three).
Haha, I'm not an expert by far, but I appreciate the comment. Feel free to ask me any questions that come up.
Although code highlighting within backtick blocks does indeed work when tagged as JS, it's just not enough. There is very little support for good syntax highlighting within an MDX file; a lot of things, like top-level imports, just don't get highlighted currently.