grammars-v4
Request for creation of markdown grammar
Hey guys,
I'm an ANTLR regular and I'm looking to use ANTLR to parse markdown files. I'm willing to take a shot at writing the grammar myself, but I was wondering whether the gurus on this page already have one, or could whip up a less bug-prone grammar than I would.
There is Antmark, but by the author's own admission it's not a particularly elegant implementation.
Is anybody willing to create such a pull request?
With such a grammar I'm hoping to implement TomasMikula/RichTextFX#348
https://news.ycombinator.com/item?id=8268470
@Groostav the best markdown parser so far (not ANTLR based) seems to be https://github.com/atlassian/commonmark-java
I need it too.
Please give this a read
http://roopc.net/posts/2014/markdown-cfg/
@teverett OK 😢, so an ANTLR grammar is not a better solution.
I'm working on this grammar right now, I have ideas.
Any news?
Unfortunately, no. It's hard to write a Markdown grammar because the language is quite context-sensitive, whereas ANTLR grammars are mostly context-free (unless you use semantic predicates).
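To make the context sensitivity concrete, here is a minimal Java sketch. It is not taken from any grammar in this thread: the class `HeadingContext` and the method `headingLevel` are hypothetical names, and the rule it applies (one to six `#` characters at the very start of a line, followed by a space) is a simplified assumption rather than the full CommonMark rule. It shows that the same `#` character is a heading marker in one position and plain text in another, which a context-free lexer rule matching a bare `'#'` cannot tell apart on its own.

```java
public class HeadingContext {

    /**
     * Returns the heading level (1 to 6) if the line is a heading under the
     * simplified rule assumed here, or 0 if any '#' in it is plain text.
     */
    public static int headingLevel(String line) {
        // Count the '#' characters at the very start of the line.
        int hashes = 0;
        while (hashes < line.length() && line.charAt(hashes) == '#') {
            hashes++;
        }
        boolean validCount = hashes >= 1 && hashes <= 6;
        // Assumption: the marker must be followed by a space to count.
        boolean followedBySpace =
                hashes < line.length() && line.charAt(hashes) == ' ';
        return (validCount && followedBySpace) ? hashes : 0;
    }

    public static void main(String[] args) {
        System.out.println(headingLevel("# Title"));            // prints 1
        System.out.println(headingLevel("Issue #42 is fixed")); // prints 0
        System.out.println(headingLevel("#hashtag"));           // prints 0
    }
}
```

In an ANTLR grammar, this kind of position check is what a semantic predicate would have to express.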
grammar Markdown;
// Lexer members (Java target)
@lexer::members
{
// The most recently produced token on the default channel.
private Token lastToken = null;
private static final String TAG = "Markdown.g4";
/**
* Returns the next token from the character stream and records it as the
* last token if it is on the default channel. The recorded token can be
* used by lexer predicates that depend on the preceding token.
*
* @return the next token from the character stream.
*/
@Override
public Token nextToken() {
// Get the next token.
Token next = super.nextToken();
if (next.getChannel() == Token.DEFAULT_CHANNEL) {
// Keep track of the last token on the default channel.
this.lastToken = next;
}
return next;
}
// Returns the name of this grammar file.
public static String getNameGrammer() {
return TAG;
}
}
// Parser rules
markdownFile
: line+ EOF
;
line
: heading
| paragraph
;
heading
: H1 TEXT_NEWLINE
| H2 TEXT_NEWLINE
| H3 TEXT_NEWLINE
;
paragraph
: sentence+
;
sentence
: TEXT
| URL
| BOLD
| ITALIC
| BOLDITALIC
| LIST
| CODELIST
;
// Lexer rules
H1
: '#'
;
H2
: '##'
;
H3
: '###'
;
TEXT_NEWLINE
: ~ [\r\n]+
;
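// Text that does not start with '#' or '*'. ANTLR negates a character set
// with a leading '~', not with '^' inside the brackets.
TEXT
: ~ [\r\n#*] ~ [\r\n]*
;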
URL
: '[' TEXT ']' '(' TEXT ')'
;
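// In Markdown, **text** is bold, *text* or _text_ is italic,
// and ***text*** is bold italic.
BOLD
: '**' TEXT '**'
;
ITALIC
: '*' TEXT '*'
| '_' TEXT '_'
;
BOLDITALIC
: '***' TEXT '***'
;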
LIST
: ('*' | '-') TEXT_NEWLINE+
;
CODELIST
: '```'
;
HTMLTEXT
: '<' TEXT '>'
| '<' TEXT '/>'
;
STRINGLATTER
: '"' TEXT '"'
;
I wrote an example first; maybe it will help you.