grammars-v4 Request for creation of markdown grammar

Hey guys,

I'm an ANTLR regular and I'd like to use ANTLR to parse Markdown files. I'm willing to try writing a grammar myself, but I was wondering whether the gurus on this page already have one, or could whip up a less bug-prone grammar for me.

There is Antmark, but by the author's own admission it's not a particularly elegant implementation.

Is anybody willing to create such a pull request?

With such a grammar I'm hoping to implement TomasMikula/RichTextFX#348

Aug 22 '16 18:08 Groostav

https://news.ycombinator.com/item?id=8268470

Aug 29 '16 03:08 teverett

@Groostav the best markdown parser so far (not ANTLR based) seems to be https://github.com/atlassian/commonmark-java

Sep 30 '16 14:09 aadrian

I need it too.

Aug 04 '17 01:08 songtianyi

Please give this a read

http://roopc.net/posts/2014/markdown-cfg/

Aug 04 '17 02:08 teverett

@teverett OK 😢, so an ANTLR grammar is not the better solution.

Aug 04 '17 03:08 songtianyi

I'm working on this grammar right now, I have ideas.

Dec 20 '20 10:12 KvanTTT

> I'm working on this grammar right now, I have ideas.

Any news?

Dec 22 '22 15:12 Khaos66

Unfortunately, no. It's hard to write a Markdown grammar because the language is quite context-sensitive, whereas ANTLR grammars are mostly context-free (unless you use semantic predicates).
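
To make the context-sensitivity point concrete, here is a minimal Java sketch. It is not taken from any existing grammar, and the class and method names are made up for illustration. It decides what a `*` means by looking at its surroundings, using a deliberately simplified rule rather than the full CommonMark one: a `*` counts as a list marker only when nothing but whitespace precedes it on the line and a space follows it. A "start of line only" condition like this is the kind of thing that in ANTLR typically needs a semantic predicate.

```java
// Hypothetical helper, for illustration only; not part of ANTLR or any Markdown library.
final class StarContext {

    enum Role { LIST_MARKER, EMPHASIS_DELIMITER }

    /**
     * Classifies the '*' at the given index of a single line.
     * Simplified rule (an assumption, not the CommonMark algorithm):
     * only whitespace before it and a space right after it means "list marker";
     * anything else is treated as an emphasis delimiter.
     */
    public static Role classify(String line, int index) {
        if (line.charAt(index) != '*') {
            throw new IllegalArgumentException("no '*' at index " + index);
        }
        boolean onlyWhitespaceBefore = line.substring(0, index).trim().isEmpty();
        boolean spaceAfter = index + 1 < line.length() && line.charAt(index + 1) == ' ';
        return (onlyWhitespaceBefore && spaceAfter)
                ? Role.LIST_MARKER
                : Role.EMPHASIS_DELIMITER;
    }

    public static void main(String[] args) {
        // Same character, different meaning depending on where it sits in the line.
        System.out.println(classify("* item", 0));               // LIST_MARKER
        System.out.println(classify("an *emphasised* word", 3)); // EMPHASIS_DELIMITER
    }
}
```

The same character yields two different roles, and the deciding information (column position, neighbouring characters) lies outside the character itself.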

Dec 22 '22 19:12 KvanTTT



grammar Markdown;

@lexer::members {
    // The most recently produced token on the default channel.
    private Token lastToken = null;
    private static final String TAG = "Markdown.g4";

    /**
     * Returns the next token from the character stream and records it if it
     * is on the default channel. The recorded token is available to semantic
     * predicates that need to inspect the previous token (nothing in this
     * grammar uses it yet).
     *
     * @return the next token from the character stream.
     */
    @Override
    public Token nextToken() {
        // Get the next token.
        Token next = super.nextToken();

        if (next.getChannel() == Token.DEFAULT_CHANNEL) {
            // Keep track of the last token on the default channel.
            this.lastToken = next;
        }

        return next;
    }

    public static String getNameGrammer() {
        return TAG;
    }
}

// Parser rules

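markdownFile
   : line+ EOF
   ;

line
   : heading
   | paragraph
   | NEWLINE
   ;

heading
   : H1 sentence+
   | H2 sentence+
   | H3 sentence+
   ;

paragraph
   : sentence+
   ;

sentence
   : TEXT
   | URL
   | BOLD
   | ITALIC
   | BOLDITALIC
   | LIST
   | CODELIST
   | HTMLTEXT
   | STRINGLATTER
   | OTHER
   ;

// Lexer rules
// The lexer picks the longest match; when several rules match the same
// length, the rule listed first wins, so the order below matters.

H1
   : '#'
   ;

H2
   : '##'
   ;

H3
   : '###'
   ;

NEWLINE
   : '\r'? '\n'
   | '\r'
   ;

// [label](target)
URL
   : '[' ~(']' | '\r' | '\n')+ ']' '(' ~(')' | '\r' | '\n')+ ')'
   ;

// ***text***
BOLDITALIC
   : '***' INLINE '***'
   ;

// **text**
BOLD
   : '**' INLINE '**'
   ;

// *text* or _text_
ITALIC
   : '*' INLINE '*'
   | '_' ~[_\r\n]+ '_'
   ;

// A bullet, a space, then the rest of the line. Must come before TEXT so
// that "- item" is lexed as LIST rather than TEXT when both match equally.
LIST
   : ('*' | '-') ' ' ~[\r\n]+
   ;

CODELIST
   : '```'
   ;

// Covers both <tag> and <tag/>.
HTMLTEXT
   : '<' ~[<>\r\n]+ '>'
   ;

STRINGLATTER
   : '"' ~["\r\n]+ '"'
   ;

// Plain text: stops at any character that can start one of the rules above.
TEXT
   : ~('\r' | '\n' | '#' | '*' | '_' | '[' | '<' | '"' | '`')+
   ;

// Any single character no other rule claimed, e.g. an unpaired '*' or '['.
OTHER
   : .
   ;

fragment INLINE
   : ~[*\r\n]+
   ;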

I wrote an example to start with; maybe it will help you.

Jun 20 '24 11:06 HanzoDev1375
