
Request for creation of markdown grammar

Open Groostav opened this issue 8 years ago • 9 comments

Hey guys,

I'm an ANTLR regular and I'm looking to use ANTLR to parse markdown files. I'm willing to take a shot at writing a grammar myself, but I was wondering if the gurus on this page perhaps already have one, or if they can whip up a less bug-prone grammar for me.

There is Antmark, but by the author's own admission it's not a particularly elegant implementation.

Is anybody willing to create such a pull request?

With such a grammar I'm hoping to implement TomasMikula/RichTextFX#348

Groostav avatar Aug 22 '16 18:08 Groostav

https://news.ycombinator.com/item?id=8268470

teverett avatar Aug 29 '16 03:08 teverett

@Groostav the best markdown parser so far (not ANTLR based) seems to be https://github.com/atlassian/commonmark-java

aadrian avatar Sep 30 '16 14:09 aadrian

I need it too.

songtianyi avatar Aug 04 '17 01:08 songtianyi

Please give this a read

http://roopc.net/posts/2014/markdown-cfg/

teverett avatar Aug 04 '17 02:08 teverett

@teverett OK 😢, so an ANTLR grammar is not a better solution.

songtianyi avatar Aug 04 '17 03:08 songtianyi

I'm working on this grammar right now, I have ideas.

KvanTTT avatar Dec 20 '20 10:12 KvanTTT

> I'm working on this grammar right now, I have ideas.

Any news?

Khaos66 avatar Dec 22 '22 15:12 Khaos66

Unfortunately, no. It's hard to create a Markdown grammar because the language is quite context-sensitive, whereas ANTLR grammars are mostly context-insensitive (unless you use semantic predicates).
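As a rough illustration of that context sensitivity: whether `#` starts a heading depends on where it appears in the line, which a plain token rule cannot see. The sketch below is hand-written Java, not part of any existing grammar; the class and method names are hypothetical, and it only looks at a single line (it ignores other context such as fenced code blocks). In an ANTLR lexer, a check of this kind would typically have to live in a semantic predicate.

```java
public final class HeadingContext {
    private HeadingContext() {}

    /**
     * Returns the ATX heading level (1-6) of a single line,
     * or 0 if the line is not a heading.
     */
    public static int headingLevel(String line) {
        int i = 0;
        // Up to three leading spaces are allowed before the '#' run.
        while (i < line.length() && i < 3 && line.charAt(i) == ' ') {
            i++;
        }
        int start = i;
        while (i < line.length() && line.charAt(i) == '#') {
            i++;
        }
        int level = i - start;
        if (level < 1 || level > 6) {
            return 0;
        }
        // The '#' run must be followed by a space, a tab, or the end of the line.
        if (i < line.length() && line.charAt(i) != ' ' && line.charAt(i) != '\t') {
            return 0;
        }
        return level;
    }

    public static void main(String[] args) {
        // Same character, different meaning depending on position.
        System.out.println(headingLevel("# Title"));    // heading
        System.out.println(headingLevel("Issue #348")); // plain text
        System.out.println(headingLevel("#hashtag"));   // plain text
    }
}
```

The same `#` character yields a heading in the first call and ordinary text in the other two, which is the kind of decision a context-free token definition cannot make on its own.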

KvanTTT avatar Dec 22 '22 19:12 KvanTTT



grammar Markdown;
// Lexer members (Java target); the parser rules follow after this block.

@ lexer :: members
{

    // The most recently produced token on the default channel.
    private Token lastToken = null;
    private static final String TAG = "Markdown.g4";

    /**
     * Returns the next token from the character stream and records it as the
     * last token if it is on the default channel.
     *
     * @return the next token from the character stream.
     */
    @Override
    public Token nextToken() {

        // Get the next token.
        Token next = super.nextToken();

        if (next.getChannel() == Token.DEFAULT_CHANNEL) {
            // Keep track of the last token on the default channel.
            this.lastToken = next;
        }

        return next;
    }

    // Returns the name of the grammar file.
    public static String getNameGrammer() {
        return TAG;
    }
}
markdownFile
   : line+
   ;

line
   : heading
   | paragraph
   ;

heading
   : H1 TEXT_NEWLINE
   | H2 TEXT_NEWLINE
   | H3 TEXT_NEWLINE
   ;

paragraph
   : sentence+
   ;

sentence
   : TEXT
   | URL
   | BOLD
   | ITALIC
   | BOLDITALIC
   | LIST
   | CODELIST
   ;
   // Lexer rules
   
H1
   : '#'
   ;

H2
   : '##'
   ;

H3
   : '###'
   ;

TEXT_NEWLINE
   : ~ [\r\n]+
   ;

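// The first character must not be a line break, '#' or '*';
// the rest runs to the end of the line.
TEXT
   : ~ [\r\n#*] ~ [\r\n]+
   ;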

URL
   : '[' TEXT ']' '(' TEXT ')'
   ;

// In Markdown, double delimiters mean bold and single delimiters mean italic.
BOLD
   : '**' TEXT '**'
   ;

ITALIC
   : '*' TEXT '*'
   | '_' TEXT '_'
   ;

BOLDITALIC
   : '***' TEXT '***'
   ;

LIST
   : ('*' | '-') TEXT_NEWLINE+
   ;

CODELIST
   : '```'
   ;

HTMLTEXT
   : '<' TEXT '>'
   | '<' TEXT '/>'
   ;

STRINGLATTER
   : '"' TEXT '"'
   ;


I wrote an example first; maybe it will help you.
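
One thing to watch for with lexer rules like these: an ANTLR lexer picks the rule that matches the longest input, and on a tie the rule declared first. The sketch below simulates only that selection step with plain Java regexes for three of the rules above (`H1`, `H2`, `TEXT_NEWLINE`); it is a simplified model, not the ANTLR runtime, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class MaximalMunchDemo {
    // Three lexer rules from the grammar, transcribed as regexes
    // and kept in their declaration order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("H1", Pattern.compile("#"));
        RULES.put("H2", Pattern.compile("##"));
        RULES.put("TEXT_NEWLINE", Pattern.compile("[^\\r\\n]+"));
    }

    private MaximalMunchDemo() {}

    /**
     * Returns the name of the rule that wins at the given offset:
     * the longest match, or the earliest-declared rule on a tie.
     * Returns null if no rule matches.
     */
    public static String winningRule(String input, int offset) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            m.region(offset, input.length());
            // Strictly longer matches replace the current best,
            // so earlier rules keep winning ties.
            if (m.lookingAt() && m.end() - offset > bestLength) {
                bestLength = m.end() - offset;
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(winningRule("# Title", 0));
        System.out.println(winningRule("##", 0));
    }
}
```

Under this model, `# Title` is claimed whole by `TEXT_NEWLINE` rather than being split into `H1` plus text, so the `heading` parser rule would likely never see an `H1` token for such a line. Lexer modes or semantic predicates are the usual ways to work around this.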

HanzoDev1375 avatar Jun 20 '24 11:06 HanzoDev1375