fut icon indicating copy to clipboard operation
fut copied to clipboard

Grammar railroad diagram

Open mingodad opened this issue 3 years ago • 2 comments

After going through the CiParser.cs and writing an LL(1) grammar with a modified https://github.com/mingodad/CocoR-CSharp I've got it to generate an EBNF understood by https://www.bottlecaps.de/rr/ui to generate railroad diagram (https://en.wikipedia.org/wiki/Syntax_diagram).

Copy the EBNF shown bellow and paste on https://www.bottlecaps.de/rr/ui in the tab Edit Grammar then switch to the tab View Diagram :

//
// EBNF generated by CocoR parser generator to be viewed with https://www.bottlecaps.de/rr/ui
//

//
// productions
//

Cito ::=  stmt* EOF
stmt ::=  ParseDoc? Public? ( ParseClass | ParseEnum | ParseNative )
ParseDoc ::=  DocComment ParseCodeDoc
ParseClass ::=  ( Static | Abstract | Sealed )? Class ParseId ( Colon ParseId )? LeftBrace ( ParseDoc? CiVisibility? ( ParseConst | ParseCallType? ( Void ParseClassMember | ParseType ( ParseBlock | ParseClassMember ) ) ) )* RightBrace
ParseEnum ::=  Enum Asterisk? ParseId LeftBrace EnumElement ( Comma EnumElement )* RightBrace
ParseNative ::=  Native LeftBrace RightBrace
ParseCodeDoc ::=  ParsePara ( Dot ParseBlock* )?
ParsePara ::=  ParseText ( CodeDelimiter ParseText )*
ParseBlock ::=  LeftBrace ParseStatement* RightBrace
ParseText ::=  ANY
ParseId ::=  ident
CiVisibility ::=  Internal | Protected | Public | Private
ParseConst ::=  Const ParseType ParseId Assign ParseConstInitializer Semicolon
ParseCallType ::=  Static | Abstract | Virtual | Override | Sealed
ParseClassMember ::=  ParseId ( ParseMethod | ( Assign ParseExpr )? Semicolon )
ParseType ::=  ParseExpr ( Range ParseExpr )?
ParseMethod ::=  ExclamationMark? LeftParenthesis ( MethodParameter ( Comma MethodParameter )* )? RightParenthesis Throws? ( Semicolon | FatArrow ParseReturnExpr | ParseBlock )
ParseExpr ::=  ParseCondOrExpr ( QuestionMark ParseExpr Colon ParseExpr )?
ParseConstInitializer ::=  LeftBrace ParseCollection? RightBrace | ParseExpr
ParseCollection ::=  ParseExpr ( Comma ParseExpr )*
MethodParameter ::=  ParseDoc? ParseVarType
ParseReturnExpr ::=  ParseExpr? Semicolon
ParseVarType ::=  ParseType ParseVarAssign
ParseVarAssign ::=  ParseId ( Assign ParseAssign )?
ParseAssign ::=  ParseType ( ( Assign | AddAssign | SubAssign | MulAssign | DivAssign | ModAssign | AndAssign | OrAssign | XorAssign | ShiftLeftAssign | ShiftRightAssign ) ParseAssign | ParseVarAssign )?
EnumElement ::=  ParseDoc? ParseId ( Assign ParseExpr )?
ParseStatement ::=  ParseBlock | ParseAssert | ParseBreak | ParseConst | ParseContinue | ParseDoWhile | ParseFor | ParseForeach | ParseIf | ParseLock | ParseNative | ParseReturn | ParseSwitch | ParseThrow | ParseWhile | ParseAssign Semicolon
ParseAssert ::=  Assert ParseExpr ( Comma ParseExpr )? Semicolon
ParseBreak ::=  Break Semicolon
ParseContinue ::=  Continue Semicolon
ParseDoWhile ::=  Do ParseLoopBody While ParseParenthesized Semicolon
ParseFor ::=  For LeftParenthesis ParseAssign? Semicolon ParseExpr? Semicolon ParseAssign? RightParenthesis ParseLoopBody
ParseForeach ::=  Foreach LeftParenthesis ( LeftParenthesis ParseForeachIterator Comma ParseForeachIterator RightParenthesis | ParseForeachIterator ) In ParseExpr RightParenthesis ParseLoopBody
ParseIf ::=  If ParseParenthesized ParseStatement ( Else ParseStatement )?
ParseLock ::=  Lock ParseParenthesized ParseStatement
ParseReturn ::=  Return ParseReturnExpr
ParseSwitch ::=  Switch ParseParenthesized LeftBrace ParseCaseStatement* ( Default Colon ParseStatement* )? RightBrace
ParseThrow ::=  Throw ParseExpr Semicolon
ParseWhile ::=  While ParseParenthesized ParseLoopBody
ParseLoopBody ::=  ParseStatement
ParseParenthesized ::=  LeftParenthesis ParseExpr RightParenthesis
ParseForeachIterator ::=  ParseType ParseId
ParseCaseStatement ::=  ParseCase ParseCase* ParseStatement*
ParseCase ::=  Case ParseExpr Colon
ParseCondOrExpr ::=  ParseCondAndExpr ( CondOr ParseCondAndExpr )*
ParseCondAndExpr ::=  ParseOrExpr ( CondAnd ParseOrExpr )*
ParseOrExpr ::=  ParseXorExpr ( Or ParseXorExpr )*
ParseXorExpr ::=  ParseAndExpr ( Xor ParseAndExpr )*
ParseAndExpr ::=  ParseEqualityExpr ( And ParseEqualityExpr )*
ParseEqualityExpr ::=  ParseRelExpr ( ( Equal | NotEqual ) ParseRelExpr )*
ParseRelExpr ::=  ParseShiftExpr ( ( Less | LessOrEqual | Greater | GreaterOrEqual ) ParseShiftExpr | Is ParsePrimaryExpr ParseId? )*
ParseShiftExpr ::=  ParseAddExpr ( ( ShiftLeft | ShiftRight ) ParseAddExpr )*
ParsePrimaryExpr ::=  ( Increment | Decrement | Minus | Tilde | ExclamationMark | New ) ParsePrimaryExpr | ( Literal | ParseInterpolatedString | LeftParenthesis ParseType RightParenthesis | ParseSymbolReference | ParseListType | ParseDictionaryType | Resource Less BYTE LeftBracket RightBracket Greater ParseParenthesized ) ( Dot ParseSymbolReference | LeftParenthesis ParseCollection? RightParenthesis | LeftBracket ParseExpr? RightBracket | ( Increment | Decrement | ExclamationMark | Hash ) )*
ParseAddExpr ::=  ParseMulExpr ( ( Plus | Minus ) ParseMulExpr )*
ParseMulExpr ::=  ParsePrimaryExpr ( ( Asterisk | Slash | Mod ) ParsePrimaryExpr )*
Literal ::=  intcon | floatcon | string | charcon | BoxedFalse | BoxedTrue | BoxedNull
ParseInterpolatedString ::=  interpolatedstring
ParseSymbolReference ::=  ParseId
ParseListType ::=  List Less ParseTypeTemplate Greater
ParseDictionaryType ::=  ( Dictionary | SortedDictionary ) Less ParseTypeTemplate Comma ParseTypeTemplate Greater
ParseTypeTemplate ::=  ParsePrimaryExpr

//
// tokens
//

Abstract ::= "abstract"
AddAssign ::= "+="
And ::= "&"
AndAssign ::= "&="
Assert ::= "assert"
Assign ::= "="
Asterisk ::= "*"
Break ::= "break"
Case ::= "case"
Class ::= "class"
CodeDelimiter ::= "`"
Colon ::= ":"
Comma ::= ","
CondAnd ::= "&&"
CondOr ::= "||"
Const ::= "const"
Continue ::= "continue"
Decrement ::= "--"
Default ::= "default"
Dictionary ::= "Dictionary"
DivAssign ::= "/="
DocComment ::= "///"
Do ::= "do"
Dot ::= "."
Else ::= "else"
Enum ::= "enum"
Equal ::= "=="
ExclamationMark ::= "!"
FatArrow ::= "=>"
Foreach ::= "foreach"
For ::= "for"
Greater ::= ">"
GreaterOrEqual ::= ">="
Hash ::= "#"
If ::= "if"
Increment ::= "++"
In ::= "in"
Internal ::= "internal"
Is ::= "is"
LeftBrace ::= "{"
LeftBracket ::= "["
LeftParenthesis ::= "("
Less ::= "<"
LessOrEqual ::= "<="
List ::= "List"
Lock ::= "lock"
Minus ::= "-"
Mod ::= "%"
ModAssign ::= "%="
MulAssign ::= "*="
Native ::= "native"
New ::= "new"
NotEqual ::= "!="
Or ::= "|"
OrAssign ::= "|="
Override ::= "override"
Plus ::= "+"
Protected ::= "protected"
Public ::= "public"
QuestionMark ::= "?"
Range ::= ".."
Resource ::= "resource"
Return ::= "return"
RightBrace ::= "}"
RightBracket ::= "]"
RightParenthesis ::= ")"
Sealed ::= "sealed"
Semicolon ::= ";"
ShiftLeft ::= "<<"
ShiftLeftAssign ::= "<<="
ShiftRight ::= ">>"
ShiftRightAssign ::= ">>="
Slash ::= "/"
SortedDictionary ::= "SortedDictionary"
Static ::= "static"
SubAssign ::= "-="
Switch ::= "switch"
Throw ::= "throw"
Throws ::= "throws"
Tilde ::= "~"
Virtual ::= "virtual"
Void ::= "void"
While ::= "while"
Xor ::= "^"
XorAssign ::= "^="
BoxedFalse ::= "false"
BoxedTrue ::= "true"
BoxedNull ::= "null"
BYTE ::= "byte"
Private ::= "private"

This grammar deviates a bit from the code flow on CiParser.cs but I think is good enough to give a global overview of Cito grammar actually it recognizes almost all of test/*.ci:


COMPILER Cito

TERMINALS
	T_SYMBOL

CHARACTERS
	letter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_".
	oct        = '0'..'7'.
	digit     = "0123456789".
	nzdigit    = '1'..'9'.
	cr        = '\r'.
	lf        = '\n'.
	tab       = '\t'.
	stringCh  = ANY - '"' - '\\' - cr - lf.
	charCh    = ANY - '\'' - '\\' - cr - lf.
	printable = '\u0020' .. '\u007e'.
	hex       = "0123456789abcdefABCDEF".

	newLine   = cr + lf.
	notNewLine = ANY - newLine .
	ws         = " " + tab + '\u000b' + '\u000c'.

TOKENS
	ident     = letter { letter | digit }.
	floatcon = ( '.' digit {digit} [('e'|'E')  ['+'|'-'] digit {digit}]
		| digit {digit} '.' {digit} [('e'|'E')  ['+'|'-'] digit {digit}]
		| digit {digit} ('e'|'E')  ['+'|'-'] digit {digit}
		) ['f'|'l'|'F'|'L'].

	intcon   = ( nzdigit {digit}
		| '0' {oct}
		| ("0x"|"0X") hex {hex}
		) {'u'|'U'|'l'|'L'}.

	string    = '"' { stringCh | '\\' printable } '"'.
	interpolatedstring    = "$\"" { stringCh | '\\' printable } '"'.
	badString = '"' { stringCh | '\\' printable } (cr | lf).
	charcon      = '\'' ( charCh | '\\' printable { hex } ) '\''.

	Abstract = "abstract" .
	AddAssign = "+=" .
	And = "&" .
	AndAssign = "&=" .
	Assert = "assert" .
	Assign = "=" .
	Asterisk = "*" .
	Break = "break" .
	Case = "case" .
	Class = "class" .
	CodeDelimiter = "`" .
	Colon = ":" .
	Comma = "," .
	CondAnd = "&&" .
	CondOr = "||" .
	Const = "const" .
	Continue = "continue" .
	Decrement = "--" .
	Default = "default" .
	Dictionary = "Dictionary" .
	DivAssign = "/=".
	DocComment = "///" .
	Do = "do" .
	Dot = "." .
	Else = "else" .
	Enum = "enum" .
	Equal = "==" .
	ExclamationMark = "!" .
	FatArrow = "=>" .
	Foreach = "foreach" .
	For = "for" .
	Greater = ">" .
	GreaterOrEqual = ">=" .
	Hash = "#" .
	If = "if" .
	Increment = "++" .
	In = "in" .
	Internal = "internal" .
	Is = "is" .
	LeftBrace = "{" .
	LeftBracket = "[" .
	LeftParenthesis = "(" .
	Less = "<" .
	LessOrEqual = "<=" .
	List = "List" .
	Lock = "lock" .
	Minus = "-" .
	Mod = "%" .
	ModAssign = "%=" .
	MulAssign = "*=" .
	Native = "native" .
	New = "new" .
	NotEqual = "!=" .
	Or = "|" .
	OrAssign = "|=" .
	Override = "override" .
	Plus = "+" .
	Protected = "protected" .
	Public = "public" .
	QuestionMark = "?" .
	Range = ".." .
	Resource = "resource" .
	Return = "return" .
	//RightAngle = ">" .
	RightBrace = "}" .
	RightBracket = "]" .
	RightParenthesis = ")" .
	Sealed = "sealed" .
	Semicolon  = ";" .
	ShiftLeft = "<<" .
	ShiftLeftAssign = "<<=" .
	ShiftRight = ">>" .
	ShiftRightAssign = ">>=" .
	Slash = "/" .
	SortedDictionary = "SortedDictionary" .
	Static = "static" .
	SubAssign = "-=" .
	Switch = "switch" .
	Throw = "throw" .
	Throws = "throws" .
	Tilde = "~" .
	Virtual = "virtual" .
	Void = "void" .
	While = "while" .
	Xor = "^" .
	XorAssign = "^=" .

	BoxedFalse = "false" .
	BoxedTrue = "true" .
	BoxedNull = "null" .

	BYTE : ident = "byte" .
	Private : ident = "private" .

PRAGMAS
	PreIf = "#if" {notNewLine} newLine.
	PreElIf = "#elif" {notNewLine} newLine.
	PreElse = "#else" {notNewLine} newLine.
	PreEndIf = "#endif" {notNewLine} newLine.

	COMMENTS FROM "/*" TO "*/" NESTED
	COMMENTS FROM "//" TO lf

IGNORE cr + lf + tab

/*-------------------------------------------------------------------------*/

PRODUCTIONS

Cito =
	{stmt}
	EOF
	.

stmt =
	[ParseDoc]
	["public"] (
		ParseClass
		| ParseEnum
		| ParseNative
	)
	.

ParseDoc =
	DocComment ParseCodeDoc
	.

ParseCodeDoc =
	ParsePara ['.' {ParseBlock}]
	.

ParsePara =
	ParseText {CodeDelimiter ParseText}
	.

ParseClass =
	[Static | Abstract | Sealed]
		Class ParseId [Colon ParseId]
		LeftBrace
		{
			[ParseDoc]
			[CiVisibility] (
				ParseConst
				| [ParseCallType] (
					Void ParseClassMember
					| ParseType (
						ParseBlock //Constructor
						| ParseClassMember
						)
					)
			)
		}
		RightBrace
	.

ParseClassMember =
	ParseId (
		ParseMethod
		| [Assign ParseExpr] Semicolon
	)
	.

CiVisibility =
	Internal
	| Protected
	| Public
	| Private
	.

ParseConst =
	Const ParseType ParseId Assign ParseConstInitializer Semicolon
	.

ParseConstInitializer =
	LeftBrace [ParseCollection] RightBrace
	| ParseExpr
	.

ParseType =
	ParseExpr [Range ParseExpr]
	.

ParseCallType =
	Static
	| Abstract
	| Virtual
	| Override
	| Sealed
	.

ParseMethod =
	[ExclamationMark]
		LeftParenthesis
			[MethodParameter { Comma MethodParameter }]
		RightParenthesis [Throws]
		(
			Semicolon
			| FatArrow ParseReturnExpr
			| ParseBlock
		)
	.

MethodParameter =
	[ParseDoc] ParseVarType
	.

ParseVarType =
	ParseType ParseVarAssign
	.

ParseVarAssign =
	ParseId [Assign ParseAssign]
	.

ParseAssign =
	ParseType [
		(
			Assign
			| AddAssign
			| SubAssign
			| MulAssign
			| DivAssign
			| ModAssign
			| AndAssign
			| OrAssign
			| XorAssign
			| ShiftLeftAssign
			| ShiftRightAssign
		) ParseAssign
		| ParseVarAssign
	]
	.

ParseEnum =
	Enum [Asterisk] ParseId LeftBrace EnumElement {',' EnumElement} RightBrace
	.

EnumElement =
	[ParseDoc] ParseId [Assign ParseExpr]
	.

ParseBlock =
	LeftBrace {ParseStatement} RightBrace
	.

ParseStatement =
	ParseBlock
	| ParseAssert
	| ParseBreak
	| ParseConst
	| ParseContinue
	| ParseDoWhile
	| ParseFor
	| ParseForeach
	| ParseIf
	| ParseLock
	| ParseNative
	| ParseReturn
	| ParseSwitch
	| ParseThrow
	| ParseWhile
	| ParseAssign Semicolon
	.

ParseAssert =
	Assert ParseExpr [Comma ParseExpr] Semicolon
	.

ParseBreak =
	Break Semicolon
	.

ParseContinue =
	Continue Semicolon
	.

ParseLoopBody =
	ParseStatement
	.

ParseDoWhile =
	Do ParseLoopBody While ParseParenthesized Semicolon
	.

ParseFor =
	For LeftParenthesis
			[ParseAssign] Semicolon [ParseExpr] Semicolon [ParseAssign]
		RightParenthesis ParseLoopBody
	.

ParseForeachIterator =
	ParseType ParseId
	.

ParseForeach =
	Foreach LeftParenthesis (
		LeftParenthesis ParseForeachIterator Comma ParseForeachIterator RightParenthesis
		| ParseForeachIterator
		) In ParseExpr RightParenthesis ParseLoopBody
	.

ParseIf =
	If ParseParenthesized ParseStatement [Else ParseStatement]
	.

ParseLock =
	Lock ParseParenthesized ParseStatement
	.

ParseNative =
	Native LeftBrace (. SkipNested(_LeftBrace, _RightBrace); .) RightBrace
	.

ParseReturn =
	Return ParseReturnExpr
	.

ParseReturnExpr =
	[ParseExpr] Semicolon
	.

ParseSwitch =
	Switch ParseParenthesized LeftBrace {ParseCaseStatement} [Default Colon {ParseStatement}] RightBrace
	.

ParseCaseStatement =
	ParseCase {ParseCase} {ParseStatement}
	.

ParseCase =
	Case ParseExpr Colon
	.

ParseThrow =
	Throw ParseExpr Semicolon
	.

ParseWhile =
	While ParseParenthesized ParseLoopBody
	.

ParseParenthesized =
	LeftParenthesis ParseExpr RightParenthesis
	.

ParseExpr =
	ParseCondOrExpr [QuestionMark ParseExpr Colon ParseExpr]
	.

ParseCondOrExpr =
	ParseCondAndExpr {CondOr ParseCondAndExpr }
	.

ParseCondAndExpr =
	ParseOrExpr {CondAnd ParseOrExpr}
	.

ParseOrExpr =
	ParseXorExpr {Or ParseXorExpr}
	.

ParseXorExpr =
	ParseAndExpr {Xor ParseAndExpr}
	.

ParseAndExpr =
	ParseEqualityExpr {And ParseEqualityExpr}
	.

ParseEqualityExpr =
	ParseRelExpr {(Equal | NotEqual) ParseRelExpr}
	.

ParseRelExpr =
	ParseShiftExpr {
		(Less | LessOrEqual | Greater | GreaterOrEqual) ParseShiftExpr
		| Is ParsePrimaryExpr [ParseId]
		}
	.

ParseShiftExpr =
	ParseAddExpr {(ShiftLeft | ShiftRight) ParseAddExpr}
	.

ParseAddExpr =
	ParseMulExpr {(Plus | Minus) ParseMulExpr}
	.

ParseMulExpr =
	ParsePrimaryExpr {(Asterisk | Slash | Mod) ParsePrimaryExpr}
	.

ParsePrimaryExpr =
	(Increment | Decrement | Minus | Tilde | ExclamationMark | New) ParsePrimaryExpr
	| (
		Literal
		| ParseInterpolatedString
		| LeftParenthesis ParseType RightParenthesis
		| ParseSymbolReference
		| ParseListType
		| ParseDictionaryType
		| Resource Less BYTE LeftBracket RightBracket Greater ParseParenthesized
	) {
		Dot ParseSymbolReference
		| LeftParenthesis [ParseCollection] RightParenthesis
		| LeftBracket [ParseExpr] RightBracket
		| (Increment | Decrement | ExclamationMark | Hash)
	}
	.

/*
CheckXcrementParent =
	(Increment | Decrement) ParsePrimaryExpr
	.
*/
ParseInterpolatedString =
	interpolatedstring //InterpolatedString ParseExpr [Comma ParseExpr] [Colon Number] "\"$"//RightBrace
	.

ParseDictionaryType =
	(Dictionary | SortedDictionary) Less ParseTypeTemplate Comma ParseTypeTemplate Greater
	.

ParseListType =
	List Less ParseTypeTemplate Greater
	.

ParseTypeTemplate =
	ParsePrimaryExpr
	.

ParseSymbolReference =
	ParseId
	.

ParseId =
	ident
	.

ParseCollection =
	ParseExpr {Comma ParseExpr}
	.

ParseText =
	ANY (. SkipTillEOL(); .)
	.

Literal =
	intcon
	| floatcon
	| string
	| charcon
	| BoxedFalse
	| BoxedTrue
	| BoxedNull
	.

END Cito.

mingodad avatar Dec 27 '21 12:12 mingodad

Fundamental question: is Ć grammar LL(1) or LL(k) or LL(*) ? Which tools is used to compile second form of this grammar?

andr1972 avatar Mar 13 '22 04:03 andr1972

is Ć grammar LL(1) or LL(k) or LL(*)

It's not LL(k), as can be seen in these examples:

C[2][3][n * 2] a;   // three-dimensional array storage local variable

a[2][3][i * 2] = 5; // array element assignment

The number of array dimensions isn't limited by the language. Also, the expressions in brackets can be arbitrarily long. If the brackets are followed by an identifier, it's a definition. Otherwise, it's an expression. I think that makes it LL(*).

CiParser.cs is a recursive-descent LL(1) parser. It handles the above case by outputting a temporary form for the variable definitions: the type is specified as an expression that is translated into proper type in CiResolver.ToType. That means that CiParser doesn't catch all the syntax errors, for example:

a[] = 5; // syntax error

These are checked in CiResolver.cs.

pfusik avatar Mar 13 '22 07:03 pfusik