
Feat: Create grammar file in BNF format

Open vuvoth opened this issue 1 year ago • 2 comments

Motivation

We are using a handwritten parser in the noname project. We should write a specification of the grammar in BNF form. This would help us agree on the current grammar and make the syntax easier to follow, develop, and maintain.

More context

  • https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form

vuvoth avatar Jun 17 '24 11:06 vuvoth

I learned about https://pest.rs/ today

mimoo avatar Jun 24 '24 22:06 mimoo

I also asked Bard to produce a grammar file from an example X) Maybe that's a good way to start:

WHITESPACE = _{ " " | "\t" | "\n" | "\r" }

// Constants and basic types
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
number = @{ ASCII_DIGIT+ }

// Keywords
KW_CONST = _{ "const" }
KW_STRUCT = _{ "struct" }
KW_FN = _{ "fn" }
KW_FOR = _{ "for" }
KW_IN = _{ "in" }
KW_LET = _{ "let" }
KW_MUT = _{ "mut" }
KW_ASSERT = _{ "assert" }
KW_RETURN = _{ "return" }
KW_MAIN = _{ "main" }
KW_PUB = _{ "pub" }

// Punctuation and operators
COLON = _{ ":" }
SEMICOLON = _{ ";" }
EQUALS = _{ "=" }
DOT = _{ "." }
COMMA = _{ "," }
L_PAREN = _{ "(" }
R_PAREN = _{ ")" }
L_BRACE = _{ "{" }
R_BRACE = _{ "}" }
L_BRACKET = _{ "[" }
R_BRACKET = _{ "]" }
PLUS = _{ "+" }
MINUS = _{ "-" }
ASTERISK = _{ "*" }
DOUBLE_AMPERSAND = _{ "&&" }
DOUBLE_PIPE = _{ "||" }

// Literals and constants
empty = _{ "0" }
player1 = _{ "1" }
player2 = _{ "2" }
sudoku_size = _{ "81" }


// Structure definition
struct_def = {
    KW_STRUCT ~ identifier ~ L_BRACE
        inner_field ~ // Assuming only one field for now
    R_BRACE
}

inner_field = {
    identifier ~ COLON ~ L_BRACKET ~ identifier ~ SEMICOLON ~ number ~ R_BRACKET ~ COMMA
}

// Function definition
fn_def = {
    KW_FN ~ identifier ~ DOT ~ identifier ~ L_PAREN ~ self_arg ~ fn_args ~ R_PAREN ~ optional_return_type ~ block
}

self_arg = {
    identifier ~ COLON ~ identifier
}

fn_args = {
    (identifier ~ COLON ~ identifier ~ COMMA)* ~ identifier ~ COLON ~ identifier
}

optional_return_type = {
    (MINUS ~ ">" ~ identifier)?
}

// Function call
fn_call = {
    identifier ~ DOT ~ identifier ~ L_PAREN ~ fn_call_args ~ R_PAREN
}

fn_call_args = {
    (identifier ~ COMMA)* ~ identifier
}


// Block (function body)
block = { L_BRACE ~ statements ~ R_BRACE }

statements = { statement* }

statement = _{
    const_decl
  | struct_def
  | fn_def
  | for_loop
  | let_decl
  | assignment
  | assert_stmt
  | return_stmt
  | fn_call
  | main_fn
}

const_decl = { KW_CONST ~ identifier ~ EQUALS ~ number ~ SEMICOLON }

for_loop = {
    KW_FOR ~ identifier ~ KW_IN ~ number ~ DOT ~ DOT ~ number ~ block
}

let_decl = { KW_LET ~ (KW_MUT)? ~ identifier ~ EQUALS ~ expr ~ SEMICOLON }

assignment = { identifier ~ EQUALS ~ expr ~ SEMICOLON }

assert_stmt = { KW_ASSERT ~ L_PAREN ~ expr ~ R_PAREN ~ SEMICOLON }

return_stmt = { KW_RETURN ~ expr ~ SEMICOLON }

main_fn = { KW_MAIN ~ L_PAREN ~ KW_PUB ~ identifier ~ COLON ~ identifier ~ COMMA ~ identifier ~ COLON ~ identifier ~ R_PAREN ~ block }

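// pest rejects left-recursive rules, so binary operators are written
// as a repetition over terms instead of `expr ~ op ~ expr`.
expr = _{
    term ~ ((PLUS | MINUS | ASTERISK | DOUBLE_AMPERSAND | DOUBLE_PIPE) ~ term)*
}

// PEG choice is ordered: fn_call is tried before identifier so that
// `a.b(c)` is not cut short at `a`.
term = _{
    fn_call
  | identifier
  | number
  | L_PAREN ~ expr ~ R_PAREN
}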
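
pest grammars are PEGs, and pest rejects a rule whose alternative begins with the rule itself, so a binary-operator rule of the shape `expr ~ op ~ expr` has to become a loop over operands. Below is a minimal hand-written sketch of that shape, in plain Rust with the standard library only. The `Expr`, `Parser`, and `parse_expr` names are hypothetical and not taken from the noname codebase, `fn_call` is left out for brevity, and all operators share a single precedence level, as in the grammar above.

```rust
/// Hypothetical AST for the `expr` rule; names are illustrative only.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Number(u64),
    Binary(Box<Expr>, String, Box<Expr>),
}

/// Cursor over the input bytes.
struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0 }
    }

    /// Mirrors WHITESPACE: skip spaces, tabs, and line breaks.
    fn skip_ws(&mut self) {
        while self.pos < self.src.len()
            && matches!(self.src[self.pos], b' ' | b'\t' | b'\n' | b'\r')
        {
            self.pos += 1;
        }
    }

    /// Consume the run of bytes accepted by `keep` and return it as a string.
    fn take_while(&mut self, keep: fn(u8) -> bool) -> String {
        let start = self.pos;
        while self.pos < self.src.len() && keep(self.src[self.pos]) {
            self.pos += 1;
        }
        String::from_utf8_lossy(&self.src[start..self.pos]).into_owned()
    }

    /// term := number | identifier | "(" expr ")"
    fn term(&mut self) -> Result<Expr, String> {
        self.skip_ws();
        match self.src.get(self.pos).copied() {
            Some(b'(') => {
                self.pos += 1;
                let inner = self.expr()?;
                self.skip_ws();
                if self.src.get(self.pos) != Some(&b')') {
                    return Err(format!("expected ')' at byte {}", self.pos));
                }
                self.pos += 1;
                Ok(inner)
            }
            Some(b) if b.is_ascii_digit() => {
                let digits = self.take_while(|c| c.is_ascii_digit());
                digits.parse::<u64>().map(Expr::Number).map_err(|e| e.to_string())
            }
            Some(b) if b.is_ascii_alphabetic() => {
                let name = self.take_while(|c| c.is_ascii_alphanumeric() || c == b'_');
                Ok(Expr::Ident(name))
            }
            _ => Err(format!("expected a term at byte {}", self.pos)),
        }
    }

    /// Try to read one binary operator: && || + - *
    fn operator(&mut self) -> Option<String> {
        self.skip_ws();
        let src = self.src;
        let rest = &src[self.pos..];
        for op in ["&&", "||", "+", "-", "*"] {
            if rest.starts_with(op.as_bytes()) {
                self.pos += op.len();
                return Some(op.to_string());
            }
        }
        None
    }

    /// expr := term (operator term)*
    /// The loop stands in for the left-recursive `expr ~ op ~ expr` form.
    fn expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.term()?;
        while let Some(op) = self.operator() {
            let rhs = self.term()?;
            lhs = Expr::Binary(Box::new(lhs), op, Box::new(rhs));
        }
        Ok(lhs)
    }
}

/// Parse a complete expression and reject trailing input.
fn parse_expr(input: &str) -> Result<Expr, String> {
    let mut p = Parser::new(input);
    let e = p.expr()?;
    p.skip_ws();
    if p.pos != p.src.len() {
        return Err(format!("unexpected input at byte {}", p.pos));
    }
    Ok(e)
}
```

With this sketch, `parse_expr("a - b - c")` groups to the left as `(a - b) - c`, and incomplete input such as `parse_expr("a +")` returns an error. If operator precedence is needed later, pest's own Pratt parser support is probably the more natural route than extending this loop by hand.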

mimoo avatar Jun 24 '24 22:06 mimoo