
Any (rudimentary) documentation and/or examples?

Open dumblob opened this issue 4 years ago • 2 comments

I'm mostly interested in what the patterns look like and how the "merge with EBNF" is being (or is planned to be) realized.

dumblob, Nov 01 '21 21:11

Hi, not yet, this is actually the part I am currently working on in this project. There is a lot to figure out here, and a lot of implications to various choices.

The general idea is that I'm starting with something like a traditional EBNF-like syntax:

Newline = "\r\n" | "\r" | "\n" ;
Space = " " | "\t" ;
Whitespace = Newline | Space ;
Assignment = "=" ;
Semicolon = ";" ;
Identifier = /[A-Z][[:alnum:]_]*/ ;
NestedPattern = "/", Pattern, "/" ;
Expr = NestedPattern | Identifier ;
ExprList = (ExprList, ",")? , Expr ;
Definition = Identifier , Whitespace+, Assignment, Whitespace+, ExprList, Semicolon ;
Grammar = (Definition | Expr | Whitespace+)+, EOF ;

You'll notice the above is pretty vanilla EBNF, except the definition of Identifier:

Identifier = /[A-Z][[:alnum:]_]*/ ;

As you can guess, the text between / /; is a regexp. There are some implications here (your regexp cannot contain literal unescaped /;\n at the least) which I'm still evaluating.
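To make the delimiter concern concrete, here is a minimal Python sketch of pulling the regexp body out of a /.../ literal. It is not Zorex code: the function name `extract_pattern` is hypothetical, and both the backslash-escape rule and the rejection of raw newlines are assumptions, since the escaping rules above are still undecided.

```python
from typing import Tuple


def extract_pattern(text: str, start: int = 0) -> Tuple[str, int]:
    """Return (pattern_body, index_after_closing_slash) for a /.../ literal.

    Assumption (not confirmed for Zorex): a backslash escapes the next
    character, so an escaped slash does not terminate the pattern.
    """
    if start >= len(text) or text[start] != "/":
        raise ValueError("expected '/' at start of pattern")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "/":
            # First unescaped slash closes the pattern.
            return text[start + 1:i], i + 1
        if ch == "\n":
            # Assumption: a raw newline is not allowed inside a pattern.
            raise ValueError("unescaped newline inside pattern")
        i += 1
    raise ValueError("unterminated pattern")
```

Calling `extract_pattern("/[A-Z][[:alnum:]_]*/ ;")` yields the body `[A-Z][[:alnum:]_]*` and the offset just past the closing slash, leaving the trailing ` ;` for whatever parses the rest of the definition.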

The other thought is that the above, on its own, is not actually a valid input. Instead, you must end an input with a main expression:

Newline = "\r\n" | "\r" | "\n" ;
Space = " " | "\t" ;
Whitespace = Newline | Space ;
Assignment = "=" ;
Semicolon = ";" ;
Identifier = /[A-Z][[:alnum:]_]*/ ;
NestedPattern = "/", Pattern, "/" ;
Expr = NestedPattern | Identifier ;
ExprList = (ExprList, ",")? , Expr ;
Definition = Identifier , Whitespace+, Assignment, Whitespace+, ExprList, Semicolon ;
Grammar = (Definition | Expr | Whitespace+)+, EOF ;
+Grammar;

The final line, +Grammar;, effectively says that Grammar is the main parser entrypoint expression.
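As a rough illustration of that split between definitions and the main expression, here is a Python sketch. It is only a sketch under simplifying assumptions: one statement per line, each ending in `;`, with the entrypoint written as `+Name;`. The function name `split_grammar` is hypothetical and this is not how Zorex itself reads its input.

```python
from typing import Dict, Optional, Tuple


def split_grammar(source: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Separate `Name = body ;` definitions from a `+Expr;` entrypoint line.

    Assumption: one statement per line, each terminated by ';'.
    """
    definitions: Dict[str, str] = {}
    entry: Optional[str] = None
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.endswith(";"):
            raise ValueError(f"missing ';' in: {line!r}")
        body = line[:-1].strip()  # drop only the terminating ';'
        if body.startswith("+"):
            entry = body[1:].strip()
            continue
        # Split at the first '=' so a body such as "=" stays intact.
        name, sep, expr = body.partition("=")
        if not sep:
            raise ValueError(f"expected a definition or '+Expr;': {line!r}")
        definitions[name.strip()] = expr.strip()
    return definitions, entry
```

An input with definitions but no `+...;` line comes back with `entry` set to `None`, which is the case described above as not being a valid input on its own.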

Since it is an expression, a valid program would also be just:

/[A-Z][[:alnum:]_]*/ 

And here you start to get an idea of how one could start with regular expressions and begin to break those out into more EBNF-like definitions as your regexp gets more complex.
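The same progression can be mimicked in plain Python as a rough analogy. The fragment names `Upper` and `Tail` are hypothetical, and since Python's `re` module has no POSIX `[[:alnum:]]` class, the sketch substitutes the ASCII range `A-Za-z0-9` (an assumption about what the class should match).

```python
import re

# Step 1: the whole matcher is a single regexp.
# Python's re lacks [[:alnum:]], so A-Za-z0-9 stands in for it here.
single = re.compile(r"[A-Z][A-Za-z0-9_]*")

# Step 2: the same matcher broken out into named pieces, the way one
# might introduce definitions as the regexp grows.
rules = {
    "Upper": r"[A-Z]",
    "Tail": r"[A-Za-z0-9_]*",
}
identifier = re.compile(rules["Upper"] + rules["Tail"])


def same_result(text: str) -> bool:
    """Check that both forms agree on whether `text` is an identifier."""
    return bool(single.fullmatch(text)) == bool(identifier.fullmatch(text))
```

Both forms accept `Foo_1` and reject `foo`; the named version just gives each piece a handle that other definitions could reuse.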

And of course, Zorex being built on a generalized LL parser actually means you're not restricted to regexp at all, but can devolve into parsing full left-and-right recursive context-free grammars as well as some context-sensitive ones.
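For context on why left recursion matters: the ExprList rule above is left-recursive, and a naive recursive-descent function for it would call itself before consuming any input. A generalized LL parser can take the rule as written. The Python sketch below is not GLL; it hand-rewrites the rule as the equivalent iteration `Expr ("," Expr)*`, which is the kind of manual step one would otherwise need. It assumes, for brevity, that Expr is only an Identifier and that no whitespace is allowed; `parse_expr_list` is a hypothetical name.

```python
import re
from typing import List

# ASCII stand-in for /[A-Z][[:alnum:]_]*/ (Python's re has no POSIX classes).
IDENTIFIER = re.compile(r"[A-Z][A-Za-z0-9_]*")


def parse_expr_list(text: str) -> List[str]:
    """Recognize Identifier ("," Identifier)* and return the identifiers.

    This is the iterative equivalent of the left-recursive rule
    ExprList = (ExprList, ",")? , Expr ; with Expr limited to Identifier.
    """
    items: List[str] = []
    pos = 0
    while True:
        match = IDENTIFIER.match(text, pos)
        if match is None:
            raise ValueError(f"expected identifier at offset {pos}")
        items.append(match.group())
        pos = match.end()
        if pos == len(text):
            return items
        if text[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1  # consume the comma and loop for the next Expr
```

With this, `parse_expr_list("A,Bc,D_1")` returns the three identifiers in order, while a trailing comma or an empty string raises `ValueError`.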

However, this is all still at a very early stage. I haven't figured out a number of important things, so this is more of an experiment at this point.


I'm curious, do you have a use case for something like this? Are you looking for something that could do this, or just looking for a generic Zig regexp engine?

emidoots, Nov 07 '21 00:11

Thanks for a thorough answer and the outlook!

> I'm curious, do you have a use case for something like this? Are you looking for something that could do this, or just looking for a generic Zig regexp engine?

No, I don't have any use case per se for this. I'm mainly curious how you're going to balance the 3 main components: the syntax restrictions of Zig, regex syntax requirements, and LL parser requirements. If this could be generalized or partially reused, I'd be interested in the SLOC to get a glimpse of how much complexity and effort such a tool needs. And lastly, I'm also curious what the performance will look like in practice and what the practical limitations are.

So, all in all I'm interested in pretty much the whole idea (which I find kind of novel in the world of AOT compiled languages) and all its implications :wink:.

dumblob, Nov 07 '21 12:11