jison
Need a little help transitioning from flex/bison to jison-lex/jison
In the flex/bison world, you can write simple text processing utilities. For example, a wc program:
%{
/*
 * word count
 */
int nrchars, nrwords, nrlines;
%}
%%
\n          ++nrchars, ++nrlines;
[^ \t\n]+   ++nrwords, nrchars += yyleng;
.           ++nrchars;
%%
int main() { yylex(); printf("%d\t%d\t%d\n", nrchars, nrwords, nrlines); return 0; }
---EOF
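For readers who do not know flex, here is a rough sketch of the same counting logic in plain JavaScript, with no Jison involved. It assumes the word rule is meant to match a run of non-blank characters; wordCount is a hypothetical helper name, not something flex or Jison provides.

```javascript
// Plain-JavaScript sketch of the three flex rules above.
// wordCount is a hypothetical name used only for this illustration.
function wordCount(text) {
  let chars = 0, words = 0, lines = 0;
  // The alternatives mirror the rules in order: a newline, a run of
  // non-blank characters, then any other single character (in practice
  // only a space or a tab can reach the last alternative).
  const rule = /\n|[^ \t\n]+|[^\n]/g;
  for (const match of text.matchAll(rule)) {
    const token = match[0];
    if (token === '\n') {
      chars++;
      lines++;
    } else if (/^[^ \t\n]/.test(token)) {
      words++;
      chars += token.length;
    } else {
      chars++;
    }
  }
  return { chars, words, lines };
}

console.log(wordCount('This is line one.\nline two.\n'));
```

The sketch only shows what the rules count. It does not answer how to drive a Jison-generated lexer.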
I have yet to discover how to write these types of lex/yacc tools with jison-lex/jison. Can somebody enlighten me, please?
I wouldn't mind some good jison tutorials too, or maybe even a more human-readable way to write rules.
http://jison.org/
http://jison.org/ is wonderful, but I couldn't find any documentation on how to replace the C-style bindings to main() that make lex and yacc so productive for building text processing tools.
I do see that the third section of the syntax and grammar files gets reproduced just prior to the export. I had hoped that others had used jison-lex/jison to automate tool building and could bootstrap me, but alas, I'll dive into the code and see if I can make it work.
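One possible stand-in for a C-style main() is a small Node script that reads the input and hands it to the generated parser. The sketch below assumes the generated module exports a parser object with a parse(text) method, as the Jison documentation describes for CommonJS output. The names runOnFile and ./wordcount are hypothetical.

```javascript
const fs = require('fs');

// runOnFile is a hypothetical helper playing the role of main():
// read the input file, then pass its text to the parser's parse() method.
function runOnFile(parser, path) {
  const source = fs.readFileSync(path, 'utf8');
  return parser.parse(source);
}

// Intended use, assuming `jison wordcount.jison` produced ./wordcount.js:
//   const { parser } = require('./wordcount');
//   runOnFile(parser, process.argv[2]);
```

Any printing of results would still have to happen in the grammar's actions or in code that runs after parse() returns.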
See StackOverflow answer
wordcount.jison
%lex
%options flex
%{
if (!('chars' in yy)) {
    yy.chars = 0;
    yy.words = 0;
    yy.lines = 1;
}
%}
%%
[^ \t\n\r\f\v]+    { yy.words++; yy.chars += yytext.length; }
.                  { yy.chars++; }
\r                 { yy.chars++; }
\n                 { yy.chars++; yy.lines++; }
/lex
%%
E : { console.log( yy.lines + "\t" + yy.words + "\t" + yy.chars); };
Earlier Answer
I am just starting out with Jison and using "flex & bison" as a reference, which includes the word count example. I ran into the same problem, so I am posting this to help others. This is not the best way to do it, but it does get one past this example and on to making more progress with Jison.
wordcount.jison
// wordcount.jison
// Based on the example in "flex & bison" by John Levine
// This is a wordcount example.
// Lexer Grammar
%lex
/* Lexer Section 1 : Definitions */
%{
console.log("In Lexer Definitions section");
%}
%%
/* Lexer Section 2 : Rules */
[a-zA-Z]+   {
                console.log("In Lexer Rule WORD");
                console.log("Matched: '" + this.match + "'");
                return 'WORD';
            }
\n          {
                console.log("In Lexer Rule LF");
                console.log("Matched: line feed");
                return 'LF';
            }
\r          {
                console.log("In Lexer Rule CR");
                console.log("Matched: carriage return");
                return 'CR';
            }
<<EOF>>     {
                console.log("In Lexer Rule EOF");
                console.log("Matched: <<EOF>>");
                return 'EOF';
            }
.           {
                console.log("In Lexer Rule SEP");
                console.log("Matched: '" + this.match + "'");
                return 'SEP';
            }
%%
/* Lexer Section 3 : User Code */
console.log("In Lexer User Code section");
/lex
// Parser Grammar
/* Parser Section 1 : Definitions */
%{
/* code block */
console.log("In Parser Definitions section");
let myChars = 0;
let myWords = 0;
let myLines = 0;
%}
%%
/* Parser Section 2 : Rules */
input
    : sentences eof
    ;
sentences
    : sentence cr lf sentences
    | sentence
    ;
sentence
    : word sep sentence
    | word sep
    | word
    ;
word
    : WORD
    %{
        console.log("In Parser Rule WORD");
        myWords++; myChars += yytext.length;
    %}
    ;
cr
    : CR
    %{
        console.log("In Parser Rule CR");
        myChars++;
    %}
    ;
lf
    : LF
    %{
        console.log("In Parser Rule LF");
        myChars++; myLines++;
    %}
    ;
sep
    : SEP
    %{
        console.log("In Parser Rule SEP");
        myChars++;
    %}
    ;
eof
    : EOF
    %{
        console.log("In Parser Rule EOF");
        myChars++; myLines++;
        console.log("Lines: " + myLines + ", Words: " + myWords + ", Chars: " + myChars);
    %}
    ;
%%
/* Parser Section 3 : Epilogue */
console.log("In Parser Epilogue section");
wordcount_input.txt
This is line one.
line two.
To build and run
My development environment consists of:
- Microsoft Windows [Version 10.0.14393]
- Visual Studio Code v1.5.3 (Visual Studio Code is not Visual Studio. It is a freeware and open source IDE by Microsoft that runs on Windows, Linux, and Mac).
- Visual Studio Code extensions:
- Node.js version v6.6.0
- Jison 0.4.17 (Installed using Node Package Manager (npm))
>jison wordcount.jison
>node wordcount.js wordcount_input.txt
output
In Parser Definitions section
In Parser Epilogue section
In Lexer User Code section
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'This'
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'is'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'line'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'one'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: '.'
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule CR
Matched: carriage return
In Parser Rule SEP
In Parser Rule CR
In Lexer Definitions section
In Lexer Rule LF
Matched: line feed
In Parser Rule LF
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'line'
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'two'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: '.'
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule EOF
Matched: <<EOF>>
In Parser Rule SEP
In Parser Rule EOF
Lines: 2, Words: 6, Chars: 29
Notes:
- Had to use parser because I am just learning Jison and could not figure out how to use just the lexer with Jison.
- Need to explicitly add return in each lexer rule, e.g. return 'WORD';
- Lexer definition section is run for each lexer rule; was expecting it to only run once before all rules. This was causing the counts to be reset with each rule. Easiest way around was to move code into parser definitions which are only run once at start.
- Since Windows text files have CR/LF instead of just LF, had to adjust accordingly.
- Used this.match in lexer because yytext is not available/was not working in lexer. Still learning.
- Cannot call main function due to the way Jison runs generated code. Easiest way around was to put action into EOF parser rule.
- Had to convert C to JavaScript, e.g. strlen(yytext) to yytext.length
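On the CR/LF note: the grammar above expects a CR followed by an LF between sentences, so it is tied to Windows line endings. As a rough illustration in plain JavaScript (not Jison), line terminators can be counted independently of platform by treating CR/LF as a single terminator. countLineBreaks is a hypothetical helper name.

```javascript
// countLineBreaks is a hypothetical helper: it counts CR/LF, a lone LF,
// or a lone CR each as one line terminator. The CR/LF alternative comes
// first so that a Windows line ending is not counted twice.
function countLineBreaks(text) {
  const terminators = text.match(/\r\n|\n|\r/g);
  return terminators ? terminators.length : 0;
}

console.log(countLineBreaks('This is line one.\r\nline two.'));
```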
To help me understand the sections of Jison, I liberally added lots of comments and code sections to see how the user code was getting inserted into the Jison boilerplate code. This helped out because I soon realized that leaving out return statements with the lexer actions was causing problems, and the counters were getting initialized with each lexer rule instead of just once. Also I had to use the parser because I am still learning and have not figured out how to get just flex to work in Jison.
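The counter reset can be reproduced without Jison. The sketch below is a simulation only: it assumes, as the output log suggests, that the lexer definitions code runs before every matched rule. The function scan and its guarded flag are hypothetical. It compares an unguarded initialization with one guarded by a check on the shared yy object.

```javascript
// Simulates a lexer whose definitions block runs before every rule action.
// scan() and its parameters are hypothetical and exist only for illustration.
function scan(tokens, guarded) {
  const yy = {};
  for (const token of tokens) {
    // "Definitions" code: executed once per matched token.
    if (!guarded || !('chars' in yy)) {
      yy.chars = 0;
    }
    // Rule action: add the length of the matched text.
    yy.chars += token.length;
  }
  return yy.chars;
}

const tokens = ['This', ' ', 'is'];
console.log(scan(tokens, false)); // unguarded: counter is reset before each token
console.log(scan(tokens, true));  // guarded: counter accumulates across tokens
```

With the guard in place the initialization takes effect only on the first token, which is one alternative to moving the counters into the parser definitions.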
Hope this helps you and others.