Need a little help transitioning from flex/bison to jison-lex/jison

Open Ravenwater opened this issue 8 years ago • 4 comments

In the flex/bison world, you can write simple text processing utilities. For example, a wc program:

%{
/*
 * word count
 */

int nrchars, nrwords, nrlines;

%}

%%

\n          ++nrchars, ++nrlines;
[^ \t\n]+   ++nrwords, nrchars += yyleng;
.           ++nrchars;

%%

main() { yylex(); printf("%d\t%d\t%d\n", nrchars, nrwords, nrlines); }

---EOF

I have yet to discover how to write these types of lex/yacc tools with jison-lex/jison. Can somebody enlighten me, please?

Ravenwater avatar Aug 24 '16 16:08 Ravenwater

I wouldn't mind some good jison tutorials too, or maybe even a more human-readable way to write rules.

techtonik avatar Aug 24 '16 19:08 techtonik

http://jison.org/

yosbelms avatar Aug 24 '16 19:08 yosbelms

http://jison.org/ is wonderful, but I couldn't find any documentation on how to replace the C-style bindings to main() that make lex and yacc so productive for building text processing tools.

I do see that the third section of the syntax and grammar files gets reproduced just prior to the export. I had hoped that others had used jison-lex/jison for tool-building automation and could bootstrap me, but alas, I'll dive into the code and see if I can make it work.

Ravenwater avatar Aug 24 '16 22:08 Ravenwater

See StackOverflow answer

wordcount.jison

%lex
%options flex

%{
if (!('chars' in yy)) {
  yy.chars = 0;
  yy.words = 0;
  yy.lines = 1;
}
%}

%%
[^ \t\n\r\f\v]+ { yy.words++; yy.chars += yytext.length; }
. { yy.chars++; }
\r { yy.chars++; }
\n { yy.chars++; yy.lines++; }
/lex

%%
E : { console.log( yy.lines + "\t" + yy.words + "\t" + yy.chars); };
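As a way to sanity-check the counts this grammar should report, the same four lexer rules can be emulated in plain JavaScript without Jison. This is only a sketch: `countText` is a hypothetical helper name, and the "longest match wins, earlier rule wins ties" loop is my approximation of what `%options flex` asks the lexer to do, not Jison's actual implementation.

```javascript
// Emulates the four lexer rules above; counters start like the yy fields do.
function countText(input) {
  const counts = { chars: 0, words: 0, lines: 1 };
  // Sticky (y) regexes only match at lastIndex, like a lexer at its cursor.
  const rules = [
    { re: /[^ \t\n\r\f\v]+/y, action: (m) => { counts.words++; counts.chars += m.length; } },
    { re: /./y, action: () => { counts.chars++; } },
    { re: /\r/y, action: () => { counts.chars++; } },
    { re: /\n/y, action: () => { counts.chars++; counts.lines++; } },
  ];
  let pos = 0;
  while (pos < input.length) {
    let best = null;
    for (const rule of rules) {
      rule.re.lastIndex = pos;
      const m = rule.re.exec(input);
      // Strictly longer match replaces the current best, so earlier rules win ties.
      if (m && (best === null || m[0].length > best.text.length)) {
        best = { text: m[0], action: rule.action };
      }
    }
    if (best === null) throw new Error("no rule matched at offset " + pos);
    best.action(best.text);
    pos += best.text.length;
  }
  return counts;
}

const c = countText("This is line one.\r\nline two.");
console.log(c.lines + "\t" + c.words + "\t" + c.chars); // 2, 6 and 28, tab-separated
```

Note that in JavaScript `.` does not match `\r` or `\n`, which is why the grammar needs separate rules for them.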

Earlier Answer

I am just starting out with Jison, using "flex & bison" as a reference. That book has the word count example, and I ran into the same problem, so I am posting this to help others. This is not the best way to do it, but it does get one past this example and on to making more progress with Jison.

wordcount.jison

// wordcount.jison

// Based on the example in "flex & bison" by John Levine

// This is a wordcount example.

// Lexer Grammar

%lex

/* Lexer Section 1 : Definitions */


%{
    console.log("In Lexer Definitions section");
%}

%%

/* Lexer Section 2 : Rules */

[a-zA-Z]+
    {
        console.log("In Lexer Rule WORD");
        console.log("Matched: '" + this.match + "'");
        return 'WORD';
    }
\n
    {
        console.log("In Lexer Rule LF");
        console.log("Matched: line feed");
        return 'LF';
    }
\r
    {
        console.log("In Lexer Rule CR");
        console.log("Matched: carriage return");
        return 'CR';
    }
<<EOF>>
    {
        console.log("In Lexer Rule EOF");
        console.log("Matched: <<EOF>>");
        return 'EOF';
    }
.
    {
        console.log("In Lexer Rule SEP");
        console.log("Matched: '" + this.match + "'");
        return 'SEP';
    }


%%

/* Lexer Section 3 : User Code */

console.log("In Lexer User Code section");

/lex

// Parser Grammar

/* Parser Section 1 : Definitions */

%{
    /* code block */
    console.log("In Parser Definitions section");
    let myChars = 0;
    let myWords = 0;
    let myLines = 0;
%}

%%

/* Parser Section 2 : Rules */

input
    : sentences eof
    ;

sentences :
      sentence cr lf sentences
    | sentence
    ;

sentence :
      word sep sentence
    | word sep
    | word
    ;

word
    : WORD
        %{
            console.log("In Parser Rule WORD");
            myWords++; myChars += yytext.length;
        %}
    ;

cr  : CR
        %{
            console.log("In Parser Rule CR");
            myChars++;
        %}
    ;

lf  : LF
        %{
            console.log("In Parser Rule LF");
            myChars++; myLines++;
        %}
    ;

sep : SEP
        %{
            console.log("In Parser Rule SEP");
            myChars++;
        %}
    ;

eof : EOF
        %{
            console.log("In Parser Rule EOF");
            myChars++; myLines++;
            console.log("Lines: " + myLines + ", Words: "+ myWords + ", Chars: " + myChars);
        %}
    ;

%%

/* Parser Section 3 : Epilogue */

console.log("In Parser Epilogue section");

wordcount_input.txt

This is line one.
line two.

To build and run

My development environment consists of:

  • Microsoft Windows [Version 10.0.14393]
  • Visual Studio Code v1.5.3 (Visual Studio Code is not Visual Studio. It is a freeware and open source IDE by Microsoft that runs on Windows, Linux, and Mac).
  • Visual Studio Code extensions:
    • jshint - a linter for JavaScript
    • ESLint - The pluggable linting utility for JavaScript and JSX
    • beautify - Beautify code in place for VS Code
  • Node.js version v6.6.0
  • Jison 0.4.17 (Installed using Node Package Manager (npm))
>jison wordcount.jison
>node wordcount.js wordcount_input.txt

output

In Parser Definitions section
In Parser Epilogue section
In Lexer User Code section
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'This'
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'is'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'line'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'one'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: '.'
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule CR
Matched: carriage return
In Parser Rule SEP
In Parser Rule CR
In Lexer Definitions section
In Lexer Rule LF
Matched: line feed
In Parser Rule LF
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'line'
In Lexer Definitions section
In Lexer Rule SEP
Matched: ' '
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule WORD
Matched: 'two'
In Parser Rule SEP
In Lexer Definitions section
In Lexer Rule SEP
Matched: '.'
In Parser Rule WORD
In Lexer Definitions section
In Lexer Rule EOF
Matched: <<EOF>>
In Parser Rule SEP
In Parser Rule EOF
Lines: 2, Words: 6, Chars: 29

Notes:

  • Had to use parser because I am just learning Jison and could not figure out how to use just the lexer with Jison.
  • Need to explicitly add return in each lexer rule, e.g. return 'WORD';
  • Lexer definition section is run every time a lexer rule matches; was expecting it to run only once, before all rules. This was causing the counts to be reset with each match. Easiest way around was to move the code into the parser definitions, which are only run once at start.
  • Since Windows text files have CR/LF instead of just LF, had to adjust accordingly.
  • Used this.match in lexer because yytext is not available/was not working in lexer. Still learning.
  • Cannot call main function due to the way Jison runs generated code. Easiest way around was to put action into EOF parser rule.
  • Had to convert C to JavaScript, e.g. strlen(yytext) to yytext.length
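The counter-reset problem from the notes can be reproduced in isolation. The sketch below is a simplified model, not Jison's generated code: `makeLexer` and `prologue` are hypothetical names, and it simply assumes (as the output log suggests) that the definitions-section code runs at the start of every rule action. It contrasts unconditional initialisation with the guarded `if (!('chars' in yy))` form used in the lexer-only grammar earlier in this answer.

```javascript
// Simplified model: the definitions-section code (prologue) is assumed to run
// at the top of the action function, i.e. once per matched token.
function makeLexer(prologue) {
  const yy = {};
  return {
    yy,
    performAction(text) {
      prologue(yy);            // definitions-section code, run on every match
      yy.words++;              // the rule's own action
      yy.chars += text.length;
    },
  };
}

// Unconditional initialisation: counters are reset before every action.
const naive = makeLexer((yy) => { yy.words = 0; yy.chars = 0; });

// Guarded initialisation: counters are only created on the first match.
const guarded = makeLexer((yy) => {
  if (!("chars" in yy)) { yy.words = 0; yy.chars = 0; }
});

for (const word of ["This", "is", "line", "one"]) {
  naive.performAction(word);
  guarded.performAction(word);
}

console.log(naive.yy);   // { words: 1, chars: 3 }  (only the last match survives)
console.log(guarded.yy); // { words: 4, chars: 13 }
```

Keeping the state on a shared object and guarding its creation is one way to stay in the lexer; moving the counters into the parser definitions, as done above, is another.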

To help me understand the sections of Jison, I added plenty of comments and code sections to see how the user code was getting inserted into the Jison boilerplate code. This helped, because I soon realized that leaving out return statements in the lexer actions was causing problems, and that the counters were getting initialized with each lexer rule match instead of just once. I also had to use the parser because I am still learning and have not figured out how to get just the lexer (the flex part) to work in Jison.

Hope this helps you and others.

EricGT avatar Sep 28 '16 22:09 EricGT