What happens at end of file in lex? input() return value
I wonder what is supposed to happen when a lex scanner calls input() after reaching the end of its input.
This should be documented in the manual. The only relevant passage I found in the flex manual is:
If 'input()' encounters an end-of-file the normal 'yywrap()' processing is done. A "real" end-of-file is returned by 'input()' as 'EOF'.
This passage is identical for flex 2.5.4 and flex 2.6.4, yet the two versions behave quite differently. Also, what distinguishes an "end-of-file" from a "'real' end-of-file"?
This breaking change between 2.5.4 and 2.6.4 should be documented in the manual, and I would like to know why it was made.
This small example reproduces the difference:
```
%%
.   {
        /* Call input() four times and print each return value. */
        for (int i = 0; i < 4; i++) {
            int ch = input();
            printf("%d\n", ch);
        }
    }
%%
int
main(void)
{
    yylex();
    return 0;
}

int
yywrap(void)
{
    printf("yywrap!\n");
    return 1;
}
```
Invoked on an input file containing a single character followed by a newline, I see
```
philipp@notebook5:/tmp$ ./a.out < test.c
10
yywrap!
-1
yywrap!
-1
yywrap!
-1
yywrap!
```
i.e. input() returns EOF (-1) with flex 2.5.4, and
```
philipp@notebook5:/tmp$ ./a.out < test.c
10
yywrap!
0
yywrap!
0
yywrap!
0
yywrap!
```
i.e. input() returns 0 with flex 2.6.4.
There is also a Debian bug report about this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=911415. An earlier issue reported the same problem here, but it was closed without comment: https://github.com/westes/flex/issues/394
This change breaks, e.g., the Small Device C Compiler (SDCC).
The change was made in the following commit, which gives no rationale and includes no corresponding documentation update: https://github.com/westes/flex/commit/f863c9490e6912ffcaeb12965fb3a567a10745ff
This also breaks the scanner used by libdtrace in FreeBSD.