FsLexYacc
"unrecognized input" after upgrade
Description
I recently upgraded from FsLexYacc 10.0 to the latest version, 11.3.0. After the upgrade, parsing the comment line "// ä" fails with "unrecognized input". I have not changed the lexer or parser options, nor the lexer or parser definitions.
Repro steps
I have managed to create a small-ish reproducer:
Parser.fsy:
%token EOF
%token <string*FSharp.Text.Lexing.Position> IDENTIFIER
%start top
%type <string> top
%%
top: EOF { "hello" }
Lexer.fsl:
{
module Lexer
open FSharp.Text.Lexing
open Parser
let lexeme lexbuf = LexBuffer<char>.LexemeString lexbuf
}
let alpha = ['a' - 'z' 'A' - 'Z']
let swe = ['ä' 'Ä' 'ö' 'Ö' 'å' 'Å' ]
let letter = alpha | swe
let ident = letter+
let newline = ('\n' | "\r\n" )
rule token = parse
| "//" { commentline lexbuf.StartPos lexbuf }
| ident { IDENTIFIER(lexeme lexbuf, lexbuf.StartPos) }
| newline { token lexbuf }
| eof { EOF }
| _ { failwith "unknown token" }
and commentline p = parse
| newline { token lexbuf }
| eof { EOF }
| _ { commentline p lexbuf }
Program.fs:
open Parser
open Lexer
let input = "// ä"
let lexbuf = FSharp.Text.Lexing.LexBuffer<_>.FromString input
let result = Parser.top Lexer.token lexbuf
printfn "%s" result
FsLexYaccRepro.fsproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="FsLexYacc.Runtime" Version="11.3.0" />
    <PackageReference Include="FsLexYacc" Version="11.3.0" />
  </ItemGroup>
  <ItemGroup>
    <FsLex Include="Lexer.fsl">
      <OtherFlags>--unicode</OtherFlags>
    </FsLex>
    <FsYacc Include="Parser.fsy">
      <OtherFlags>--module Parser</OtherFlags>
    </FsYacc>
    <Compile Include="Parser.fs" />
    <Compile Include="Lexer.fs" />
    <Compile Include="Program.fs" />
  </ItemGroup>
</Project>
Expected behavior
When running the program above with dotnet run the output should be "hello".
Actual behavior
The program throws an exception with the following stack trace:
Unhandled exception. System.Exception: unrecognized input
at FSharp.Text.Lexing.LexBuffer`1.EndOfScan() in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 128
at FSharp.Text.Lexing.UnicodeTables.scanUntilSentinel(LexBuffer`1 lexBuffer, Int32 state) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Lexing.fs:line 448
at Lexer.commentline(Position p, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 81
at Lexer.token(LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Lexer.fs:line 18
at [email protected](LexBuffer`1 lexbuf)
at FSharp.Text.Parsing.Implementation.interpret[tok,a](Tables`1 tables, FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 initialState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 346
at FSharp.Text.Parsing.Tables`1.Interpret[char](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in /home/runner/work/FsLexYacc/FsLexYacc/src/FsLexYacc.Runtime/Parsing.fs:line 498
at Parser.engine[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf, Int32 startState) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 111
at Parser.top[a](FSharpFunc`2 lexer, LexBuffer`1 lexbuf) in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Parser.fs:line 113
at <StartupCode$FsLexYaccRepro>.$Program.main@() in C:\cygwin64\home\daab\dev\FsLexYaccRepro\Program.fs:line 6
Note that parsing the input "// a" works fine. Also, parsing works if I remove ä from swe in Lexer.fsl.
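To compare the working and failing inputs side by side, something like the following could replace the last lines of Program.fs. This is only a sketch: tryParse and report are hypothetical helper names, not part of FsLexYacc, and the commented-out usage assumes the Parser and Lexer modules generated from the files above.

```fsharp
// Hypothetical helper (not part of FsLexYacc): runs a parse function on one
// input and captures the exception message instead of crashing the program.
let tryParse (parse: string -> string) (input: string) : Result<string, string> =
    try
        Ok (parse input)
    with ex ->
        Error ex.Message

// Hypothetical helper: prints one line per input so that working and
// failing cases can be compared directly.
let report (parse: string -> string) (inputs: string list) =
    for input in inputs do
        match tryParse parse input with
        | Ok value -> printfn "%A -> ok: %s" input value
        | Error msg -> printfn "%A -> failed: %s" input msg

// Usage inside the repro project (assumes the generated Parser and Lexer):
// let parse (s: string) =
//     Parser.top Lexer.token (FSharp.Text.Lexing.LexBuffer<_>.FromString s)
// report parse [ "// a"; "// ä" ]
```

Keeping the parse function as a parameter means the same helpers can be reused unchanged when trying other lexer definitions or other FsLexYacc versions.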
Bisection indicates that the regression was introduced with 48ec571 (break out core domain logic and generation into core libraries (#144), 2021-01-27).