Inconsistent internal parser state
This issue describes a bug in Elm.Kernel.Parser.findSubString.
Note: the following issues describe symptoms of this bug:
- #2
- #20
- #46
In the same way, the following pull request tries to fix the symptoms:
- #21
The Elm Parser internally keeps track of the current position in two ways:
- as a row and a column (like a code editor)
- as an offset into the source string.
Normally, both kinds of position information (row and column vs. offset) are in sync with each other. (For a given source string, you can calculate row and column from the offset and vice versa.)
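To make the "in sync" claim concrete, here is a minimal sketch of the two conversions. The helper names `offsetToRowCol` and `rowColToOffset` are hypothetical (they are not part of elm/parser), and the sketch assumes 1-based rows and columns, 0-based offsets, `\n` as the only line break, and one column per UTF-16 code unit.

```javascript
// Hypothetical helpers, not part of elm/parser.
// Assumptions: row/col are 1-based, offset is 0-based,
// "\n" is the only line break, one column per UTF-16 code unit.

// Walk the source up to `offset` and count rows and columns.
function offsetToRowCol(source, offset) {
  let row = 1;
  let col = 1;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === "\n") {
      row += 1;
      col = 1;
    } else {
      col += 1;
    }
  }
  return { row, col };
}

// Skip (row - 1) line breaks, then add the column within the line.
// Returns -1 if the source has fewer than `row` lines.
function rowColToOffset(source, row, col) {
  let offset = 0;
  for (let r = 1; r < row; r++) {
    const newline = source.indexOf("\n", offset);
    if (newline < 0) {
      return -1;
    }
    offset = newline + 1;
  }
  return offset + (col - 1);
}
```

For the source string `"< token >"`, offset 7 maps to row 1, col 8 under these assumptions, and row 1, col 3 maps back to offset 2.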
The bug in Elm.Kernel.Parser.findSubString breaks this synchronization, though.
This affects the following parsers:
- lineComment
- multiComment
- chompUntil
- chompUntilEndOr
They set...
- row and column to the position after the (closing) token
- the offset to the position before the (closing) token
Here's an example with chompUntil:
    import Parser exposing ((|.), (|=), Parser)


    testParser : Parser { row : Int, col : Int, offset : Int }
    testParser =
        Parser.succeed (\row col offset -> { row = row, col = col, offset = offset })
            |. Parser.chompUntil "token"
            |= Parser.getRow
            |= Parser.getCol
            |= Parser.getOffset


    Parser.run testParser "< token >"
    --> Ok { row = 1, col = 8, offset = 2 }
The state after the test parser is run:
- row = 1, col = 8 (corresponding to offset = 7) --> after the token
- offset = 2 (corresponding to row = 1, col = 3) --> before the token
The root cause for these bugs lies in the Elm.Kernel.Parser.findSubString function:
https://github.com/elm/parser/blob/02839df10e462d8423c91917271f4b6f8d2f284d/src/Elm/Kernel/Parser.js#L120-L134
If the smallString is found, the returned newOffset is at the position before the smallString (the result of the indexOf function), but the new row and col are at the position after the smallString (at the target position).
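The mismatch can be reproduced outside of Elm with a simplified sketch of the behavior described above. This is not the kernel source: `findSubStringSketch` is a hypothetical name, surrogate pairs are ignored, and the handling of the "not found" case (walking to the end of the input) is an assumption.

```javascript
// Simplified sketch of the described behavior; NOT the kernel source.
// Assumptions: "\n" is the only line break, one column per UTF-16 code
// unit, and when smallString is not found row/col advance to end of input.
function findSubStringSketch(smallString, offset, row, col, bigString) {
  // Position BEFORE the match (or -1 if there is no match).
  const newOffset = bigString.indexOf(smallString, offset);

  // Position AFTER the match (or end of input if there is no match).
  const target =
    newOffset < 0 ? bigString.length : newOffset + smallString.length;

  // Row and col are advanced all the way to `target` ...
  while (offset < target) {
    if (bigString[offset] === "\n") {
      row += 1;
      col = 1;
    } else {
      col += 1;
    }
    offset += 1;
  }

  // ... but the returned offset is still the position before the match.
  return { newOffset, row, col };
}

const result = findSubStringSketch("token", 0, 1, 1, "< token >");
console.log(result); // { newOffset: 2, row: 1, col: 8 }
```

The result matches the state reported by the test parser: row and col point after the token, the offset points before it. Making them agree would mean either returning the target position as the offset or advancing row and col only up to newOffset; which of the two is intended is not settled by this sketch.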
Note: the following pull request tries to fix the comment of the Elm.Kernel.Parser.findSubString function to correctly describe the buggy behavior:
- #37