
Bug report in SequenceTokenizerNew

Status: open · opened by Jabher 3 years ago · 0 comments

SequenceTokenizerNew fails on the following call:

sentenceTokenizer.tokenize('"All ticketed passengers should now be in the Blue Concourse sleep lounge. Make sure your validation papers are in order. Thank you". The upstairs lounge was not at all grungy.')

(The input is a quote from "The Jaunt" by Stephen King.)

It fails with the following error:

{
    "message": "Expected [ \\t\\n\\r.?!] or [)\\]}\"'`’] but \"M\" found.",
    "expected": [
        {
            "type": "class",
            "parts": [
                " ",
                "\t",
                "\n",
                "\r",
                ".",
                "?",
                "!"
            ],
            "inverted": false,
            "ignoreCase": false
        },
        {
            "type": "class",
            "parts": [
                ")",
                "]",
                "}",
                "\"",
                "'",
                "`",
                "’"
            ],
            "inverted": false,
            "ignoreCase": false
        }
    ],
    "found": "M",
    "location": {
        "start": {
            "offset": 75,
            "line": 1,
            "column": 76
        },
        "end": {
            "offset": 76,
            "line": 1,
            "column": 77
        }
    },
    "name": "SyntaxError"
}
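The error's fields (`message`, `expected`, `found`, `location`, `name: "SyntaxError"`) look like those of a PEG.js/Peggy-generated parser. That is an inference from the shape, not something the report confirms. Assuming `location.start.offset` is a 0-based index into the input, offset 75 is the "M" of "Make", the first letter of the second sentence inside the opening quotation. One reading of the two `expected` classes is this: after the period, the parser accepts only more whitespace or terminal punctuation, or a closing quote or bracket. A second sentence cannot start while the quotation is still open. The sketch below checks that mapping. `showErrorContext` is a hypothetical helper, not part of the library.

```javascript
// Input passed to tokenize() in the report above (same characters, split for readability).
const input =
  '"All ticketed passengers should now be in the Blue Concourse sleep lounge. ' +
  'Make sure your validation papers are in order. Thank you". ' +
  'The upstairs lounge was not at all grungy.';

// Location fields copied from the SyntaxError above.
const location = {
  start: { offset: 75, line: 1, column: 76 },
  end: { offset: 76, line: 1, column: 77 },
};

// Hypothetical helper: return the text around a parser error offset, with a
// caret under the offending character. `radius` is how many characters to
// show on each side. Assumes `offset` is a 0-based index into `text`.
function showErrorContext(text, loc, radius = 20) {
  const { offset } = loc.start;
  const from = Math.max(0, offset - radius);
  const to = Math.min(text.length, offset + radius);
  const snippet = text.slice(from, to);
  const caret = ' '.repeat(offset - from) + '^';
  return `${snippet}\n${caret}`;
}

console.log(input[location.start.offset]); // M
console.log(showErrorContext(input, location));
```

Run with Node, the second `console.log` prints the text around "lounge. Make" with the caret under "M". This shows where the parser gave up. It does not show why the grammar rejects the input.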

Jabher, Nov 27 '22 23:11