memn2n icon indicating copy to clipboard operation
memn2n copied to clipboard

tokenize function code in data_utils.py is incorrect

Open zpengc opened this issue 3 years ago • 0 comments

with the test intention that

>>> tokenize('Bob dropped the apple. Where is the apple?')
    ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

we should write like this:

def tokenize(sent):
    return [x for x in re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)]

zpengc avatar Dec 09 '21 00:12 zpengc