memn2n: tokenize function in data_utils.py is incorrect

Open zpengc opened this issue 3 years ago • 0 comments

The example for tokenize shows the intended behavior:

>>> tokenize('Bob dropped the apple. Where is the apple?')
    ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

To produce that output, the function should be written like this:

import re
def tokenize(sent):
    # Match a word (optionally with an apostrophe part, e.g. "don't") or one punctuation mark.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)
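
A quick sanity check of the proposed regex, as a minimal sketch: it redefines tokenize with the pattern above and compares the result against the expected output from the example. The second sentence is a made-up input (not from the repository) that illustrates how the optional `(?:'\w+)?` group keeps a contraction together.

```python
import re

def tokenize(sent):
    # Proposed pattern: a word with an optional apostrophe part, or a single
    # non-word, non-space character (punctuation).
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)

# Expected output taken from the example in this issue.
expected = ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']
assert tokenize('Bob dropped the apple. Where is the apple?') == expected

# Hypothetical extra input: the contraction stays a single token.
print(tokenize("Don't drop it!"))  # ["Don't", 'drop', 'it', '!']
```

Because re.findall already returns a list, no extra list comprehension is needed around it.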

Dec 09 '21 00:12 zpengc