memn2n
tokenize function code in data_utils.py is incorrect
Given the intended behavior:
>>> tokenize('Bob dropped the apple. Where is the apple?')
['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']
the function should be written like this:
import re

def tokenize(sent):
    # Match a word (optionally with an apostrophe contraction, e.g. "where's"),
    # or any single character that is neither a word character nor whitespace
    # (i.e. punctuation such as '.' and '?').
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)
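As a quick sanity check, here is a self-contained sketch of the corrected function run against the issue's example and against a contraction (the contraction input is my own illustration, not from the repo's test suite):

```python
import re

def tokenize(sent):
    # Words (optionally with an apostrophe contraction) or single
    # punctuation characters, in order of appearance.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)

print(tokenize('Bob dropped the apple. Where is the apple?'))
# ['Bob', 'dropped', 'the', 'apple', '.', 'Where', 'is', 'the', 'apple', '?']

print(tokenize("Where's Sandra's milk?"))
# ["Where's", "Sandra's", 'milk', '?']
```

Because punctuation is captured as separate tokens, sentence-final `.` and `?` survive tokenization instead of being glued onto the preceding word, which is what the bAbI-style test above expects.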