PyTorchNLPBook
Function preprocess_text does not seem to strip punctuation
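import re

def preprocess_text(text):
    text = ' '.join(word.lower() for word in text.split(" "))
    # Pad . , ! ? with spaces so each one becomes a separate token.
    text = re.sub(r"([.,!?])", r" \1 ", text)
    # Collapse runs of any other non-letter characters into one space.
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)
    return text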
Calling preprocess_text('Are you a, boy or a girl?') returns:
'are you a , boy or a girl ? '
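The output above follows from the two regular expressions: the first pads `.`, `,`, `!` and `?` with spaces, and the second explicitly excludes those four characters from removal, so they survive as separate tokens. If the goal really is to drop punctuation, one possible variant is sketched below. The name `preprocess_text_strip` is hypothetical and not part of the book's code, and this sketch assumes only ASCII letters should be kept.

```python
import re

def preprocess_text_strip(text):
    # Hypothetical variant (not from the book): removes punctuation
    # instead of keeping it as separate tokens.
    text = text.lower()
    # Replace every run of characters that are not a-z with one space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Trim the leading/trailing space left by punctuation at the edges.
    return text.strip()

print(preprocess_text_strip('Are you a, boy or a girl?'))
# prints: are you a boy or a girl
```

Note that this also discards digits and non-ASCII letters, which may or may not be what a given dataset needs.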