PyTorchNLPBook

Function preprocess_text does not seem to strip punctuation

Open govindgnair23 opened this issue 6 years ago • 0 comments

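import re

def preprocess_text(text):
    # Lowercase every space-separated word.
    text = ' '.join(word.lower() for word in text.split(" "))
    # Pad . , ! ? with spaces so each becomes its own token (kept, not removed).
    text = re.sub(r"([.,!?])", r" \1 ", text)
    # Collapse every run of other non-letter characters into a single space.
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)
    return text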

Calling preprocess_text('Are you a, boy or a girl?') returns:

'are you a , boy or a girl ? '
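For comparison, here is a minimal sketch of what a punctuation-stripping variant might look like. The name preprocess_text_strip_punct is hypothetical and not from the book. The sketch assumes that "strip punctuation" means dropping every character that is not an ASCII letter, so it also drops digits and accented letters.

```python
import re


def preprocess_text_strip_punct(text):
    """Hypothetical variant: lowercase the text and drop all non-letters.

    Assumption: only ASCII letters should survive; digits, punctuation
    and accented characters are all replaced by spaces.
    """
    text = text.lower()
    # Replace each run of non-letter characters with a single space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Remove the leading/trailing space left by punctuation at the edges.
    return text.strip()


print(preprocess_text_strip_punct('Are you a, boy or a girl?'))
```

With this variant the comma and question mark disappear entirely, whereas the function above keeps them as separate tokens.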

govindgnair23, Apr 15 '19 13:04