
Repetitive processing in reddit extractor

Open qywu opened this issue 4 years ago • 2 comments

https://github.com/microsoft/DialoGPT/blob/b85558dea5391f83b20120d6c93b9f79fcc72311/reddit_extractor/src/reddit.py#L108-L112

qywu avatar Nov 08 '19 10:11 qywu

Thanks for pointing this out. We will check this issue.

dreasysnail avatar Nov 10 '19 10:11 dreasysnail

Well... I have to admit this process doesn't look very elegant. Basically, I wanted to keep the special tokens __url__ and __mention__, but the line txt = re.sub(r"[^A-Za-z0-9()\[\]:,.!?'“” ]", " ", txt) removes _, so I replace __url__ with URL first and then replace it back afterwards.

golsun avatar Nov 11 '19 03:11 golsun
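
For readers following along, the replace-then-restore step described in the last comment can be sketched roughly as below. This is only an illustration under assumptions, not the actual code in reddit.py: the function name clean_keep_tokens and the MENTION placeholder are hypothetical, and only the regular expression and the __url__ to URL swap come from the comment itself.

```python
import re

# Special tokens named in the comment above. The placeholder strings are
# hypothetical ("URL" is mentioned in the thread, "MENTION" is assumed);
# they only need to consist of characters the filter keeps.
PLACEHOLDERS = {"__url__": "URL", "__mention__": "MENTION"}

# The character filter quoted in the comment above.
DISALLOWED = re.compile(r"[^A-Za-z0-9()\[\]:,.!?'“” ]")


def clean_keep_tokens(txt: str) -> str:
    """Apply the character filter while preserving the special tokens."""
    # 1. Swap each special token for a placeholder that survives the filter.
    for token, placeholder in PLACEHOLDERS.items():
        txt = txt.replace(token, placeholder)
    # 2. Replace every disallowed character (including "_") with a space.
    txt = DISALLOWED.sub(" ", txt)
    # 3. Swap the placeholders back to the original tokens.
    for token, placeholder in PLACEHOLDERS.items():
        txt = txt.replace(placeholder, token)
    return txt


if __name__ == "__main__":
    print(clean_keep_tokens("see __url__ by __mention__ #wow"))
```

One caveat of this approach: a literal "URL" or "MENTION" already present in the input is also turned into a special token in step 3, so placeholders that cannot occur in real text would be a safer choice.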