DialoGPT
Repetitive processing in reddit extractor
https://github.com/microsoft/DialoGPT/blob/b85558dea5391f83b20120d6c93b9f79fcc72311/reddit_extractor/src/reddit.py#L108-L112
Thanks for pointing this out. We will check this issue.
Well... I have to admit this process doesn't look very elegant. Basically, what I wanted was to keep the special tokens __url__ and __mention__, but the line txt = re.sub(r"[^A-Za-z0-9()\[\]:,.!?'“” ]", " ", txt) removes _, so I replace __url__ with URL first and then change it back afterwards.
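
For anyone reading along, here is a minimal sketch of the placeholder trick described above, plus one possible single-pass alternative. It is not the code from reddit.py. The function names clean_keep_tokens and clean_single_pass are hypothetical, and MENTION as the placeholder for __mention__ is an assumption made by analogy with URL.

```python
import re

# Same character class as the quoted line: anything outside it becomes a space.
# "_" is not in the class, so bare special tokens would be broken up.
DISALLOWED = re.compile(r"[^A-Za-z0-9()\[\]:,.!?'“” ]")

def clean_keep_tokens(txt):
    """Placeholder approach (hypothetical helper, not the repository's function)."""
    # Swap the special tokens for underscore-free placeholders.
    txt = txt.replace("__url__", "URL").replace("__mention__", "MENTION")
    txt = DISALLOWED.sub(" ", txt)
    # Swap back. Caveat: a literal "URL" or "MENTION" in the input is rewritten too.
    return txt.replace("URL", "__url__").replace("MENTION", "__mention__")

# Possible alternative (an assumption, not something proposed in the thread):
# match the special tokens first so one substitution can leave them untouched.
SINGLE_PASS = re.compile(r"(__url__|__mention__)|[^A-Za-z0-9()\[\]:,.!?'“” ]")

def clean_single_pass(txt):
    # Keep a matched special token as-is; replace any other matched character with a space.
    return SINGLE_PASS.sub(lambda m: m.group(1) or " ", txt)
```

The single-pass version avoids the round trip and leaves a literal "URL" in the text alone, at the cost of a slightly harder-to-read pattern.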