remove_punc concatenates words

Open xjlc opened this issue 10 years ago • 1 comments

remove_punc and remove_punc2 concatenate some words. For example, "Woodhouse.--Dear" becomes WoodhouseDear. This makes the results of the later tests arguably questionable. For example, an implementation of remove_punc that replaces punctuation with a space and then collapses double spaces into single spaces gives a count of 314 for Woodhouse. Similarly, the frequency count of "the" is 5204 rather than 5146. You are probably aware of this, but I think a cautionary note in the documentation would be warranted.
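A minimal sketch of the two behaviors, for illustration only. The function names remove_punc_joined and remove_punc_spaced are hypothetical, and the course's own remove_punc may be implemented differently. The sketch assumes Python 3 and ASCII punctuation as defined by string.punctuation, and it collapses any run of whitespace rather than only double spaces.

```python
import string

# Hypothetical helpers, not the course's actual code.
# Translation table mapping every ASCII punctuation character to a space.
_PUNC_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))
# Translation table that deletes every ASCII punctuation character.
_PUNC_DELETE = str.maketrans("", "", string.punctuation)


def remove_punc_joined(text):
    """Delete punctuation outright; adjacent words can end up concatenated."""
    return text.translate(_PUNC_DELETE)


def remove_punc_spaced(text):
    """Replace punctuation with spaces, then collapse whitespace runs."""
    return " ".join(text.translate(_PUNC_TO_SPACE).split())


sample = "Woodhouse.--Dear Emma"
print(remove_punc_joined(sample))  # WoodhouseDear Emma
print(remove_punc_spaced(sample))  # Woodhouse Dear Emma
```

Note that the spaced variant also splits words at apostrophes and hyphens, so it changes counts in its own way; which behavior is preferable depends on the tokenization the later tests expect.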

xjlc, Feb 04 '15 11:02

Hi! Thanks for your comments. I'll have a look at this.

fbkarsdorp, Feb 04 '15 11:02