Matching uppercase letters in a lowercase string

Open rcongiu opened this issue 1 year ago • 2 comments

Here, in the code that matches emoticons (line 211): https://github.com/rasbt/machine-learning-book/blob/bc27b404956c1555777282624eb5b8c50c818bfd/ch15/ch15_part2.ipynb#L211

Shouldn't this be text.upper() instead of text.lower(), since the match expression contains a capital P and D? Alternatively, make the regex ignore case: re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower(), flags=re.IGNORECASE). As it is now, it looks for uppercase letters in a string that is all lowercase, so the D and P alternatives will never match.
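To make the mismatch concrete, here is a minimal sketch. The sample text is made up for illustration, and the pattern is the one quoted above, written as a raw string (an assumption on my part, only to avoid escape-sequence warnings). It suggests that emoticons without letters still match after lowercasing, while the ones ending in D or P are lost unless the case is ignored.

```python
import re

# Pattern quoted in this issue; the raw-string prefix is an assumption, not the notebook's spelling.
EMOTICON_PATTERN = r'(?::|;|=)(?:-)?(?:\)|\(|D|P)'

# Made-up sample text, not taken from the book's dataset.
text = "Great movie :-D but the ending ;P was odd :)"

# Lowercasing first turns "D" and "P" into "d" and "p", so only ":)" still matches.
lowered_matches = re.findall(EMOTICON_PATTERN, text.lower())

# With re.IGNORECASE the pattern also accepts the lowercased "d" and "p".
ignorecase_matches = re.findall(EMOTICON_PATTERN, text.lower(), flags=re.IGNORECASE)

print(lowered_matches)
print(ignorecase_matches)
```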

rcongiu, Dec 27 '23 23:12

Thanks for the comment. I think if it were all in upper case, characters like ":-)" would become ":_)" etc. Instead of calling text.lower(), it may be better to leave the text unchanged, which would preserve the original characters and still catch things like ":-P". It could perhaps be just

emoticons = re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
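As a rough check of this suggestion, the sketch below runs the same pattern on unmodified text. The sample string is hypothetical and the raw-string prefix is an assumption. Note that the match stays case-sensitive, so a lowercase variant such as ":-p" would still be skipped.

```python
import re

# Same pattern as in the line above; raw-string prefix added as an assumption.
EMOTICON_PATTERN = r'(?::|;|=)(?:-)?(?:\)|\(|D|P)'

# Hypothetical sample text with original casing preserved.
text = "Loved it :-P =D though the sequel :( was weak"

# Matching on the untouched text keeps uppercase emoticons intact.
emoticons = re.findall(EMOTICON_PATTERN, text)

# The pattern is still case-sensitive, so a lowercase "p" is not matched.
lowercase_variant = re.findall(EMOTICON_PATTERN, ":-p")

print(emoticons)
print(lowercase_variant)
```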

rasbt, Dec 28 '23 08:12

Thanks for the comment. I think if it is all in upper case, characters like ":-)" would become ":_)" etc.

I don't think so, .lower() and .upper() only work on actual letters and would not change the "-" to "_".
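A quick way to check this is to call both methods on a string of emoticons. The sample string below is made up for illustration.

```python
# Made-up sample containing punctuation-only and letter-based emoticons.
sample = ":-) ;-P =D"

# str.upper() only affects cased characters, so the punctuation is untouched
# and the already-uppercase letters stay as they are.
uppered = sample.upper()

# str.lower() likewise leaves "-" alone and only lowercases "P" and "D".
lowered = sample.lower()

print(uppered)
print(lowered)
```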

rcongiu, Dec 28 '23 23:12