relevant-post-bot
Improve accuracy by adding a step when comparing titles
I noticed that this thread, which should be an easy 100% match, was scored as only a 75% match:
https://i.imgur.com/SYNE3ou.png
The problem seems to be that even though the words are the same, the difference in letter case throws the comparison off. Perhaps converting both titles to all lowercase (or all uppercase) before comparing them would fix this?
Adding to this, normalizing the apostrophe characters could also be good; I've encountered a case where the original title had ’ apostrophes, and the parody had '.
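For illustration, here is a minimal sketch of what that normalization step could look like. I don't know how the bot actually compares titles, so `difflib.SequenceMatcher` is only a stand-in for the real comparison, and `normalize_title`, `title_similarity`, and the `APOSTROPHES` table are hypothetical names, not anything from the repo.

```python
import difflib

# Assumption: these are the apostrophe variants worth folding into the ASCII one.
# The real list may need more characters.
APOSTROPHES = str.maketrans({
    "\u2019": "'",  # right single quotation mark (’)
    "\u2018": "'",  # left single quotation mark (‘)
    "\u02bc": "'",  # modifier letter apostrophe (ʼ)
})


def normalize_title(title: str) -> str:
    """Fold apostrophe variants and letter case so only the words matter."""
    return title.translate(APOSTROPHES).casefold().strip()


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] computed on the normalized titles.

    SequenceMatcher is a placeholder for whatever the bot really uses.
    """
    return difflib.SequenceMatcher(
        None, normalize_title(a), normalize_title(b)
    ).ratio()


if __name__ == "__main__":
    print(title_similarity("Don\u2019t Stop ME Now", "don't stop me now"))
```

With this in place, two titles that differ only in case or apostrophe style compare as identical, while titles with genuinely different words are scored as before.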
Perhaps also introduce a mechanism to rank similarity based on individual character differences, like:
- case variance (uppercase vs lowercase variants of the same word)
- apostrophe existence (dont vs don't)
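
A rough sketch of how such a ranking could work, assuming words are compared position by position and each kind of difference costs a small penalty. The weights (`0.95`, `0.9`) and the names `word_score` and `title_score` are made up for illustration and are not taken from the bot.

```python
import re

# Assumption: straight and curly single quotes all count as apostrophes.
_APOSTROPHES = re.compile("['\u2019\u2018]")


def strip_apostrophes(word: str) -> str:
    return _APOSTROPHES.sub("", word)


def word_score(a: str, b: str) -> float:
    """Score one pair of words; the weights are arbitrary placeholders."""
    if a == b:
        return 1.0   # exact match
    if a.casefold() == b.casefold():
        return 0.95  # differs only in case
    if strip_apostrophes(a).casefold() == strip_apostrophes(b).casefold():
        return 0.9   # differs in apostrophes (and possibly case)
    return 0.0       # different words


def title_score(original: str, candidate: str) -> float:
    """Average the per-word scores of two titles.

    Simplification: titles with different word counts score 0.0 here;
    a real implementation would need to align words first.
    """
    words_a, words_b = original.split(), candidate.split()
    if not words_a or len(words_a) != len(words_b):
        return 0.0
    total = sum(word_score(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)
```

This way "dont" vs "don't" or "Hello" vs "hello" still ranks close to a perfect match, but slightly below a title that is identical character for character.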