confusables
A Python package for matching words that are written with different characters but appear to be similar or identical.
When using the normalize function for 'rn', 'r' followed by 'n', it doesn't include 'm' in the returned list despite the fact that applying confusable_characters to 'm' includes 'rn' in...
`confusables.normalize()` has issues with long strings, especially numbers. It creates large lists that consume a great deal of memory (froze my pc lol). These changes allow it to be more...
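A rough sketch of why long inputs can blow up, assuming the growth comes from combining per-character alternatives. This is not the package's code and not the change proposed above; `SAMPLE_ALTERNATIVES`, `count_variants`, and `iter_variants` are hypothetical names used only for illustration. It shows how quickly the number of combinations grows and how a generator avoids building the whole list at once.

```python
import itertools
import math

# Hypothetical per-character alternatives; not the package's real data.
SAMPLE_ALTERNATIVES = {"0": ["0", "o", "O"], "1": ["1", "l", "I"]}

def count_variants(text):
    """Number of combinations: the product of the alternatives per character."""
    return math.prod(len(SAMPLE_ALTERNATIVES.get(ch, [ch])) for ch in text)

def iter_variants(text):
    """Yield variants one at a time instead of materializing the full list."""
    pools = [SAMPLE_ALTERNATIVES.get(ch, [ch]) for ch in text]
    for combo in itertools.product(*pools):
        yield "".join(combo)

print(count_variants("0101"))     # 81
print(count_variants("01" * 20))  # 3**40, far too many to hold in a list

# Only the variants actually requested are ever built.
first_three = list(itertools.islice(iter_variants("01" * 20), 3))
print(first_three)
```

With three alternatives per digit, a 40-digit string already has 3**40 combinations, so any approach that returns them all as a list will exhaust memory; iterating lazily or capping the count are two possible mitigations.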
Not any substantial changes, mostly just ordering, but also getting rid of the 0xFEFF byte order mark.
Adds script to automatically check if confusable pairs are already mapped or not, and generate those entries for custom_confusables.txt. It is run independently of the rest of the module, but...
There are a few characters that get displayed as an empty string, for example: ``` \u200b \u200c \u200d \u200e \u200f ``` They can be mixed into any string...
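One possible workaround is to strip these characters before matching. The sketch below uses only the standard library; `INVISIBLE` and `strip_invisible` are hypothetical names and are not part of the confusables package. It covers only the five code points listed in the report.

```python
# Hypothetical helper, not part of the confusables package.
# The five code points listed in the report above.
INVISIBLE = "\u200b\u200c\u200d\u200e\u200f"

# With a third argument, str.maketrans maps each listed character to None,
# so translate() deletes it.
_STRIP_TABLE = str.maketrans("", "", INVISIBLE)

def strip_invisible(text: str) -> str:
    """Return text with the listed zero-width and directional marks removed."""
    return text.translate(_STRIP_TABLE)

hidden = "te\u200bs\u200dt"
print(len(hidden))              # 6
print(strip_invisible(hidden))  # test
```

Note that the zero-width joiner and non-joiner are meaningful in some scripts and in emoji sequences, so removing them unconditionally may not suit every input.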
When trying to create a confusable regex from a string like "t[e]st", it will fail to match the string "t[e]st". I believe this is because the characters are not escaped...
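A minimal sketch of the escaping idea, assuming the pattern is built by joining per-character alternatives. `SAMPLE_CONFUSABLES` and `build_pattern` are hypothetical stand-ins and do not reflect how the package builds its regex; the only real API used is `re.escape` from the standard library.

```python
import re

# Hypothetical stand-in for a confusable lookup; the real mapping is far larger.
SAMPLE_CONFUSABLES = {"e": ["e", "3"], "t": ["t", "7"]}

def build_pattern(text: str) -> str:
    """Build a regex that matches text or any of its sample confusable variants."""
    parts = []
    for ch in text:
        options = SAMPLE_CONFUSABLES.get(ch, [ch])
        # Escape every alternative so characters such as "[" match literally.
        escaped = [re.escape(opt) for opt in options]
        parts.append("(?:" + "|".join(escaped) + ")")
    return "".join(parts)

pattern = re.compile(build_pattern("t[e]st"))
print(bool(pattern.fullmatch("t[e]st")))  # True
print(bool(pattern.fullmatch("7[3]s7")))  # True

# Without escaping, "[e]" is read as a character class, so the literal fails.
print(bool(re.fullmatch("t[e]st", "t[e]st")))  # False
```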
In parse.py, when creating the confusable sets, there is an assumption that if a certain character is confusable with another, it will also be confusable with the upper/lower case version....
Hello woodgern and thanks for the amazing library. I'm working on porting this code to Golang so that others can benefit from your work, and I had a few questions...