confusables
Confusables Coding Questions
Hello woodgern and thanks for the amazing library. I'm working on porting this code to Golang so that others can benefit from your work, and I had a few questions about the code.
- In parse.py on line 50, you check whether str1 contains only 1 character. However, this appears unnecessary, because every entry in the left-most column of confusables.txt is a single character (and a single code point). Is there some other reason that you have included this check?
- In parse.py on line 59, you check to see if str2 only contains 1 character. Why exactly is this? For example, consider the following line of confusables.txt:
0191 ; 0046 0326 ; MA # ( Ƒ → F̦ ) LATIN CAPITAL LETTER F WITH HOOK → LATIN CAPITAL LETTER F, COMBINING COMMA BELOW # →F̡→
This seems like it should be included in a list of "look-alike" characters (since it case-inverts into "f̦"), but it is currently skipped because the target is two code points long. Is this intentional?
- On lines 40-48, you add both sides to each other's map entry. However, you don't do the same thing on lines 50-66. In other words, shouldn't
unicode_confusable_map[str1].add(case_change)
be instead:
unicode_confusable_map[str1].add(case_change)
unicode_confusable_map[str2].add(case_change)
?
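
To make the second and third questions concrete, here is a minimal Python sketch of what I have in mind. It is not the code from parse.py: `parse_line` and `confusable_map` are hypothetical names of my own, the sample line is the one quoted above with its trailing comment shortened, and using `swapcase()` for the case inversion is my assumption.

```python
from collections import defaultdict

# The line quoted above, with its trailing comment shortened.
LINE = "0191 ; 0046 0326 ; MA # LATIN CAPITAL LETTER F WITH HOOK"


def parse_line(line):
    """Return (source, target) strings for one confusables.txt data line.

    Hypothetical helper, not part of the library.
    """
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [field.strip() for field in data.split(";")]
    # Each field is a space-separated list of hexadecimal code points.
    source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return source, target


source, target = parse_line(LINE)

# source is one code point, but target is two (F + COMBINING COMMA BELOW),
# so a "length == 1" check on the target would skip this line entirely.
case_change = target.swapcase()

# Hypothetical stand-in for unicode_confusable_map.
confusable_map = defaultdict(set)
confusable_map[source].add(case_change)
confusable_map[target].add(case_change)  # the extra line I am asking about
```

With the second `add` in place, both the source and the target end up pointing at the case-inverted form, which is the behavior I would have expected given how lines 40-48 treat both sides.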