
Confusables Coding Questions

Open Zamiell opened this issue 4 years ago • 0 comments

Hello woodgern and thanks for the amazing library. I'm working on porting this code to Golang so that others can benefit from your work, and I had a few questions about the code.

  1. In parse.py on line 50, you check whether str1 contains only 1 character. However, this appears unnecessary, because every entry in the left-most column of confusables.txt consists of exactly 1 character (1 code point). Is there some other reason that you have included this check?

  2. In parse.py on line 59, you check to see if str2 only contains 1 character. Why exactly is this? For example, consider the following line of confusables.txt:

0191 ;	0046 0326 ;	MA	# ( Ƒ → F̦ ) LATIN CAPITAL LETTER F WITH HOOK → LATIN CAPITAL LETTER F, COMBINING COMMA BELOW	# →F̡→

This seems like it should be included in a list of "look-alike" characters (since it case-inverts into "f̦"), but it is currently skipped because the right-hand side has a length of two. Is this intentional?
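
To make the shape of that entry concrete, here is a minimal Python sketch that decodes the sample line above. It is not taken from parse.py; the helper name `parse_confusable_line` is hypothetical, and the only assumption about the file format is what the sample line itself shows (hex code points in `;`-separated fields, followed by a `#` comment).

```python
# Sample line copied from confusables.txt (tab characters written as \t).
LINE = (
    "0191 ;\t0046 0326 ;\tMA\t# ( Ƒ → F̦ ) LATIN CAPITAL LETTER F WITH HOOK "
    "→ LATIN CAPITAL LETTER F, COMBINING COMMA BELOW\t# →F̡→"
)


def parse_confusable_line(line):
    """Hypothetical helper: decode the first two hex fields into strings."""
    data = line.split("#", 1)[0]  # drop everything from the first comment on
    fields = [field.strip() for field in data.split(";")]
    source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
    target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
    return source, target


source, target = parse_confusable_line(LINE)

# The left-hand side is a single code point; the right-hand side is two.
print(len(source), len(target))

# Lowercasing the right-hand side yields "f" followed by COMBINING COMMA BELOW.
print(target.lower() == "f\u0326")
```

In this sketch the source decodes to one code point (U+0191) and the target to two (U+0046 U+0326), which is why a length-1 check on str2 would skip the entry.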

  3. On lines 40-48, you add both sides to each other's map entry. However, you don't do the same thing on lines 50-66. In other words, shouldn't
unicode_confusable_map[str1].add(case_change)

be instead:

unicode_confusable_map[str1].add(case_change)
unicode_confusable_map[str2].add(case_change)

?
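
For comparison, "adding both sides to each other's map entry" can be sketched as below. This is only an illustration under assumptions: `add_symmetric` is a hypothetical helper, the map is assumed to behave like a `defaultdict(set)`, and the sample values stand in for whatever str1 and case_change hold at that point in parse.py.

```python
from collections import defaultdict

# Assumption: the map behaves like a dict of sets keyed by string.
unicode_confusable_map = defaultdict(set)


def add_symmetric(mapping, a, b):
    """Hypothetical helper: record a and b as confusable with each other."""
    mapping[a].add(b)
    mapping[b].add(a)


# Illustrative values only, taken from the sample line in question 2.
str1 = "\u0191"          # Ƒ
case_change = "f\u0326"  # f followed by COMBINING COMMA BELOW

add_symmetric(unicode_confusable_map, str1, case_change)

print(case_change in unicode_confusable_map[str1])
print(str1 in unicode_confusable_map[case_change])
```

With a one-directional add, only the first lookup would succeed; the symmetric version makes the relationship discoverable from either side.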

Zamiell • May 11 '20 23:05