Use confusables.txt as a homoglyph source

Open noelleleigh opened this issue 7 years ago • 2 comments

I see that you have a small list of homoglyphs in characters_safetext.py. Unicode has a reference text file for this kind of information that seems pretty comprehensive: confusables.txt (Technical Report)

Would it make sense to incorporate this dataset into your tool?

noelleleigh avatar Jan 02 '18 13:01 noelleleigh

Hey! Thanks so much for this! I'll definitely take a look at it in the next few days. This is a great find. I may not use the whole file, only the entries that resemble Latin characters, so I'll need to find a way to process it all.

DavidJacobson avatar Jan 03 '18 05:01 DavidJacobson
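
One way to do the filtering mentioned above, as a rough sketch rather than anything the project actually does: confusables.txt lists one mapping per line as `source ; target ; type # comment`, with code points in hex, so it can be reduced to "Latin letter → look-alikes" by keeping only entries whose target is a single ASCII letter. The function name `parse_confusables` and the `SAMPLE` lines are made up for illustration; the sample follows the file's layout but is not copied from it, and a real run would read the downloaded file instead.

```python
import string
from collections import defaultdict

# Only keep mappings whose target is one ASCII Latin letter (an assumption
# about what "similar to Latin characters" should mean here).
LATIN_LETTERS = set(string.ascii_letters)

# Illustrative lines in the confusables.txt layout; not a verbatim excerpt.
SAMPLE = """\
# source ;\ttarget ;\ttype\t# comment
0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
03BF ;\t006F ;\tMA\t# GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
006D ;\t0072 006E ;\tMA\t# LATIN SMALL LETTER M -> r, n (multi-character target)
"""


def parse_confusables(lines):
    """Map each ASCII Latin letter to the set of characters confusable with it."""
    homoglyphs = defaultdict(set)
    for raw in lines:
        # Drop a leading BOM, the trailing comment, and surrounding whitespace.
        line = raw.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is a space-separated sequence of hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        if target in LATIN_LETTERS:
            homoglyphs[target].add(source)
    return dict(homoglyphs)


latin_homoglyphs = parse_confusables(SAMPLE.splitlines())
```

With the real file, the same function could be fed `open("confusables.txt", encoding="utf-8")` directly, since it only needs an iterable of lines.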

It might also make sense, rather than using handwritten names for these confusable characters, to perform a live lookup for the Unicode glyph name — it would save you coming up with names for a start :)

owenblacker avatar Jan 15 '18 11:01 owenblacker
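
For the name lookup suggested above, Python's standard `unicodedata` module can return a character's official Unicode name at runtime. A minimal sketch, where `describe` is a hypothetical helper and not part of the project:

```python
import unicodedata


def describe(char):
    """Return a label like 'U+0441 CYRILLIC SMALL LETTER ES' for one character."""
    # unicodedata.name raises ValueError for unnamed characters unless a
    # default is supplied, so pass None and handle that case explicitly.
    name = unicodedata.name(char, None)
    code_point = f"U+{ord(char):04X}"
    if name is None:
        return f"{code_point} (unnamed)"
    return f"{code_point} {name}"


label = describe("\u0441")
```

Control characters such as U+0000 have no name in the database, which is why the helper falls back to a placeholder instead of raising.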