SafeText
Use confusables.txt as a homoglyph source
I see that you have a small list of homoglyphs in characters_safetext.py. Unicode has a reference text file for such information that seems pretty comprehensive: confusables.txt (Technical Report)
Would it make sense to incorporate this dataset into your tool?
Hey! Thanks so much for this! I'll definitely take a look at this in the next few days. This is a great find. I may not use the whole file, only the entries that resemble Latin characters, so I'll have to find a way to process it all.
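For what it's worth, here is a rough sketch of how that filtering could look. It assumes each data line in confusables.txt has the shape `source ; target ; type # comment`, with code points written in hex. The function name `parse_confusables` and the sample lines are made up for illustration (the samples follow that layout but are not copied verbatim from the file), and none of this is taken from characters_safetext.py.

```python
import string
from collections import defaultdict


def parse_confusables(lines):
    """Map each ASCII Latin letter to the characters listed as confusable with it.

    Hypothetical helper: assumes the "source ; target ; type # comment" layout.
    """
    latin = set(string.ascii_letters)
    table = defaultdict(list)
    for raw in lines:
        # Drop a leading BOM and everything after '#', which starts a comment.
        line = raw.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is a space-separated sequence of hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        # Keep only entries whose target is a single ASCII letter.
        if target in latin and source not in latin:
            table[target].append(source)
    return dict(table)


# Illustrative lines in the assumed layout.
SAMPLE = [
    "# comment line, ignored",
    "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C",
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
    "00E6 ;\t0061 0065 ;\tMA\t# multi-character target, skipped",
]

latin_lookalikes = parse_confusables(SAMPLE)
```

With the real file you would pass the open file object instead of `SAMPLE`, for example `parse_confusables(open("confusables.txt", encoding="utf-8"))`. Skipping multi-character targets is a design choice here, so drop that check if you also want sequences like "rn" for "m".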
It might also make sense, rather than using handwritten names for these confusable characters, to perform a live lookup for the Unicode glyph name — it would save you coming up with names for a start :)
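As a minimal sketch of that lookup, Python's standard `unicodedata` module can return the official character name at runtime. The helper names `glyph_name` and `describe` and the fallback string are hypothetical, chosen only for this example.

```python
import unicodedata


def glyph_name(char, default="UNNAMED CHARACTER"):
    """Return the Unicode name of a single character, or a fallback if it has none."""
    # unicodedata.name raises ValueError for unnamed characters unless a default is given.
    return unicodedata.name(char, default)


def describe(char):
    """Format a character as 'U+XXXX NAME' for display in a warning message."""
    return f"U+{ord(char):04X} {glyph_name(char)}"
```

For example, `describe("\u0441")` gives `U+0441 CYRILLIC SMALL LETTER ES`. Control characters have no name in the database, which is why the fallback is there.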