confusable_homoglyphs
ϲοnfuѕаblе_һοmоɡlyphs
confusable_homoglyphs `[doc] <http://confusable-homoglyphs.readthedocs.io/en/latest/>`__
.. image:: https://img.shields.io/travis/vhf/confusable_homoglyphs.svg
   :target: https://travis-ci.org/vhf/confusable_homoglyphs

.. image:: https://img.shields.io/pypi/v/confusable_homoglyphs.svg
   :target: https://pypi.python.org/pypi/confusable_homoglyphs

.. image:: https://readthedocs.org/projects/confusable_homoglyphs/badge/?version=latest
   :target: http://confusable-homoglyphs.readthedocs.io/en/latest/
   :alt: Documentation Status

a homoglyph is one of two or more graphemes, characters, or glyphs with
shapes that appear identical or very similar
`wikipedia:Homoglyph <https://en.wikipedia.org/wiki/Homoglyph>`__
Unicode homoglyphs can be a nuisance on the web. Your most popular client, AlaskaJazz, might be upset to be impersonated by a trickster who deliberately chose the username ΑlaskaJazz.

- ``AlaskaJazz`` is single script: only Latin characters.
- ``ΑlaskaJazz`` is mixed-script: the first character is a Greek letter.

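The single-script versus mixed-script distinction above can be sketched with the standard library alone. This is only a rough illustration, not how this library works: it guesses a script from the first word of each character's Unicode name (an assumption that holds for Latin, Greek, and Cyrillic letters but is not the real Unicode script property), and the helper names ``scripts_used`` and ``looks_mixed_script`` are hypothetical.

```python
import unicodedata


def scripts_used(text):
    """Guess the scripts of the letters in text from their Unicode names."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, etc. are treated as script-neutral
        # e.g. "GREEK CAPITAL LETTER ALPHA" -> "GREEK" (heuristic, see lead-in)
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(text):
    """Return True when the letters of text come from more than one script."""
    return len(scripts_used(text)) > 1


latin_name = "AlaskaJazz"
spoofed_name = "\u0391laskaJazz"  # starts with GREEK CAPITAL LETTER ALPHA

print(latin_name, looks_mixed_script(latin_name))
print(spoofed_name, looks_mixed_script(spoofed_name))
```

The two usernames render almost identically, yet only the second one reports more than one script.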
You might also want to avoid people being tricked into entering their
password on www.microsоft.com or www.faϲebook.com instead of
www.microsoft.com or www.facebook.com. Here is a `utility <http://unicode.org/cldr/utility/confusables.jsp>`__ to play
with these confusable homoglyphs.
Not all mixed-script strings have to be ruled out, though. You could exclude only those mixed-script strings containing characters that might be confused with a character from Unicode blocks of your choosing.

- ``Allo`` and ``ρττ`` are fine: single script.
- ``AlloΓ`` is fine when our preferred script alias is 'latin': mixed script, but ``Γ`` is not confusable.
- ``Alloρ`` is dangerous: mixed script and ``ρ`` could be confused with ``p``.

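The rule illustrated by these examples (flag a string only when it is mixed-script and contains a character confusable with the preferred script) can be sketched as follows. Everything here is an assumption made for illustration: ``SAMPLE_CONFUSABLES`` is a tiny hand-picked table rather than the real confusables data, scripts are guessed from Unicode character names, and ``looks_dangerous`` is a hypothetical helper, not this library's API.

```python
import unicodedata

# Hypothetical hand-picked sample; the real confusables data is far larger.
SAMPLE_CONFUSABLES = {
    "\u03c1": "p",  # GREEK SMALL LETTER RHO
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
}


def script_of(ch):
    """Guess a script from the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def looks_dangerous(text, preferred="LATIN"):
    """Mixed script AND some character confusable with the preferred script."""
    letters = [ch for ch in text if ch.isalpha()]
    if len({script_of(ch) for ch in letters}) <= 1:
        return False  # single script: accepted regardless of the table
    return any(
        script_of(ch) != preferred
        and ch in SAMPLE_CONFUSABLES
        and script_of(SAMPLE_CONFUSABLES[ch]) == preferred
        for ch in letters
    )


for candidate in ["Allo", "\u03c1\u03c4\u03c4", "Allo\u0393", "Allo\u03c1"]:
    print(candidate, looks_dangerous(candidate))
```

Checking the script count first keeps single-script strings such as ``ρττ`` acceptable even though ``ρ`` appears in the table.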
This library is compatible with Python 2 and Python 3.
`API documentation <http://confusable-homoglyphs.readthedocs.io/en/latest/apidocumentation.html>`__
Is the data up to date?
Yep.
The Unicode block aliases and names for each character are extracted
from `this file <http://www.unicode.org/Public/UNIDATA/Scripts.txt>`__
provided by the Unicode Consortium.
The matrix of which character can be confused with which other
characters is built using `this file <http://www.unicode.org/Public/security/latest/confusables.txt>`__
provided by the Unicode Consortium.
This data is stored in two JSON files: ``categories.json`` and
``confusables.json``. If you delete them, both will be recreated by
downloading and parsing the two files mentioned above, then stored as
JSON files again.
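To give a feel for the parse-then-store step, here is a minimal sketch that reads a few lines laid out like ``confusables.txt`` (``source ; target ; type # comment``) and serializes the result as JSON. It is not the library's own parser: the two-line excerpt is embedded instead of downloaded, the ``parse_confusables`` helper is hypothetical, and the resulting JSON shape is an assumption that need not match the layout of ``confusables.json``.

```python
import json

# Small excerpt in the confusables.txt layout; the real file is much longer.
SAMPLE = (
    "# source ; target ; type # comment\n"
    "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C\n"
    "03C1 ;\t0070 ;\tMA\t# GREEK SMALL LETTER RHO -> LATIN SMALL LETTER P\n"
    "\n"
)


def decode(field):
    """Turn space-separated hex code points into the string they spell."""
    return "".join(chr(int(code_point, 16)) for code_point in field.split())


def parse_confusables(text):
    """Map each source string to the list of targets it can be confused with."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        table.setdefault(decode(fields[0]), []).append(decode(fields[1]))
    return table


table = parse_confusables(SAMPLE)
as_json = json.dumps(table, ensure_ascii=False, sort_keys=True)
print(as_json)
```

Keeping the parsed result in a plain dictionary means it round-trips through ``json.dumps`` and ``json.loads`` without any custom encoder.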