php-confusable-homoglyphs
Issue with the json file? Confusables for characters 'm' and 'w'
Something seems off with just those characters in the json file.
Repro:
<?php declare(strict_types=1);
$all = file_get_contents('confusables.json');
$all = json_decode($all, true);
print_r($all['a']); // this is fine, prints out array of 23 confusables
print_r($all['m']); // PROBLEM: Only prints one confusable, {"c":"rn","n":"LATIN SMALL LETTER R, LATIN SMALL LETTER N"}
print_r($all['w']); // PROBLEM: Only prints one confusable, {"c":"vv","n":"LATIN SMALL LETTER V, LATIN SMALL LETTER V"}
What both of these have in common is that the only confusable happens to be a double character: 'm' has 'rn' and 'w' has 'vv'. So maybe there is a bug in the generation of this file that doesn't know about multi-character confusables?
Here's a link showing actual confusables for M and W, which I would expect to be in this JSON file:
https://util.unicode.org/UnicodeJsps/confusables.jsp?a=manwe&r=None
Update: I realized I wasn't checking the uppercase versions too ('M' and 'W'), but it still looks like some confusables are missing for some reason.
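To make the lowercase/uppercase comparison easier to repeat, here is a minimal sketch that counts the entries under both case variants of a character and how many of them are multi-character. The function name summarizeConfusables is hypothetical and not part of the library. The sketch assumes the structure seen in the repro above (each key maps to a list of entries with 'c' and 'n' fields) and that the mbstring extension is available. It runs against a small inline sample rather than the real file.

```php
<?php declare(strict_types=1);

// Hypothetical helper, not part of php-confusable-homoglyphs.
// Assumes the shape seen in the repro: character => list of ['c' => ..., 'n' => ...].
function summarizeConfusables(array $all, string $char): array
{
    $summary = [];
    // Look at both case variants, since they are stored under separate keys.
    foreach (array_unique([mb_strtolower($char), mb_strtoupper($char)]) as $key) {
        $entries = $all[$key] ?? [];
        // Entries whose confusable is more than one character long, e.g. "rn".
        $multi = array_filter($entries, fn (array $e): bool => mb_strlen($e['c']) > 1);
        $summary[$key] = [
            'total' => count($entries),
            'multiCharacter' => count($multi),
        ];
    }
    return $summary;
}

// Inline sample mirroring the repro output; swap in the decoded confusables.json.
$sample = [
    'm' => [['c' => 'rn', 'n' => 'LATIN SMALL LETTER R, LATIN SMALL LETTER N']],
    'w' => [['c' => 'vv', 'n' => 'LATIN SMALL LETTER V, LATIN SMALL LETTER V']],
];

print_r(summarizeConfusables($sample, 'm'));
```

If a key has a total of 1 and a multi-character count of 1, its only entry is the multi-character one, which matches the pattern described above.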
Interesting, I'll see if I can investigate further.
Interesting, it seems an update to the JSON files results in 'w' returning the correct values, but 'm' still returns just one:
Array
(
    [0] => Array
        (
            [c] => rn
            [n] => LATIN SMALL LETTER R, LATIN SMALL LETTER N
        )

)
I have added a breaking test on an issue branch here: https://github.com/photogabble/php-confusable-homoglyphs/blob/issue/9-checking-missing-confusables/tests/ConfusableTest.php#L123-L131