php-confusable-homoglyphs

Issue with the json file? Confusables for characters 'm' and 'w'

Open michaelbutler opened this issue 3 years ago • 3 comments

Something seems off with just those characters in the json file.

Repro:

<?php declare(strict_types=1);

$all = file_get_contents('confusables.json');
$all = json_decode($all, true);

print_r($all['a']); // this is fine, prints out array of 23 confusables

print_r($all['m']); // PROBLEM: Only prints one confusable, {"c":"rn","n":"LATIN SMALL LETTER R, LATIN SMALL LETTER N"}

print_r($all['w']); // PROBLEM: Only prints one confusable, {"c":"vv","n":"LATIN SMALL LETTER V, LATIN SMALL LETTER V"}

What both have in common is that the only confusable listed is a two-character sequence: m has rn and w has vv. So maybe the code that generates this file has a bug in how it handles multi-character confusables?

Here's a link showing actual confusables for M and W, which I would expect to be in this JSON file:

https://util.unicode.org/UnicodeJsps/confusables.jsp?a=manwe&r=None
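
To check whether m and w are the only keys affected, a quick scan of the decoded array might help. This is only a rough sketch: it assumes every entry has the same shape as in the repro above (a 'c' key holding the confusable string), the function name is made up for illustration, and the sample data is hypothetical rather than taken from the real confusables.json.

```php
<?php declare(strict_types=1);

/**
 * Return the keys whose confusable list contains only multi-character entries.
 * Assumption: each entry looks like ['c' => 'rn', 'n' => '...'], as in the repro.
 * The function name is hypothetical and not part of the library.
 */
function keysWithOnlyMultiCharConfusables(array $table): array
{
    $suspects = [];
    foreach ($table as $key => $entries) {
        if (!is_array($entries) || $entries === []) {
            continue; // nothing to inspect for this key
        }
        $allMulti = true;
        foreach ($entries as $entry) {
            // Count Unicode code points rather than bytes.
            $length = preg_match_all('/./su', (string) ($entry['c'] ?? ''));
            if ($length <= 1) {
                $allMulti = false;
                break;
            }
        }
        if ($allMulti) {
            // PHP turns numeric string keys into ints, so cast back to string.
            $suspects[] = (string) $key;
        }
    }
    return $suspects;
}

// Hypothetical sample data in the same shape as confusables.json.
$sample = [
    'a' => [
        ['c' => 'а', 'n' => 'CYRILLIC SMALL LETTER A'],
        ['c' => 'α', 'n' => 'GREEK SMALL LETTER ALPHA'],
    ],
    'm' => [['c' => 'rn', 'n' => 'LATIN SMALL LETTER R, LATIN SMALL LETTER N']],
    'w' => [['c' => 'vv', 'n' => 'LATIN SMALL LETTER V, LATIN SMALL LETTER V']],
];

print_r(keysWithOnlyMultiCharConfusables($sample));
```

Running the same function on the real decoded file (the $all array from the repro) would list every key that has nothing but multi-character confusables, which should show whether the problem is limited to m and w.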

michaelbutler, Dec 15 '21 17:12

Update: I realized I wasn't checking the uppercase versions (M and W) as well, but it still looks like some confusables are missing.
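
For reference, something like the following could be used to look at both cases of a letter at once. It is a minimal sketch assuming the file is keyed by single characters as in the repro. The helper name and the sample entries are hypothetical, and it only handles ASCII letters because it relies on strtolower/strtoupper.

```php
<?php declare(strict_types=1);

/**
 * Collect the confusables stored under both the lowercase and uppercase key
 * of an ASCII letter. Hypothetical helper, not part of the library.
 */
function confusablesForBothCases(array $table, string $letter): array
{
    $result = [];
    // array_unique avoids a duplicate lookup for characters without case.
    foreach (array_unique([strtolower($letter), strtoupper($letter)]) as $key) {
        $result[$key] = $table[$key] ?? [];
    }
    return $result;
}

// Hypothetical sample data in the same shape as confusables.json.
$sample = [
    'm' => [['c' => 'rn', 'n' => 'LATIN SMALL LETTER R, LATIN SMALL LETTER N']],
    'M' => [['c' => 'Μ', 'n' => 'GREEK CAPITAL LETTER MU']],
];

foreach (confusablesForBothCases($sample, 'm') as $key => $entries) {
    printf("%s: %d confusable(s)\n", $key, count($entries));
}
```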

michaelbutler, Dec 15 '21 17:12

Interesting, I'll see if I can investigate further.

carbontwelve, Feb 17 '22 21:02

Interesting: after updating the JSON files, w now returns the correct values, but m still returns just one:

Array
(
    [0] => Array
        (
            [c] => rn
            [n] => LATIN SMALL LETTER R, LATIN SMALL LETTER N
        )

)

I have added a failing test on an issue branch here: https://github.com/photogabble/php-confusable-homoglyphs/blob/issue/9-checking-missing-confusables/tests/ConfusableTest.php#L123-L131

carbontwelve, Feb 17 '22 22:02