[FIX] Properly use non-latin numbers in input and output

Open Felienne opened this issue 3 years ago • 7 comments

We now support non-Latin variable names, but some languages also have their own digit characters, e.g. Hindi (code for level 21):

print('५+३ क्या है ?')
उत्तर = ५+३
print('अब उत्तर है:')
print(उत्तर)
if उत्तर == ८:
    print('यह सही है!')
else:
    print('अरे नहीं! यह गलत है!')

This does not work because 1) we have no support in the grammar and 2) the transpiler also needs to know the values to translate this to Python (assuming Python does not support this? Maybe it does?)

This is a hairy issue but interesting!

May 2022

Some updates on the progress here: #1929 and #2722 have made more progress towards this, but we still need:

  • ~~non-latin decimals (level 12 and up)~~ Fixed by #2741!!
  • non-latin output
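
For the remaining "non-latin output" item, one possible approach is to map the ASCII digits that Python produces back to the learner's script just before printing. The sketch below is only an illustration: `localize_digits` and its `zero` parameter are hypothetical names, not part of Hedy, and it assumes the target script's ten digits are contiguous in Unicode (true for Devanagari, Bengali, and the Arabic-Indic digits).

```python
def localize_digits(text, zero="०"):
    """Replace ASCII digits in text with the digits of another script.

    `zero` is the character for the digit zero in the target script
    (hypothetical parameter; Devanagari zero by default).
    """
    base = ord(zero)
    # Map the code points of "0".."9" to the ten digits starting at `zero`.
    table = {ord("0") + i: chr(base + i) for i in range(10)}
    return text.translate(table)


print(localize_digits(str(5 + 3)))  # ८
```

Passing a different `zero` character selects another script. Deciding which script to use for a given program is a separate question that this sketch does not address.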

Felienne, Oct 19 '21 09:10

> assuming Python does not support this? Maybe it does?

It doesn't support this.

>>> ५+३
  File "<stdin>", line 1
    ५+३
    ^
SyntaxError: invalid character '५' (U+096B)

bjorn3, Oct 19 '21 09:10

Pity but thanks for checking!!

Felienne, Oct 19 '21 10:10

Just checked and this DOES work:

>>> int("५") + int("३")
8

Which means it's a matter of letting the parser accept non-ASCII digits!

For reference, Python's int() parses non-ASCII digits by transliterating them to ASCII digits, one by one. Ultimately, this delegates to unicodedata.decimal().

>>> import unicodedata
>>> unicodedata.decimal("५")
5
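
To make the idea concrete, here is a minimal sketch of a pre-processing step that rewrites digit runs to ASCII before the code reaches Python. It is not Hedy's grammar or transpiler, which would handle this differently; `ascii_digits` and `_TOKEN` are hypothetical names. It relies on the fact that `\d` in a Python 3 `str` pattern matches any Unicode decimal digit, and it leaves single-quoted strings alone so printed text stays as the learner wrote it.

```python
import re
import unicodedata

# Matches either a single-quoted string literal (left untouched) or a run of
# decimal digits. On str patterns, \d matches any Unicode decimal digit.
_TOKEN = re.compile(r"'[^']*'|\d+")


def ascii_digits(source: str) -> str:
    """Rewrite digit runs outside single-quoted strings to ASCII digits."""

    def convert(match: re.Match) -> str:
        token = match.group()
        if token.startswith("'"):
            return token  # string literal: keep the text as written
        # Transliterate digit by digit so leading zeros are preserved.
        return "".join(str(unicodedata.decimal(ch)) for ch in token)

    return _TOKEN.sub(convert, source)


print(ascii_digits("उत्तर = ५+३"))             # उत्तर = 5+3
print(ascii_digits("print('५+३ क्या है ?')"))  # unchanged
```

This sketch ignores double-quoted strings and also rewrites digits that appear inside identifiers, so a real solution likely still belongs in the grammar.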

eddieantonio, Feb 08 '22 16:02

Thanks a lot for that, Eddie! Sadly, Python implements this, but Skulpt (which we use to run Python in JavaScript) does not 😭

So I guess I will have to shoestring this together myself
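
A minimal sketch of what a hand-rolled version could look like, avoiding `unicodedata` and `int()` on non-ASCII input entirely. The names `ZEROS`, `digit_value`, and `parse_int` are hypothetical, the list of scripts is only a small sample, and it is an untested assumption that Skulpt handles everything used here (`ord`, a dict, and plain loops).

```python
# Code point of the digit zero in each script. The ten digits of each of
# these scripts are contiguous in Unicode, so value = code point - zero.
ZEROS = {
    "Arabic-Indic": 0x0660,
    "Extended Arabic-Indic": 0x06F0,
    "Devanagari": 0x0966,
    "Bengali": 0x09E6,
}


def digit_value(ch):
    """Return 0-9 for a supported digit character, or None otherwise."""
    code = ord(ch)
    if 48 <= code <= 57:  # ASCII 0-9
        return code - 48
    for zero in ZEROS.values():
        if zero <= code <= zero + 9:
            return code - zero
    return None


def parse_int(text):
    """Parse a non-negative integer written in any supported digit set."""
    if not text:
        raise ValueError("empty numeral")
    total = 0
    for ch in text:
        value = digit_value(ch)
        if value is None:
            raise ValueError("not a digit: " + repr(ch))
        total = total * 10 + value
    return total


print(parse_int("५") + parse_int("३"))  # 8
```

Signs, decimals, and numerals that mix scripts are left out on purpose; whether mixed-script numbers should be accepted at all is a design decision for the language.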

Felienne, Feb 11 '22 12:02

> So I guess I will have to shoestring this together myself

So I learned the hard way that parsing UnicodeData.txt is not super fun. Here's a table of numerals and their digit values (including Eastern Arabic, Hindi/Devanagari, and Bengali) from the Unicode Character Database: https://github.com/eddieantonio/numerals-in-unicode/blob/main/Numerals.ipynb

eddieantonio, Feb 11 '22 12:02

Would love to hear about your battle scars one day!

Felienne, Feb 11 '22 12:02