hedy
[FIX] Properly use non-latin numbers in input and output
We now support non-Latin variable names, but some languages also have their own numeral characters, e.g. Hindi (code for level 21):
print('५+३ क्या है ?')
उत्तर = ५+३
print('अब उत्तर है:')
print(उत्तर)
if उत्तर == ८:
    print('यह सही है!')
else:
    print('अरे नहीं! यह गलत है!')
This does not work because 1) we have no support in the grammar and 2) the transpiler also needs to know the values to translate this to Python (assuming Python does not support this? Maybe it does?)
This is a hairy issue but interesting!
May 2022
Some updates on the progress here: #1929 and #2722 have made more progress towards this, but we still need:
- ~~non-latin decimals (level 12 and up)~~ Fixed by #2741!!
- non-latin output
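For the remaining output item, one possible direction is a small post-processing step that maps ASCII digits in printed text to the target script. The sketch below is only an illustration: `localize_digits` and `DEVANAGARI_ZERO` are hypothetical names, not part of Hedy, and it assumes the target script stores its digits 0 to 9 at consecutive code points (true for Devanagari, which starts at U+0966).

```python
# Hypothetical helper, not taken from the Hedy code base.
DEVANAGARI_ZERO = 0x0966  # code point of '०' (DEVANAGARI DIGIT ZERO)


def localize_digits(text, zero=DEVANAGARI_ZERO):
    """Replace every ASCII digit in text with the matching digit of the
    script whose digit zero sits at code point `zero`."""
    out = []
    for ch in text:
        if "0" <= ch <= "9":
            # Offset from ASCII '0' equals the offset from the script's zero.
            out.append(chr(zero + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)


print(localize_digits(str(5 + 3)))  # ८
```

Where exactly such a step would hook into the transpiled program's `print` calls is left open here.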
> assuming Python does not support this? Maybe it does?
It doesn't support this.
>>> ५+३
  File "<stdin>", line 1
    ५+३
    ^
SyntaxError: invalid character '५' (U+096B)
Pity but thanks for checking!!
Just checked and this DOES work:
>>> int("५") + int("३")
8
Which means it's a matter of letting the parser accept non-ASCII digits!
For reference, Python's `int()` parses non-ASCII digits by transliterating them to ASCII digits, one by one. Ultimately, this delegates to `unicodedata.decimal()`.
>>> import unicodedata
>>> unicodedata.decimal("५")
5
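Building on that, here is a minimal sketch of how source text could be normalised before it reaches Python. `to_ascii_digits` is a hypothetical name, and the approach assumes a full CPython `unicodedata` module is available. Note that it rewrites every decimal digit it sees, including digits inside string literals, so a real transpiler would want to do this at the token level instead.

```python
import unicodedata


def to_ascii_digits(source):
    """Rewrite every Unicode decimal digit in source as its ASCII digit."""
    out = []
    for ch in source:
        # decimal() returns the digit value, or the default (None) when
        # the character is not a decimal digit.
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


print(to_ascii_digits("५+३"))  # 5+3
```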
Thanks a lot for that, Eddie! Sadly, Python implements this, but Skulpt (which we use to run Python in JavaScript) does not 😭
So I guess I will have to shoestring this together myself
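A hand-rolled version that avoids `unicodedata` entirely could look roughly like the sketch below. The names `ZERO_CODE_POINTS`, `DIGIT_VALUES`, and `parse_int` are hypothetical, only four scripts are covered, and whether this runs unchanged under Skulpt has not been checked. It relies on each listed script storing its digits 0 to 9 at consecutive code points.

```python
# Code point of digit zero for a few scripts (illustrative, not exhaustive).
ZERO_CODE_POINTS = {
    "Arabic-Indic": 0x0660,
    "Extended Arabic-Indic": 0x06F0,
    "Devanagari": 0x0966,
    "Bengali": 0x09E6,
}

# Map each non-ASCII digit character to its numeric value.
DIGIT_VALUES = {}
for zero in ZERO_CODE_POINTS.values():
    for value in range(10):
        DIGIT_VALUES[chr(zero + value)] = value


def parse_int(text):
    """Parse a non-negative integer written in ASCII or any listed script."""
    if not text:
        raise ValueError("empty string")
    result = 0
    for ch in text:
        if "0" <= ch <= "9":
            digit = ord(ch) - ord("0")
        elif ch in DIGIT_VALUES:
            digit = DIGIT_VALUES[ch]
        else:
            raise ValueError("not a digit: " + repr(ch))
        result = result * 10 + digit
    return result


print(parse_int("५") + parse_int("३"))  # 8
```

Supporting another script would only need one more entry in `ZERO_CODE_POINTS`, provided its digits are contiguous too.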
> So I guess I will have to shoestring this together myself

So I learned the hard way that parsing `UnicodeData.txt` is not super fun. Here's a table of numerals and their digit value (including Eastern Arabic, Hindi/Devanagari, and Bengali) from the Unicode Character Database: https://github.com/eddieantonio/numerals-in-unicode/blob/main/Numerals.ipynb
Would love to hear about your battle scars one day!