
[Language idea] Ignore accents when transpiling

Open jpelay opened this issue 2 years ago • 6 comments

When checking the input of an if, we match exactly the string given by the user, which is nice! But sometimes we may want to be more forgiving about what constitutes a match. One such case is accents: a character with an accent looks very similar to its ASCII counterpart, requires pressing an extra key first, and (in the Spanish case) is not used that much in day-to-day conversations, so it's easy to leave out.

This can cause problems when checking an if, for example:

if ans is sí
   print 'Awesome'

If the kid inputs si instead of sí, the if branch will not be entered, which may confuse the kid. The same logic can be applied to keywords and variables: we'd want jesus and jesús to be the same variable.

@Felienne has pointed out that we might not want to do this for every language or for every type of accent, because they're not necessarily equivalent in some languages, like French for example.

One possible way to deal with this, suggested by @Felienne and @TiBiBa, is to create a mapper that maps characters with accents to their ASCII equivalents. One downside is that this is very slow.
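As a rough illustration of the mapper idea, here is a minimal Python sketch. The names `ACCENT_MAPS` and `fold_accents`, the language codes, and the choice of which characters to fold are all assumptions made for this example and are not part of Hedy. It uses `str.translate` so each string is processed in one pass, and it keeps a separate table per language so that a language like French can opt out.

```python
import unicodedata

# Hypothetical per-language tables: only the characters listed here are folded.
# "ñ" is deliberately left out of the Spanish table (assumption: it is a
# distinct letter, not an accented "n").
ACCENT_MAPS = {
    "es": str.maketrans("áéíóúüÁÉÍÓÚÜ", "aeiouuAEIOUU"),
    # Assumption: no folding for French, where accents may change meaning.
    "fr": str.maketrans("", ""),
}


def fold_accents(text: str, lang: str) -> str:
    """Replace accented characters using the table for `lang`, if there is one."""
    # Compose first so "i" + combining acute is treated the same as "í".
    composed = unicodedata.normalize("NFC", text)
    table = ACCENT_MAPS.get(lang)
    return composed.translate(table) if table else composed


print(fold_accents("sí", "es"))     # folded with the Spanish table
print(fold_accents("pandá", "fr"))  # left unchanged for French
```

Keeping the tables explicit makes it easy to decide, per language, which accents are treated as equivalent and which are not.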

jpelay avatar Apr 06 '22 12:04 jpelay

Apparently there is already a library for this! (Of course there is in Python...):

import unidecode

somestring = "àéêöhello"

# Python 3 strings are already Unicode, so no decoding step is needed
# transliterate accented characters to their closest ASCII equivalents
print(unidecode.unidecode(somestring))

Output:

aeeohello

Found the example here: https://stackoverflow.com/questions/44431730/how-to-replace-accented-characters#44433664

TiBiBa avatar Apr 06 '22 13:04 TiBiBa

Ow wow that is a great find @TiBiBa!

Felienne avatar Apr 06 '22 13:04 Felienne

We do, however, still have the issue of comparisons on the front-end, so we should implement a similar solution within TypeScript. Because we don't talk to the server after the code is transpiled to Python (correct me if I'm wrong!), the following code needs both the front-end and the back-end to replace the characters:

animal = 'panda'
if animal is pandá print 'awesome!'
else print 'sad face'

TiBiBa avatar Apr 06 '22 13:04 TiBiBa

> Apparently there is already a library for this! [unidecode example quoted from @TiBiBa's comment above]

Yes! I found this earlier, and they mention some problems, but I haven't tested it myself (https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string)

jpelay avatar Apr 06 '22 13:04 jpelay

> We do, however, still have the issue of comparisons on the front-end, so we should implement a similar solution within TypeScript. [...]

Maybe we can do the same thing as with the numeric characters and include a function within the transpiled code, something like:

# avoid shadowing Python's built-in input()
user_input = normalize_accents(user_input)
to_check = normalize_accents(rhs_if)

if user_input == to_check:
    ...
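For what it's worth, a helper like this could be sketched using only the standard library, without an extra dependency in the transpiled code. This is a minimal sketch and not Hedy's actual implementation: the body of `normalize_accents` and the helper `accent_insensitive_equals` are assumptions for illustration. It decomposes each character with `unicodedata.normalize("NFD", ...)` and drops the combining marks.

```python
import unicodedata


def normalize_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_equals(user_input: str, rhs_if: str) -> bool:
    """Hypothetical helper: compare both sides of an `is` check after normalizing."""
    return normalize_accents(user_input) == normalize_accents(rhs_if)


if accent_insensitive_equals("si", "sí"):
    print("Awesome")
```

Note that this strips every combining mark, so it also turns ñ into n, and it leaves letters that have no decomposition, such as ø, untouched. It would need a per-language switch to respect the concern about languages where accents are not equivalent.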

jpelay avatar Apr 06 '22 13:04 jpelay

And this one @boryanagoncharenko? Could be some fun language puzzling?

Felienne avatar Feb 23 '24 08:02 Felienne