baron: Accented letters in identifiers seem to break baron

Open asterbini opened this issue 4 years ago • 1 comment

I am parsing some of my students' code and I get the error:

Untreated elements: 'ù_usata'

because the identifier contains accented letters. In this case the identifier comes from the following definition:

def parola_più_usata(dizionario,lista_ordinata):

Could the tokenizer be fixed to also allow accented letters?
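
For reference, Python 3 itself accepts such names, since PEP 3131 allows non-ASCII letters in identifiers. The following is a small check that uses only the standard library and does not involve baron. The ASCII-only character set is an assumption about how the name ends up being split, not baron's actual code:

```python
import string

# The identifier from the definition above; \u00f9 is the letter "ù".
name = "parola_pi\u00f9_usata"

# Python 3 accepts non-ASCII letters in identifiers (PEP 3131).
is_valid = name.isidentifier()

# Assumed ASCII-only identifier alphabet (illustrative, not taken from baron).
ascii_ident_chars = set(string.ascii_letters + string.digits + "_")

# Characters of the name that fall outside that alphabet.
non_ascii = [ch for ch in name if ch not in ascii_ident_chars]

print(is_valid, non_ascii)
```

The name is a valid identifier for Python, and the only character outside the ASCII set is the accented letter where the reported error fragment begins.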

Jul 26 '20 17:07 asterbini

I have temporarily fixed it (badly; I am sure a better solution exists) by changing line 56 of baron/splitter.py from

for section in (string.ascii_letters + "_" + "1234567890", " \t"):

to

for section in (string.ascii_letters + "àèìòùé_" + "1234567890", " \t"):
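
A less ad hoc variant might avoid listing letters by hand and classify characters with `str.isalnum()`, which is Unicode-aware. The sketch below is not baron's code: `is_identifier_char` and `split_runs` are hypothetical helpers that only illustrate grouping identifier characters into a single run. Note that `str.isalnum()` is only an approximation of Python's real identifier rules.

```python
from itertools import groupby


def is_identifier_char(ch):
    # Hypothetical helper: Unicode-aware approximation of
    # "this character may appear inside an identifier".
    return ch == "_" or ch.isalnum()


def split_runs(source):
    # Group consecutive characters by whether they are identifier characters,
    # so accented letters stay in the same run as their ASCII neighbours.
    return ["".join(group) for _, group in groupby(source, key=is_identifier_char)]


print(split_runs("def parola_pi\u00f9_usata(dizionario,lista_ordinata):"))
```

With this grouping, the accented name stays in one piece instead of being cut at the `ù`.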

Jul 26 '20 18:07 asterbini