baron
Accented letters in identifiers seem to break baron
I am parsing some of my student's code and I get the error:
Untreated elements: 'ù_usata'
because the identifier contains accented letters. In this case the identifier came from the following definition:
def parola_più_usata(dizionario,lista_ordinata):
Could the tokenizer be fixed to also allow accented letters?
I have temporarily fixed it (badly, I am sure a better solution exists) by changing file baron/splitter.py at line 56 from
for section in (string.ascii_letters + "_" + "1234567890", " \t"):
to
for section in (string.ascii_letters + "àèìòùé_" + "1234567890", " \t"):
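A more general approach might be to ask Python itself whether a character can appear in an identifier, instead of listing accented letters by hand. The following is only a minimal standalone sketch of that idea, not baron's actual code: the names `is_identifier_char` and `split_identifier_runs` are hypothetical, and it assumes Python 3, where `str.isidentifier()` follows the Unicode identifier rules.

```python
def is_identifier_char(char):
    """Return True if `char` may appear inside a Python 3 identifier."""
    # Prefixing with "_" lets digits pass too, since they are valid
    # anywhere except the first position of an identifier.
    return ("_" + char).isidentifier()


def split_identifier_runs(source):
    """Group consecutive identifier characters; emit every other character alone.

    Hypothetical helper for illustration only, not part of baron.
    """
    pieces = []
    current = ""
    for char in source:
        if is_identifier_char(char):
            current += char
        else:
            # A non-identifier character closes the current run, if any.
            if current:
                pieces.append(current)
                current = ""
            pieces.append(char)
    if current:
        pieces.append(current)
    return pieces


line = "def parola_più_usata(dizionario,lista_ordinata):"
print(split_identifier_runs(line))
```

With a check like this, `parola_più_usata` stays in one piece, and identifiers using letters outside the six listed in my workaround should be handled as well. I have not tried wiring this into baron/splitter.py itself.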