Python generator doesn't use raw strings

Status: Open. gankoji opened this issue 3 years ago • 2 comments

Per Python's regular expression documentation, if we want Python regexes to follow the same escaping rules as other languages, the pattern strings need an 'r' prefix to make them raw string literals. Otherwise, Python processes backslash escapes in the string literal itself, before the re module ever sees the pattern.
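To make the escaping problem concrete, here is a small Python sketch (not part of the generator) showing what a non-raw literal does to two common regex escapes. It assumes Python 3.6 or later, where an unrecognized escape such as \d triggers a DeprecationWarning (a SyntaxWarning from 3.12 on); escape_warnings is a hypothetical helper defined only for this demo.

```python
import warnings

# In a normal literal, "\b" is the backspace character (U+0008), so a regex
# meant to match a word boundary silently matches something else.
assert "\b" == "\x08"
# In a raw literal the backslash survives, so re sees the word-boundary escape.
assert r"\b" == "\\b"


def escape_warnings(source):
    """Compile a one-expression snippet; return any invalid-escape warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<generated>", "eval")
    return [str(w.message) for w in caught
            if "invalid escape sequence" in str(w.message)]


# A non-raw '^\d+' still evaluates to the intended pattern today, but Python
# warns about the unrecognized escape when the generated module is compiled.
print(escape_warnings(r"'^\d+'"))   # non-empty: an 'invalid escape sequence' message
print(escape_warnings(r"r'^\d+'"))  # []
```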

When I generate a parser with the following lex section:

%lex
%%
\s+             return
\"[^\"]*\"      return 'STRING'
\d+\.\d+        return 'FLOAT'
\d+             return 'INT'
[\w\-+*=<>/]+   return 'SYMBOL'
/lex

I get the following generated python:

_lex_rules = [['^\(', _lex_rule1],
              ['^\)', _lex_rule2],
              ['^\s+', _lex_rule3],
              ['^"[^\"]*"', _lex_rule4],
              ['^\d+\.\d+', _lex_rule5],
              ['^\d+', _lex_rule6],
              ['^[\w\-+*=<>/]+', _lex_rule7]]

These fail with 'invalid escape sequence' or a similar error, whereas manually prefixing each regex with 'r' solves the problem:

_lex_rules = [[r'^\(', _lex_rule1],
              [r'^\)', _lex_rule2],
              [r'^\s+', _lex_rule3],
              [r'^"[^\"]*"', _lex_rule4],
              [r'^\d+\.\d+', _lex_rule5],
              [r'^\d+', _lex_rule6],
              [r'^[\w\-+*=<>/]+', _lex_rule7]]
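As a quick check that the raw-string rules behave as intended, the sketch below compiles them and runs a minimal first-match tokenizer over a small input. The token names LPAREN and RPAREN, the tokenize function, and the first-match loop are assumptions for illustration only; the real generated parser dispatches to _lex_rule1 through _lex_rule7 and may match differently.

```python
import re

# Raw-string patterns copied from the corrected rules above. Token names are
# placeholders for this sketch (hypothetical); None means "skip, no token".
LEX_RULES = [
    (r'^\(', 'LPAREN'),
    (r'^\)', 'RPAREN'),
    (r'^\s+', None),
    (r'^"[^\"]*"', 'STRING'),
    (r'^\d+\.\d+', 'FLOAT'),
    (r'^\d+', 'INT'),
    (r'^[\w\-+*=<>/]+', 'SYMBOL'),
]
COMPILED = [(re.compile(pattern), name) for pattern, name in LEX_RULES]


def tokenize(text):
    """Repeatedly take the first rule that matches at the start of `text`."""
    tokens = []
    while text:
        for regex, name in COMPILED:
            match = regex.match(text)
            if match:
                if name is not None:
                    tokens.append((name, match.group()))
                text = text[match.end():]
                break
        else:
            raise ValueError(f"unexpected input: {text[:10]!r}")
    return tokens


print(tokenize('(+ 1 2.5 "hi")'))
# [('LPAREN', '('), ('SYMBOL', '+'), ('INT', '1'), ('FLOAT', '2.5'), ('STRING', '"hi"'), ('RPAREN', ')')]
```

Rule order matters in a first-match loop like this one: FLOAT has to come before INT, and both before SYMBOL, because \w also matches digits.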

The Python generator most likely needs only a simple update: emit each regex as a raw string by prefixing it with 'r'.
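One caveat for that update: blindly wrapping a pattern in r'...' does not always produce valid Python, because a raw literal cannot end in an odd number of backslashes and a bare single quote would close it early. Below is a minimal sketch, written in Python for illustration only and not the generator's actual code, of a hypothetical to_python_regex_literal helper that emits a raw literal when it round-trips exactly and otherwise falls back to repr().

```python
import ast


def to_python_regex_literal(pattern):
    """Return Python source for a string literal equal to `pattern`.

    Hypothetical helper: prefers a raw literal r'...' so the generated code
    reads like the grammar, and falls back to repr() when a raw literal
    cannot represent the pattern (e.g. it contains a bare single quote or
    ends in an odd number of backslashes).
    """
    candidate = "r'" + pattern + "'"
    try:
        # literal_eval only accepts literals, so this never executes code.
        if ast.literal_eval(candidate) == pattern:
            return candidate
    except (SyntaxError, ValueError):
        pass
    # repr() always yields a valid literal that evaluates back to `pattern`.
    return repr(pattern)


print(to_python_regex_literal('^\\d+\\.\\d+'))  # r'^\d+\.\d+'
print(to_python_regex_literal("it's"))          # "it's"
```

Emitting repr(pattern) for every rule would also be correct; the raw form is preferred here only because it keeps the generated rules identical to the grammar's regexes.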

gankoji, Nov 22 '22 18:11

@gankoji thanks for the report. Yes, we need to fix this; I would appreciate a PR for it.

DmitrySoshnikov, Nov 22 '22 19:11

@DmitrySoshnikov apologies if this pings you more than once; I don't know whether you get notified about new PRs. PR #135 is up for this issue.

gankoji, Nov 23 '22 00:11