skope-rules

SyntaxError: Python keyword not valid identifier in numexpr query

Open saurabhdaalia opened this issue 6 years ago • 10 comments

When I add feature names to the SkopeRules model, I encounter this error.

Some of the feature names are:

data__blocked_bugs_number
data__ever_affected=False
data__ever_affected=True
data__has_crash_signature=False
data__has_crash_signature=True
data__has_github_url=False
data__has_github_url=True
data__has_str=irrelevant
data__has_str=no

Traceback (most recent call last):
  File "run.py", line 55, in <module>
    model.train()
  File "C:\Users\Saurabh Daalia\Desktop\bugbug\bugbug\model.py", line 101, in train
    self.skope_clf.fit(X_train, y_train)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 350, in fit
    for r in set(rules_from_tree)]
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 350, in <listcomp>
    for r in set(rules_from_tree)]
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 600, in _eval_rule_perf
    detected_index = list(X.query(rule).index)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3088, in query
    res = self.eval(expr, **kwargs)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3203, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\eval.py", line 294, in eval
    truediv=truediv)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 749, in __init__
    self.terms = self.parse()
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 766, in parse
    return self._visitor.visit(self.expr)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 327, in visit
    raise e
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 321, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
SyntaxError: Python keyword not valid identifier in numexpr query

saurabhdaalia, Feb 11 '19 20:02

Is it because you put = in your feature names?

ngoix, Mar 02 '19 15:03

I see, I think that might be the issue. But what is causing this issue? Is there any workaround for it?

saurabhdaalia, Mar 03 '19 06:03

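The variable names are parsed to build the rules, which causes your bug: each rule is evaluated as a query string (see X.query(rule) in the traceback), and a name containing = is not a valid identifier there. I don't see an easy workaround. You really shouldn't put = in your feature names...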

ngoix, Mar 10 '19 21:03
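
Since each rule is evaluated as an expression string, every feature name has to be usable as a Python identifier. One possible workaround, sketched here and not part of skope-rules itself, is to rename the features before fitting and keep a mapping back to the original names. The helper name make_query_safe is hypothetical, and the sample names are taken from the list in the first post.

```python
import keyword
import re


def make_query_safe(names):
    """Map arbitrary feature names to unique, valid Python identifiers.

    Returns a dict {safe_name: original_name} in input order.
    Hypothetical helper for illustration; not a skope-rules function.
    """
    safe_to_original = {}
    for name in names:
        # Replace every character outside [A-Za-z0-9_] with an underscore.
        safe = re.sub(r"\W", "_", name, flags=re.ASCII)
        # Identifiers cannot be empty, start with a digit, or be a keyword.
        if not safe or safe[0].isdigit() or keyword.iskeyword(safe):
            safe = "f_" + safe
        # Disambiguate collisions such as "a=b" and "a_b".
        candidate, suffix = safe, 1
        while candidate in safe_to_original:
            suffix += 1
            candidate = f"{safe}_{suffix}"
        safe_to_original[candidate] = name
    return safe_to_original


mapping = make_query_safe([
    "data__blocked_bugs_number",
    "data__ever_affected=False",
    "data__ever_affected=True",
])
safe_names = list(mapping)
```

The sanitized names in safe_names could then be supplied as the feature names for the model, and mapping used afterwards to translate names in the learned rules back to the originals.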

You really shouldn't put = in your feature names...

Feature names are strings, so it seems like a limitation to restrict what they can contain (everything else in the scikit-learn world doesn't care about it). Maybe it should be allowed, or at least documented somewhere?

marco-c, Mar 13 '19 00:03

You are right, this should be documented. Feel free to open a PR for that or for fixing the syntax error :)

ngoix, Mar 13 '19 20:03

Guys, I get a similar error when I run the command below. If I remove the pipe, it works with only one condition.

SyntaxError: Python keyword not valid identifier in numexpr query

Error is --- train_outliers = train.query('age_z > 3 | age_z < ‐3')

ghost, Aug 25 '19 23:08

This happened to me as well. The problem was that I kept holding down the Alt key while typing the character that follows the pipe symbol. I encounter this frequently, as typing a pipe requires me to hold Alt.

vedal, Mar 30 '20 08:03

Happened to me too. Does anyone know how to fix it? Thanks xD

osdiego, Jun 02 '20 14:06

@osdiego Did you copy and paste from another document? The "-3" is not being read correctly by the query function. Try deleting the minus sign and retyping it. Let me know if this works.

CCNOAI, Jun 03 '20 10:06
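
A pasted or mistyped look-alike character can be found by scanning the query string for anything outside ASCII. The sketch below assumes the "minus" in the query quoted earlier is U+2010 HYPHEN, as it appears in this copy of the thread; the helper name find_non_ascii and the DASHES table are illustrative, not part of pandas.

```python
import unicodedata


def find_non_ascii(expr):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(expr)
        if ord(ch) > 127
    ]


# The query from the comments above; its "minus" is assumed to be U+2010,
# not the ASCII hyphen-minus.
query = "age_z > 3 | age_z < \u20103"
problems = find_non_ascii(query)

# Replace common dash look-alikes with the ASCII hyphen-minus.
DASHES = dict.fromkeys(map(ord, "\u2010\u2011\u2012\u2013\u2014\u2212"), "-")
fixed = query.translate(DASHES)
```

The same check would also flag other invisible or look-alike characters, such as a non-breaking space typed with a modifier key held down.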

@CCNOAI I'm doing something like: (importance >= 0 | importance = -7). I need to search like that. Is there no way to do it?

osdiego, Jun 03 '20 14:06
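
In the last query, the single = is an assignment rather than a comparison, which is a likely cause of the failure; == is the equality test. A minimal sketch, assuming a DataFrame with an importance column (the sample values are made up for illustration):

```python
import pandas as pd

# Made-up sample data with the column name used in the comment above.
df = pd.DataFrame({"importance": [-7, -3, 0, 5]})

# Use == for equality, and parenthesize each condition so the expression
# reads the same way regardless of how | is prioritized.
selected = df.query("(importance >= 0) | (importance == -7)")
values = selected["importance"].tolist()
```

Writing the conditions with the keyword or instead of | is an equivalent option in a query string.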