pandas icon indicating copy to clipboard operation
pandas copied to clipboard

BUG: DataFrame eval will not work with 'Timestamp' column name

Open RobertMouncer opened this issue 3 years ago • 6 comments
trafficstars

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100, size=(3000, 2)), columns=['Timestamp', 'col1'])
df.eval('Timestamp>6')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\frame.py", line 4191, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\computation\eval.py", line 353, in eval
    ret = eng_inst.evaluate()
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\computation\engines.py", line 80, in evaluate
    res = self._evaluate()
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\computation\engines.py", line 121, in _evaluate
    return ne.evaluate(s, local_dict=scope)
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\numexpr\necompiler.py", line 821, in evaluate
    signature = [(name, getType(arg)) for (name, arg) in
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\numexpr\necompiler.py", line 821, in <listcomp>
    signature = [(name, getType(arg)) for (name, arg) in
  File "C:\Users\<>\AppData\Local\Programs\Python\Python38\lib\site-packages\numexpr\necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object

Issue Description

If 'Timestamp' exists as a column name, and a dataframe eval is performed which includes that column, a ValueError occurs.

Expected Behavior

The expeected behaviour should be that of the same output when applied on the other column that is provided in the example:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100, size=(3000, 2)), columns=['Timestamp', 'col1'])
df.eval('col1>6')

output:

0        True
1        True
2        True
3        True
4        True
        ...
2995    False
2996     True
2997     True
2998     True
2999     True
Length: 3000, dtype: bool

Installed Versions

INSTALLED VERSIONS ------------------ commit : 945c9ed766a61c7d2c0a7cbb251b6edebf9cb7d5 python : 3.8.1.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19041 machine : AMD64 processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United Kingdom.1252

pandas : 1.3.4 numpy : 1.21.2 pytz : 2020.1 dateutil : 2.8.1 pip : 21.3.1 setuptools : 41.2.0 Cython : None pytest : 5.4.2 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : 4.5.1 html5lib : None pymysql : None psycopg2 : None jinja2 : 2.11.2 IPython : 7.15.0 pandas_datareader: None bs4 : None bottleneck : None fsspec : 0.7.4 fastparquet : None gcsfs : None matplotlib : 3.2.1 numexpr : 2.7.1 odfpy : None openpyxl : 3.0.3 pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : 1.4.1 sqlalchemy : None tables : None tabulate : None xarray : None xlrd : 1.2.0 xlwt : 1.3.0 numba : 0.49.1

RobertMouncer avatar Nov 24 '21 12:11 RobertMouncer

The issue looks to be the fact that you've used "Timestamp" as a column name, and that word is used to refer to the Timestamp class. Simply renaming your column to "timestamp" would fix your issue.

venaturum avatar Nov 24 '21 13:11 venaturum

The issue looks to be the fact that you've used "Timestamp" as a column name, and that word is used to refer to the Timestamp class. Simply renaming your column to "timestamp" would fix your issue.

Sounds like a workaround rather than a solution, perhaps an error should be raised stating the issue.

RobertMouncer avatar Nov 24 '21 15:11 RobertMouncer

There's some handling for name clashes here: https://github.com/pandas-dev/pandas/blob/42adb9f80fb6a048995087276eda2a16ea1cfb66/pandas/core/computation/engines.py#L27-L43

Contributions welcome to figure out if this occurs for other column names and where pandas might detect this and raise a more helpful error.

mzeitlin11 avatar Nov 24 '21 18:11 mzeitlin11

Related: #35595

mzeitlin11 avatar Nov 24 '21 18:11 mzeitlin11

Specifying the df name before the column name should do the trick df.eval('@df.Timestamp>6')

Updated Code Snippet -

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100, size=(3000, 2)), columns=['Timestamp', 'col1'])
df.eval('@df.Timestamp>6')

dannyi96 avatar Jul 31 '22 18:07 dannyi96

Is this worth adding to the docs? Seems to be a worthwhile snippet

attack68 avatar Aug 05 '22 12:08 attack68