pymc
Add `_as_tensor_variable` converter for pandas objects
Description of your problem
Aesara cannot convert pandas DataFrames into tensors. @twiecki suggested that I open an issue about it here :)
Please provide a minimal, self-contained, and reproducible example.
import aesara.tensor as at
import pandas as pd
df = pd.DataFrame(data={"col": [1.0, 2.0, -1.0]})
at.as_tensor(x=df, name="df")
Please provide the full traceback.
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
/Users/juanitorduz/Documents/offline_channels_marketing_efficiency/JPN/notebooks/geo-tests/pre_test_analysis.ipynb Cell 42' in <cell line: 6>()
      2 import pandas as pd
      4 df = pd.DataFrame(data={"col": [1.0, 2.0, -1.0]})
----> 6 at.as_tensor(x=df, name="df")
File ~/.local/share/virtualenvs/offline_channels_marketing_efficiency-hd9wNcdu/lib/python3.8/site-packages/aesara/tensor/__init__.py:42, in as_tensor_variable(x, name, ndim, **kwargs)
10 def as_tensor_variable(
11 x: Any, name: Optional[str] = None, ndim: Optional[int] = None, **kwargs
12 ) -> "TensorVariable":
13 """Convert `x` into an equivalent `TensorVariable`.
14
15 This function can be used to turn ndarrays, numbers, `ScalarType` instances,
(...)
40
41 """
---> 42 return _as_tensor_variable(x, name, ndim, **kwargs)
File ~/opt/anaconda3/lib/python3.8/functools.py:875, in singledispatch.<locals>.wrapper(*args, **kw)
871 if not args:
872 raise TypeError(f'{funcname} requires at least '
873 '1 positional argument')
--> 875 return dispatch(args[0].__class__)(*args, **kw)
File ~/.local/share/virtualenvs/offline_channels_marketing_efficiency-hd9wNcdu/lib/python3.8/site-packages/aesara/tensor/__init__.py:49, in _as_tensor_variable(x, name, ndim, **kwargs)
45 @singledispatch
46 def _as_tensor_variable(
47 x, name: Optional[str], ndim: Optional[int], **kwargs
48 ) -> "TensorVariable":
---> 49 raise NotImplementedError(f"Cannot convert {x} to a tensor variable.")
NotImplementedError: Cannot convert col
0 1.0
1 2.0
2 -1.0 to a tensor variable.
If I convert to numpy it works as expected:
at.as_tensor(x=df.to_numpy(), name="df")
Please provide any additional information below.
Versions and main components
- PyMC/PyMC3 Version: pymc==4.0.0
- Aesara/Theano Version: aesara==2.6.6
- Python Version: 3.8
- Operating system: macOS
- How did you install PyMC/PyMC3: (conda/pip) Pipenv
Related to https://github.com/aesara-devs/aesara/discussions/912
This example is not strictly a PyMC issue, since it uses a pure Aesara constructor. But because the as_tensor_variable constructor works via dispatching, we can register a conversion method either here or in Aesara.
@brandonwillard what would be preferable? Pandas is ubiquitous enough that we might want to support it upstream.
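For readers unfamiliar with the dispatch mechanism mentioned above, here is a minimal sketch using only the standard library. It is not Aesara's code: `to_tensor`, `FakeFrame`, and `convert_or_error` are hypothetical stand-ins, chosen so the example runs without Aesara or pandas installed. It only illustrates why an unregistered type hits the `NotImplementedError` fallback and how registering a converter for that type resolves it.

```python
from functools import singledispatch


@singledispatch
def to_tensor(x, name=None):
    # Fallback for unregistered types, analogous to the error in the traceback.
    raise NotImplementedError(f"Cannot convert {x} to a tensor variable.")


@to_tensor.register(list)
def _list_to_tensor(x, name=None):
    # Stand-in for an input type the library already supports.
    return ("tensor", name, tuple(x))


class FakeFrame:
    """Hypothetical stand-in for pandas.DataFrame (not the real class)."""

    def __init__(self, values):
        self._values = list(values)

    def to_numpy(self):
        # Returns a plain list here; the real method returns an ndarray.
        return list(self._values)


def convert_or_error(x, name=None):
    # Helper that reports the failure instead of raising, for demonstration.
    try:
        return to_tensor(x, name=name)
    except NotImplementedError as err:
        return f"error: {type(err).__name__}"


# Before registration, dispatch falls through to the generic fallback.
before = convert_or_error(FakeFrame([1.0, 2.0, -1.0]), name="df")


@to_tensor.register(FakeFrame)
def _frame_to_tensor(x, name=None):
    # Convert to a supported type first, then dispatch again.
    return to_tensor(x.to_numpy(), name=name)


# After registration, the same call succeeds.
after = convert_or_error(FakeFrame([1.0, 2.0, -1.0]), name="df")
```

Because registration is additive, a downstream package can attach a converter for its own types without the dispatching library having to import or know about them.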
No, we can't extend Aesara's reach to cover Pandas. We can make it easier for another package to support it, but we're not going to add it as a requirement, or deal with its details in the Aesara project. We need to diligently avoid that kind of scope creep.
Alright so in the meantime we can implement an _as_tensor_variable dispatch here for Pandas objects. Something like this might suffice:
try:
    import pandas as pd

    @_as_tensor_variable.register(pd.DataFrame)
    def dataframe_to_tensor_variable(df, *args, **kwargs):
        return at.as_tensor_variable(df.to_numpy(), *args, **kwargs)

except ModuleNotFoundError:
    pass
Would we do that in PyMC or in Aesara?
In PyMC, per the discussion above.
@ricardoV94 Great, should we move this over there then as a feature request?
Should stay here in PyMC if there is no pressure from other libraries to have it at the Aesara level
Slightly related: #4650
pm.aesaraf.convert_observed_data is the all-round conversion function that PyMC uses to convert anonymous data types, including pandas objects, to types supported by Aesara.
That's a bit different though, because it has special logic for masks
@ricardoV94 based on your discussion, this seems like a beginner-friendly task, right? I can give it a try! Where should your suggestion above be included?
@juanitorduz probably in pymc/aesaraf.py
Closed via #5920