Add `_as_tensor_variable` converter for pandas objects

Open juanitorduz opened this issue 3 years ago • 12 comments

Description of your problem

Aesara cannot convert pandas DataFrames into tensors. @twiecki suggested that I open an issue about it here :)

Please provide a minimal, self-contained, and reproducible example.

import aesara.tensor as at
import pandas as pd

df = pd.DataFrame(data={"col": [1.0, 2.0, -1.0]})

at.as_tensor(x=df, name="df")

Please provide the full traceback.

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/Users/juanitorduz/Documents/offline_channels_marketing_efficiency/JPN/notebooks/geo-tests/pre_test_analysis.ipynb Cell 42' in <cell line: 6>()
      2 import pandas as pd
      4 df = pd.DataFrame(data={"col": [1.0, 2.0, -1.0]})
----> 6 at.as_tensor(x=df, name="df")

File ~/.local/share/virtualenvs/offline_channels_marketing_efficiency-hd9wNcdu/lib/python3.8/site-packages/aesara/tensor/__init__.py:42, in as_tensor_variable(x, name, ndim, **kwargs)
     10 def as_tensor_variable(
     11     x: Any, name: Optional[str] = None, ndim: Optional[int] = None, **kwargs
     12 ) -> "TensorVariable":
     13     """Convert `x` into an equivalent `TensorVariable`.
     14 
     15     This function can be used to turn ndarrays, numbers, `ScalarType` instances,
   (...)
     40 
     41     """
---> 42     return _as_tensor_variable(x, name, ndim, **kwargs)

File ~/opt/anaconda3/lib/python3.8/functools.py:875, in singledispatch.<locals>.wrapper(*args, **kw)
    871 if not args:
    872     raise TypeError(f'{funcname} requires at least '
    873                     '1 positional argument')
--> 875 return dispatch(args[0].__class__)(*args, **kw)

File ~/.local/share/virtualenvs/offline_channels_marketing_efficiency-hd9wNcdu/lib/python3.8/site-packages/aesara/tensor/__init__.py:49, in _as_tensor_variable(x, name, ndim, **kwargs)
     45 @singledispatch
     46 def _as_tensor_variable(
     47     x, name: Optional[str], ndim: Optional[int], **kwargs
     48 ) -> "TensorVariable":
---> 49     raise NotImplementedError(f"Cannot convert {x} to a tensor variable.")

NotImplementedError: Cannot convert    col
0  1.0
1  2.0
2 -1.0 to a tensor variable.

If I convert to numpy it works as expected:

at.as_tensor(x=df.to_numpy(), name="df")

Please provide any additional information below.

Versions and main components

  • PyMC/PyMC3 Version: pymc==4.0.0
  • Aesara/Theano Version: aesara==2.6.6
  • Python Version: 3.8
  • Operating system: MacOS
  • How did you install PyMC/PyMC3: (conda/pip) Pipenv

juanitorduz avatar Jun 10 '22 09:06 juanitorduz

Related to https://github.com/aesara-devs/aesara/discussions/912

This (example) is not a PyMC issue as it is using a pure Aesara constructor. But since the as_tensor_variable constructor works via dispatching we can add one method either here or in Aesara.

@brandonwillard what would be preferable? Pandas is ubiquitous enough that we might want to support it upstream.
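For readers unfamiliar with the dispatch mechanism mentioned above, here is a minimal sketch using only `functools.singledispatch` from the standard library. It does not use Aesara at all: `to_tensor_like` and `FakeFrame` are hypothetical stand-ins for `_as_tensor_variable` and a pandas DataFrame, chosen only to show why an unregistered type hits the `NotImplementedError` fallback and how a downstream package could register a handler afterwards.

```python
from functools import singledispatch


@singledispatch
def to_tensor_like(x, name=None):
    """Fallback for unregistered types, mirroring the error in the traceback."""
    raise NotImplementedError(f"Cannot convert {x!r} to a tensor variable.")


@to_tensor_like.register(list)
def _list_to_tensor_like(x, name=None):
    # Stand-in for a real conversion: just tag the data with its name.
    return {"name": name, "data": tuple(x)}


class FakeFrame:
    """Hypothetical container playing the role of a pandas DataFrame."""

    def __init__(self, values):
        self.values = values

    def to_list(self):
        return list(self.values)


def try_convert(obj):
    """Return the converted object, or None if no handler is registered."""
    try:
        return to_tensor_like(obj, name="df")
    except NotImplementedError:
        return None


frame = FakeFrame([1.0, 2.0, -1.0])
before = try_convert(frame)  # no handler registered yet, so the fallback raises


# A downstream package can add support without editing the original function.
@to_tensor_like.register(FakeFrame)
def _frame_to_tensor_like(frame, name=None):
    # Convert to an already-supported type and dispatch again.
    return to_tensor_like(frame.to_list(), name=name)


after = try_convert(frame)  # now handled by the newly registered function
```

The point of the design is that registration happens from outside: the package that owns the dispatched function needs no knowledge of, or dependency on, the type being added.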

ricardoV94 avatar Jun 10 '22 09:06 ricardoV94

No, we can't extend Aesara's reach to cover Pandas. We can make it easier for another package to support it, but we're not going to add it as a requirement, or deal with its details in the Aesara project. We need to diligently avoid that kind of scope creep.

brandonwillard avatar Jun 10 '22 09:06 brandonwillard

Related to aesara-devs/aesara#912

This is definitely the best general approach.

brandonwillard avatar Jun 10 '22 09:06 brandonwillard

Alright so in the meantime we can implement an _as_tensor_variable dispatch here for Pandas objects. Something like this might suffice:

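import aesara.tensor as at
from aesara.tensor import _as_tensor_variable

try:
    import pandas as pd

    # Register a pandas-specific handler on Aesara's single-dispatch converter.
    @_as_tensor_variable.register(pd.DataFrame)
    def dataframe_to_tensor_variable(df, *args, **kwargs):
        # Delegate to the existing NumPy conversion path.
        return at.as_tensor_variable(df.to_numpy(), *args, **kwargs)

except ModuleNotFoundError:
    # pandas is optional; skip the registration when it is not installed.
    pass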
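Since the issue title mentions pandas objects in general, a slightly broader sketch could register the same handler for `pd.Series` as well. This is only an illustration under stated assumptions, not necessarily what was merged: it assumes `_as_tensor_variable` is importable from `aesara.tensor` (as the traceback above suggests for aesara 2.6.6), and the names `pandas_to_tensor_variable` and `PANDAS_DISPATCH_REGISTERED` are made up for this example.

```python
try:
    import pandas as pd
    import aesara.tensor as at
    from aesara.tensor import _as_tensor_variable
except ImportError:
    # Either pandas or Aesara is missing; nothing to register.
    PANDAS_DISPATCH_REGISTERED = False
else:
    # Stacking `register` decorators attaches one handler to several types.
    @_as_tensor_variable.register(pd.Series)
    @_as_tensor_variable.register(pd.DataFrame)
    def pandas_to_tensor_variable(obj, *args, **kwargs):
        # Both Series and DataFrame expose `to_numpy()`, so the existing
        # NumPy conversion path does the actual work.
        return at.as_tensor_variable(obj.to_numpy(), *args, **kwargs)

    PANDAS_DISPATCH_REGISTERED = True

if PANDAS_DISPATCH_REGISTERED:
    df = pd.DataFrame(data={"col": [1.0, 2.0, -1.0]})
    df_tensor = at.as_tensor_variable(df, name="df")          # 2-D tensor
    col_tensor = at.as_tensor_variable(df["col"], name="col")  # 1-D tensor
```

Note that this relies on `to_numpy()` producing a numeric array; frames with mixed or object-typed columns would likely still fail in the NumPy conversion step.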

ricardoV94 avatar Jun 10 '22 09:06 ricardoV94

Would we do that in PyMC or in Aesara?

twiecki avatar Jun 10 '22 09:06 twiecki

Would we do that in PyMC or in Aesara?

PyMC from the discussion above

ricardoV94 avatar Jun 10 '22 09:06 ricardoV94

@ricardoV94 Great, should we move this over there then as a feature request?

twiecki avatar Jun 10 '22 09:06 twiecki

It should stay here in PyMC if there is no pressure from other libraries to have it at the Aesara level.

ricardoV94 avatar Jun 10 '22 10:06 ricardoV94

Slightly related: #4650

pm.aesaraf.convert_observed_data is the allrounder conversion function that is used in PyMC to convert anonymous data types including pandas objects to types supported by Aesara.

michaelosthege avatar Jun 10 '22 12:06 michaelosthege

Slightly related: #4650

pm.aesaraf.convert_observed_data is the allrounder conversion function that is used in PyMC to convert anonymous data types including pandas objects to types supported by Aesara.

That's a bit different though, because it has special logic for masks.

ricardoV94 avatar Jun 10 '22 14:06 ricardoV94

@ricardoV94 based on your discussion, this seems like a beginner-friendly task, right? I can give it a try! Where should your suggestion above be included?

juanitorduz avatar Jun 22 '22 09:06 juanitorduz

@juanitorduz probably in pymc/aesaraf.py

michaelosthege avatar Jun 22 '22 10:06 michaelosthege

Closed via #5920

ricardoV94 avatar Aug 25 '22 19:08 ricardoV94