NVTabular

[BUG] Categorify combo doesn't work on list columns

Open bschifferer opened this issue 3 years ago • 1 comments

Describe the bug

As a user, I want to jointly Categorify two columns, where one is a list column and the other is a regular (scalar) column. Use case: I have item interactions, where the scalar column is the current item to predict and the list column holds the historic items.
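To make the expected behavior concrete, here is a minimal sketch in plain pandas of what a joint encoding of a scalar column and a list column could look like. It is not NVTabular's implementation: the shared `vocab` dictionary and the choice to start ids at 1 are assumptions made only for this illustration, and the ids NVTabular actually assigns may differ.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [0, 1, 2, 3, 4, 5],
    "col2": [[0, 1], [1, 2], [2, 3], [3, 4], [4], [5]],
})

# Flatten the list column and pool it with the scalar column so that
# both columns share a single vocabulary.
all_items = pd.concat([df["col1"], df["col2"].explode()], ignore_index=True)

# Hypothetical id assignment: sorted unique items, ids starting at 1
# (leaving 0 free, e.g. for nulls; this offset is an assumption).
vocab = {item: idx + 1 for idx, item in enumerate(sorted(all_items.unique()))}

# Apply the same mapping to the scalar column and to every list element.
encoded = pd.DataFrame({
    "col1": df["col1"].map(vocab),
    "col2": df["col2"].apply(lambda items: [vocab[i] for i in items]),
})
print(encoded)
```

The point of the joint encoding is that the same item gets the same id whether it appears as the current item in `col1` or as a historic item in `col2`.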

Steps to reproduce:

import cudf
import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,1,2,3,4,5],
    'col2': [[0,1],[1,2],[2,3],[3,4],[4],[5]]
})
dataset = nvt.Dataset(df)
cols = [['col1', 'col2']] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()

Error:

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/common.py:1619, in _is_dtype_type(arr_or_dtype, condition)
   1615         return condition(type(None))
   1617     return False
-> 1619 return condition(tipo)

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/common.py:146, in classes.<locals>.<lambda>(tipo)
    144 def classes(*klasses) -> Callable:
    145     """evaluate if the tipo is a subclass of the klasses"""
--> 146     return lambda tipo: issubclass(tipo, klasses)

TypeError: issubclass() arg 1 must be a class
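For readers unfamiliar with this message: it is the generic error Python raises when the first argument to `issubclass` is not a class. The traceback suggests that `tipo` was not a class when the list column was checked, but the root cause inside the library is not confirmed in this issue. The sketch below only reproduces the message itself; `try_issubclass` is a hypothetical helper written for this illustration.

```python
def try_issubclass(candidate):
    """Return the issubclass result, or the error text if it raises."""
    try:
        return issubclass(candidate, (int, float))
    except TypeError as exc:
        return f"TypeError: {exc}"

# Passing a class works as expected.
print(try_issubclass(int))

# Passing an instance (not a class) raises the same kind of TypeError
# seen at the bottom of the traceback above.
print(try_issubclass(3))
```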

What works: Categorify without joint encoding (each column encoded separately)

import cudf
import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,1,2,3,4,5],
    'col2': [[0,1],[1,2],[2,3],[3,4],[4],[5]]
})
dataset = nvt.Dataset(df)
cols = ['col1', 'col2'] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()

Joint Categorify with non-list columns

import cudf
import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,1,2,3,4,5],
    'col2':  [1,2,3,4,4,5],
})
dataset = nvt.Dataset(df)
cols = [['col1', 'col2']] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()

bschifferer Sep 09 '22 07:09

@rjzamora hello. is this something you can take a look? thanks.

rnyak Sep 12 '22 16:09

> @rjzamora hello. is this something you can take a look? thanks.

Sorry for the delay - I can look into this.

rjzamora Sep 30 '22 19:09