NVTabular
[BUG] Categorify combo doesn't work on list columns
Describe the bug
As a user, I want to jointly Categorify two columns, where one is a list column and the other is a regular (scalar) column. Use case: I have item interactions, where one column holds the current item to predict and the list column holds the historical items, so both should share a single vocabulary.
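To make the expected behavior concrete, here is a minimal plain-Python sketch of what a joint encoding of a scalar column and a list column would produce. It is only an illustration and not how NVTabular implements Categorify. The helper names `fit_joint_vocab` and `transform_joint` are hypothetical, and reserving id 0 for unseen values is an assumption of this sketch.

```python
# Illustration only: plain Python, not NVTabular's implementation.
col1 = [0, 1, 2, 3, 4, 5]
col2 = [[0, 1], [1, 2], [2, 3], [3, 4], [4], [5]]


def fit_joint_vocab(scalar_col, list_col):
    """Build one shared value -> id mapping from both columns (hypothetical helper)."""
    values = set(scalar_col)
    for row in list_col:
        values.update(row)
    # Assumption of this sketch: id 0 is reserved for unseen values,
    # so real categories start at 1.
    return {value: idx for idx, value in enumerate(sorted(values), start=1)}


def transform_joint(scalar_col, list_col, vocab):
    """Encode both columns with the same vocabulary (hypothetical helper)."""
    encoded_scalar = [vocab.get(v, 0) for v in scalar_col]
    encoded_list = [[vocab.get(v, 0) for v in row] for row in list_col]
    return encoded_scalar, encoded_list


vocab = fit_joint_vocab(col1, col2)
encoded_col1, encoded_col2 = transform_joint(col1, col2, vocab)
```

The point is that the same item gets the same id whether it appears as the current item in `col1` or inside the history list in `col2`.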
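Code that fails (joint Categorify on a scalar column and a list column):
import cudf
import nvtabular as nvt
df = cudf.DataFrame({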
'col1': [0,1,2,3,4,5],
'col2': [[0,1],[1,2],[2,3],[3,4],[4],[5]]
})
dataset = nvt.Dataset(df)
cols = [['col1', 'col2']] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()
Error:
File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/common.py:1619, in _is_dtype_type(arr_or_dtype, condition)
1615 return condition(type(None))
1617 return False
-> 1619 return condition(tipo)
File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/common.py:146, in classes.<locals>.<lambda>(tipo)
144 def classes(*klasses) -> Callable:
145 """evaluate if the tipo is a subclass of the klasses"""
--> 146 return lambda tipo: issubclass(tipo, klasses)
TypeError: issubclass() arg 1 must be a class
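For context on the message itself: Python's built-in `issubclass()` raises this `TypeError` whenever its first argument is not a class. The traceback does not show which object reached the pandas check, so the string below is only a hypothetical stand-in for it. `is_int_subclass` and `safe_is_int_subclass` are illustrative names, not pandas or NVTabular functions.

```python
import inspect


def is_int_subclass(tipo):
    # Same shape as the failing check: issubclass() applied to whatever is passed in.
    return issubclass(tipo, (int,))


# Hypothetical stand-in for a dtype-like object that is not a class.
not_a_class = "list"

try:
    is_int_subclass(not_a_class)
    error_message = None
except TypeError as exc:
    error_message = str(exc)


def safe_is_int_subclass(tipo):
    # Defensive variant: only call issubclass() when given an actual class.
    return inspect.isclass(tipo) and issubclass(tipo, (int,))
```

This suggests, without confirming, that a non-class object (possibly related to the list column's dtype) is being passed into a dtype check that expects a class.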
What works: Categorify without joint encoding (each column encoded separately)
import cudf
import nvtabular as nvt
df = cudf.DataFrame({
'col1': [0,1,2,3,4,5],
'col2': [[0,1],[1,2],[2,3],[3,4],[4],[5]]
})
dataset = nvt.Dataset(df)
cols = ['col1', 'col2'] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()
What also works: joint Categorify with non-list columns
import cudf
import nvtabular as nvt
df = cudf.DataFrame({
'col1': [0,1,2,3,4,5],
'col2': [1,2,3,4,4,5],
})
dataset = nvt.Dataset(df)
cols = [['col1', 'col2']] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()
@rjzamora hello. Is this something you could take a look at? Thanks.
Sorry for the delay - I can look into this.