NVTabular
[BUG] `LambdaOp` breaks the graph
Describe the bug
In the following example, accessing `.graph` on a column group that includes a `LambdaOp` raises an error instead of displaying the graph.

Steps/Code to reproduce bug
import nvtabular as nvt
from merlin.core.utils import Distributed
from merlin.models.xgb import XGBoost
from merlin.schema.tags import Tags
import cudf
import numpy as np
products = cudf.DataFrame(
data={'product_id': np.arange(10_000),
'price': np.random.rand(10_000) * 10,
})
products = products.sample(frac=1)
train = products[:8000]
valid = products[2000:]
train_ds = nvt.Dataset(train)
valid_ds = nvt.Dataset(valid)
recommended_status = ['price'] >> nvt.ops.LambdaOp(lambda col: (col < 5).astype(float)) \
>> nvt.ops.Rename(name='recommended')
workflow = nvt.Workflow(['price'] + recommended_status)
train_ds = workflow.fit_transform(train_ds)
valid_ds = workflow.transform(valid_ds)
recommended_status.graph
Expected behavior
The graph is displayed correctly and no error is raised.
Environment details
nvcr.io/nvstaging/merlin/merlin-tensorflow:22.06
@radekosmulski I tested the following and it works for me. I am closing this bug; you can reopen it if you need to. I am using the merlin-pytorch:22.07 container.
import nvtabular as nvt
import cudf
import numpy as np
products = cudf.DataFrame(
data={'product_id': np.arange(10_000),
'price': np.random.rand(10_000) * 10,
})
products = products.sample(frac=1)
train = products[:8000]
valid = products[2000:]
train_ds = nvt.Dataset(train)
valid_ds = nvt.Dataset(valid)
recommended_status = ['price'] >> nvt.ops.LambdaOp(lambda col: (col < 5).astype(float)) >> nvt.ops.Rename(name='recommended')
workflow = nvt.Workflow(['price'] + recommended_status)
train_ds = workflow.fit_transform(train_ds)
valid_ds = workflow.transform(valid_ds)
recommended_status.graph