[BUG] TargetEncoding requires the original `target` column

Open radekosmulski opened this issue 3 years ago • 0 comments

Describe the bug

Target encoding requires the target column to be present even when we only perform a transform operation.

Steps/Code to reproduce bug

The original post included screenshots; the code is provided below.

Code:

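import cudf
import nvtabular as nvt

# `df` is the training frame; it must contain the `cats` and `target` columns.
out = ['cats'] >> nvt.ops.TargetEncoding('target', kfold=1)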

ds = nvt.Dataset(df)
wf = nvt.Workflow(out)

o = wf.fit_transform(ds).compute()

o

test = cudf.DataFrame(data={
    'cats': list('abbcc')
})
test

# Fails: the test frame has no `target` column.
wf.transform(nvt.Dataset(test))

test = cudf.DataFrame(data={
    'cats': list('abbcc')
})
# Workaround: add a dummy `target` column so that transform succeeds.
test['target'] = 0
test

wf.transform(nvt.Dataset(test)).compute()

Expected behavior

Transform should not rely on the target column being present in the dataset. Providing a dummy column works, but should not be required.
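To make the expectation concrete, here is a minimal pure-Python sketch of a mean target encoder. It is not NVTabular's implementation; the class name `SimpleTargetEncoder` and its attributes are hypothetical and exist only for illustration. The point it shows is that the target values are needed only during fit, while transform needs just the categorical column and the statistics learned earlier.

```python
from collections import defaultdict


class SimpleTargetEncoder:
    """Toy mean target encoder (hypothetical; not NVTabular's implementation)."""

    def fit(self, cats, targets):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for cat, target in zip(cats, targets):
            sums[cat] += target
            counts[cat] += 1
        # The global mean is the fallback for categories unseen during fit.
        self.global_mean_ = sum(sums.values()) / sum(counts.values())
        self.means_ = {cat: sums[cat] / counts[cat] for cat in sums}
        return self

    def transform(self, cats):
        # Only the categorical values are read here; no target is required.
        return [self.means_.get(cat, self.global_mean_) for cat in cats]


# Fit on data that has a target, then transform data that has none.
encoder = SimpleTargetEncoder().fit(list("aabbc"), [1, 0, 1, 1, 0])
encoded = encoder.transform(list("abbcc"))
print(encoded)  # [0.5, 1.0, 1.0, 0.0, 0.0]
```

In this sketch, transform works on a frame with only `cats`, which is the behavior the issue asks for from `wf.transform`.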

Jun 14 '22 07:06 radekosmulski