
[BUG] WDL training notebook for HugeCTR processing workflow fails with TypeError

Open • Spartee opened this issue 2 years ago • 0 comments

Describe the bug

I was running this notebook, which uses NVTabular to process the clicks dataset (link) commonly used as a HugeCTR demo. Every time I run it, I hit a TypeError. Given my limited familiarity with NVTabular's codebase, this has been difficult to debug.


2022-09-14 22:01:31,504 NVTabular processing
2022-09-14 22:01:32,957 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Traceback (most recent call last):
  File "./preprocess.py", line 418, in <module>
  File "./preprocess.py", line 176, in process_NVT
    features += col0 >> FeatureCross(col1)  >> Rename(postfix="_"+col1) >> cross_cat_op   <-- (this line)

  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/base_operator.py", line 233, in __rrshift__
    return ColumnSelector(other) >> self
  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/selector.py", line 128, in __rshift__
    return operator.create_node(self) >> operator
  File "/usr/local/lib/python3.8/dist-packages/nvtabular/workflow/node.py", line 30, in __rshift__
    return super().__rshift__(operator)
  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/node.py", line 262, in __rshift__
  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/node.py", line 80, in add_dependency
    dep_node = Node.construct_from(dep)
  File "/usr/local/lib/python3.8/dist-packages/merlin/dag/node.py", line 497, in construct_from
    raise TypeError(
TypeError: Unsupported type: Cannot convert object of type <class 'method'> to Node.
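The traceback shows a bound method (`<class 'method'>`), not a `FeatureCross` instance, reaching `Node.construct_from`. The following is a simplified, hypothetical model of that call chain that runs without NVTabular. `ColumnSelector`, `Node`, `Operator`, `NoDeps`, and `MethodDeps` are stand-ins written for illustration, not the merlin.dag classes. The assumption that `Node.__rshift__` reads `operator.dependencies` as an attribute, without calling it, is inferred from the traceback and has not been confirmed against the library source.

```python
# Hypothetical, simplified model of the call chain in the traceback above.
# These stand-in classes are NOT the real merlin.dag implementations.


class ColumnSelector:
    def __init__(self, names):
        self.names = [names] if isinstance(names, str) else list(names)

    def __rshift__(self, operator):
        # selector >> op: wrap the selector in a Node, then chain the operator
        return Node(selector=self) >> operator


class Node:
    def __init__(self, selector=None, op=None):
        self.selector = selector
        self.op = op
        self.parents = []
        self.dependencies = []

    @classmethod
    def construct_from(cls, obj):
        # Only a few input types can be turned into a Node.
        if isinstance(obj, Node):
            return obj
        if isinstance(obj, (str, list)):
            return cls(selector=ColumnSelector(obj))
        if isinstance(obj, ColumnSelector):
            return cls(selector=obj)
        raise TypeError(
            f"Unsupported type: Cannot convert object of type {type(obj)} to Node."
        )

    def add_dependency(self, dep):
        self.dependencies.append(Node.construct_from(dep))

    def __rshift__(self, operator):
        child = Node(op=operator)
        child.parents.append(self)
        deps = operator.dependencies  # read as an attribute, never called
        if deps:
            if not isinstance(deps, (list, tuple)):
                deps = [deps]  # a bound method is wrapped as a single "dependency"
            for dep in deps:
                child.add_dependency(dep)
        return child


class Operator:
    @property
    def dependencies(self):
        return []  # no extra input columns by default

    def __rrshift__(self, other):
        # "col0" >> op: str defines no __rshift__, so Python falls back to
        # op.__rrshift__("col0").
        return ColumnSelector(other) >> self


class NoDeps(Operator):
    pass  # inherits the property, so `dependencies` evaluates to a list


class MethodDeps(Operator):
    # Same pattern as the notebook: a plain method overrides the property.
    def dependencies(self):
        return ["col1"]


node = "col0" >> NoDeps()
print(node.dependencies)  # []

try:
    "col0" >> MethodDeps()
except TypeError as err:
    message = str(err)
print(message)  # Unsupported type: Cannot convert object of type <class 'method'> to Node.
```

In this model, the bound method is truthy and is not a list, so it is wrapped and handed to `construct_from` as if it were a column dependency. That is one way to arrive at exactly the error message in the traceback.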

I've narrowed this down to the FeatureCross class, which is implemented as a subclass of Operator.

FeatureCross Implementation (from notebook)

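from nvtabular.ops import Operator

class FeatureCross(Operator):
    def __init__(self, dependency):
        # name of the column to cross with each input column
        self.dependency = dependency

    def transform(self, columns, gdf):
        # build a new frame of the same type as the input (e.g. cudf.DataFrame)
        new_df = type(gdf)()
        for col in columns.names:
            new_df[col] = gdf[col] + gdf[self.dependency]
        return new_df

    # defined as a plain method (not a property)
    def dependencies(self):
        return [self.dependency]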

It fails in the Node.construct_from method, which seems to expect a list, a str, or a ColumnSelector. That makes intuitive sense, but I don't see how the FeatureCross implementation could ever produce anything other than a TypeError, since it is none of those types.

It stores a ColumnSelector as an attribute but is not one itself (I believe).
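One likely explanation, assuming the merlin-core version in the 22.07 container declares `BaseOperator.dependencies` as a property rather than a method: the notebook's `def dependencies(self)` replaces that property, so `operator.dependencies` evaluates to a bound method instead of a list. The sketch below shows the corresponding fix against a stand-in `BaseOperator` so it runs without NVTabular. `FeatureCrossAsMethod` is a hypothetical name for the notebook's original version, and the dict and `SimpleNamespace` inputs are toy substitutes for a DataFrame and a ColumnSelector.

```python
from types import SimpleNamespace

# Stand-in base class for illustration only. It ASSUMES the real merlin.dag
# BaseOperator declares `dependencies` as a property.


class BaseOperator:
    @property
    def dependencies(self):
        return []


class FeatureCrossAsMethod(BaseOperator):
    # Hypothetical name for the notebook's original version.
    def __init__(self, dependency):
        self.dependency = dependency

    def dependencies(self):  # plain method: shadows the inherited property
        return [self.dependency]


class FeatureCross(BaseOperator):
    # Sketch of a possible fix: keep `dependencies` a property.
    def __init__(self, dependency):
        super().__init__()
        self.dependency = dependency

    @property
    def dependencies(self):
        return [self.dependency]

    def transform(self, columns, gdf):
        # Unchanged from the notebook: combine each input column with the
        # dependency column using `+` (string concatenation for string columns).
        new_df = type(gdf)()
        for col in columns.names:
            new_df[col] = gdf[col] + gdf[self.dependency]
        return new_df


print(type(FeatureCrossAsMethod("col1").dependencies).__name__)  # method
print(FeatureCross("col1").dependencies)  # ['col1']

# Toy data: a dict stands in for a DataFrame, SimpleNamespace for a ColumnSelector.
columns = SimpleNamespace(names=["col0"])
print(FeatureCross("col1").transform(columns, {"col0": "a_", "col1": "b"}))
# {'col0': 'a_b'}
```

Applied to the notebook, the change would amount to adding `@property` above `def dependencies(self):` in the original class; this is untested against the 22.07 container.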

Steps/Code to reproduce bug

Run the notebook linked above in the environment specified below (I changed very little besides the paths to the data).

Both copies of the notebook, the one in the Triton HugeCTR backend repo and the one in the HugeCTR repo, fail with this same error.

Expected behavior

The notebook runs without error in the environment described below.

Environment details

  • Environment location: Docker container
  • Container: nvcr.io/nvidia/merlin/merlin-hugectr
  • Container version: 22.07
  • Docker version: 20.10
  • Method of NVTabular install: Docker
    • docker pull and docker run commands: see Docker steps below

Docker steps

# start container (this pulls it too)
sudo docker run -it --name merlin-hugectr-2 --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v ${PWD}:/models -v ${PWD}:/data/ -w /data/ -p 8888:8888 -p 8000:8000 -p 8001:8001 -p 8002:8002 nvcr.io/nvidia/merlin/merlin-hugectr:22.07

# start jupyterlab
jupyter lab --no-browser --allow-root --ip --port 8888 --NotebookApp.token='hugectr'

Additional context

Pinging @EvenOldridge, who requested that this be filed here.

Spartee • Sep 16 '22 23:09