[BUG] Unable to access cuDF due to RuntimeError: cuDF failure : Unsupported type_id conversion to cudf

Open mtnt-2022 opened this issue 1 year ago • 8 comments

Describe the bug: I am trying to run the example code at https://nvidia-merlin.github.io/NVTabular/main/api/ops/categorify.html, and it fails at the line marked ERROR:

import cudf
import nvtabular as nvt

# Create toy dataset
df = cudf.DataFrame({
    'author': ['User_A', 'User_B', 'User_C', 'User_C', 'User_A', 'User_B', 'User_A'],
    'productID': [100, 101, 102, 101, 102, 103, 103],
    'label': [0, 0, 1, 1, 1, 0, 0]
})  # ERROR: RuntimeError: cuDF failure at: /opt/rapids/src/cudf/cpp/src/interop/from_arrow.cu:86: Unsupported type_id conversion to cudf
dataset = nvt.Dataset(df)

# Define pipeline
CATEGORICAL_COLUMNS = ['author', 'productID']
cat_features = CATEGORICAL_COLUMNS >> nvt.ops.Categorify(
    freq_threshold={"author": 3, "productID": 2},
    num_buckets={"author": 10, "productID": 20})


# Initialize the workflow and execute it
proc = nvt.Workflow(cat_features)
proc.fit(dataset)
ddf = proc.transform(dataset).to_ddf()

# Print results
print(ddf.compute())

Also, when following https://github.com/NVIDIA-Merlin/NVTabular/blob/main/tests/unit/examples/test_02-Advanced-NVTabular-workflow.py, I got an error for

from merlin.core.compat import cudf

ImportError                               Traceback (most recent call last)
Cell In[12], line 1
----> 1 from merlin.core.compat import cudf

ImportError: cannot import name 'cudf' from 'merlin.core.compat' (/usr/local/lib/python3.8/dist-packages/merlin/core/compat.py)

Expected behavior It should work well.

Environment details:

  • Platform: Debian 4.19.269-1

  • Python version: 3.8.10

  • PyTorch version (GPU?): 2.0.0 (yes, supports GPU)

  • Environment location: GCP

  • Method of NVTabular install: Docker

  • Docker image used: I am using nvcr.io/nvidia/merlin/merlin-pytorch:23.02. All cudf libs were installed by GCP by default.

Additional context

cudf: 22.8.0a0+304.g6ca81bbc78.dirty
dask-cudf: 22.8.0a0+304.g6ca81bbc78.dirty
rmm: 22.8.0a0+62.gf6bf047.dirty

CUDA Version: 11.8
NVIDIA-SMI: 510.47.03
Driver Version: 510.47.03
GPU: Tesla T4
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
OS: Ubuntu 20.04.5 LTS

merlin              1.9.1
merlin-core         0.5.0
merlin-dataloader   0.0.3
merlin-models       23.2.0
merlin-systems      23.2.0
nvtabular           23.2.0
torch               2.0.0
triton              2.0.0
tritonclient        2.32.0

nvidia-cublas-cu11          11.10.3.66
nvidia-cuda-cupti-cu11      11.7.101
nvidia-cuda-nvrtc-cu11      11.7.99
nvidia-cuda-runtime-cu11    11.7.99
nvidia-cudnn-cu11           8.5.0.96
nvidia-cufft-cu11           10.9.0.58
nvidia-curand-cu11          10.2.10.91
nvidia-cusolver-cu11        11.4.0.1
nvidia-cusparse-cu11        11.7.4.91
nvidia-nccl-cu11            2.14.3
nvidia-nvtx-cu11            11.7.91
nvidia-pyindex              1.0.9

mtnt-2022 • Apr 24 '23 22:04

It looks like you have an older version of merlin-core. The latest is 23.02.01. Based on when merlin.core.compat was added, I'm fairly confident installing a newer version of merlin-core will resolve the cudf import issue you described.
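To confirm which versions are actually installed in the container, a minimal sketch along these lines may help. It is not from the thread: the helper name `installed_versions` is hypothetical, and it relies only on the standard-library `importlib.metadata` module (available from Python 3.8), so it works even when `import cudf` itself fails.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the environment listing in this issue.
    packages = ["merlin-core", "nvtabular", "cudf", "pandas"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Because it reads package metadata rather than importing the packages, the output reflects what pip sees, which is the relevant view when a source install has replaced a library.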

karlhigley • Apr 24 '23 23:04

@karlhigley, may I use

FROM nvcr.io/nvidia/merlin/merlin-pytorch:latest

in the Dockerfile so that I always get the latest version?

mtnt-2022 • Apr 25 '23 00:04

@karlhigley, I got a build error (the same error occurs for :latest):

   FROM nvcr.io/nvidia/merlin/merlin-pytorch:23.02.01

 "Containerize the artifact": manifest for nvcr.io/nvidia/merlin/merlin-pytorch:23.02.01 not found: manifest unknown: manifest unknown

mtnt-2022 • Apr 25 '23 00:04

Ah sorry, I meant the latest version of merlin-core is 23.02.01; there's no 23.02.01 container version. The latest version of the Torch container comes with merlin-core 23.2.0 pre-installed, which should be new enough to avoid the merlin.core.compat error you mentioned. Since you have merlin-core 0.5.0, I'm guessing you may have installed one of the Merlin libraries from source, some of which have overly permissive version specifiers and can cause this issue. Using the merlin-pytorch 23.02 container, it should be sufficient to pip install merlin-core after installing any of the other Merlin libraries from source.
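As a rough illustration of why 0.5.0 counts as "too old" relative to 23.2.0, here is a small sketch that compares purely numeric dotted versions. The helpers `parse_version` and `is_at_least` are hypothetical names introduced for this example. It assumes plain versions such as "23.2.0" and would not handle suffixed ones such as "22.8.0a0+304.g6ca81bbc78.dirty".

```python
def parse_version(text):
    """Turn a purely numeric dotted version such as '23.2.0' into (23, 2, 0)."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '23.2' compares equal to '23.2.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(is_at_least("0.5.0", "23.2.0"))     # False: the version reported in this issue
print(is_at_least("23.02.01", "23.2.0"))  # True: leading zeros are ignored
```

For real dependency checks a full version parser is the safer choice; this sketch only covers the simple numeric case discussed here.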

karlhigley • Apr 25 '23 14:04

@karlhigley, I am using this for the container image:

   FROM nvcr.io/nvidia/merlin/merlin-pytorch:nightly

I got:

merlin              1.10.0
merlin-core         23.2.1
merlin-dataloader   23.2.1
merlin-models       23.2.0
merlin-systems      0+untagged.1.ge94d2a9
cuda-python         11.8.1
cudf                22.8.0a0+304.g6ca81bbc78.dirty
cupy-cuda117        10.6.0

When I run

import cudf
# import pandas as pd
print('cuDF Version:', cudf.__version__)

I got:


 ---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 import cudf
     2 # import pandas as pd
     3 print('cuDF Version:', cudf.__version__)

File /usr/local/lib/python3.8/dist-packages/cudf/__init__.py:12
     8 from numba import config as numba_config, cuda
    10 import rmm
---> 12 from cudf.api.types import dtype
    13 from cudf import api, core, datasets, testing
    14 from cudf._version import get_versions

File /usr/local/lib/python3.8/dist-packages/cudf/api/__init__.py:3
     1 # Copyright (c) 2021, NVIDIA CORPORATION.
----> 3 from cudf.api import extensions, types
     5 __all__ = ["extensions", "types"]

File /usr/local/lib/python3.8/dist-packages/cudf/api/types.py:18
    15 from pandas.api import types as pd_types
    17 import cudf
---> 18 from cudf.core.dtypes import (  # noqa: F401
    19     _BaseDtype,
    20     dtype,
    21     is_categorical_dtype,
    22     is_decimal32_dtype,
    23     is_decimal64_dtype,
    24     is_decimal128_dtype,
    25     is_decimal_dtype,
    26     is_interval_dtype,
    27     is_list_dtype,
    28     is_struct_dtype,
    29 )
    32 def is_numeric_dtype(obj):
    33     """Check whether the provided array or dtype is of a numeric dtype.
    34 
    35     Parameters
  (...)
    43         Whether or not the array or dtype is of a numeric dtype.
    44     """

File /usr/local/lib/python3.8/dist-packages/cudf/core/dtypes.py:13
    11 from pandas.api import types as pd_types
    12 from pandas.api.extensions import ExtensionDtype
---> 13 from pandas.core.arrays._arrow_utils import ArrowIntervalType
    14 from pandas.core.dtypes.dtypes import (
    15     CategoricalDtype as pd_CategoricalDtype,
    16     CategoricalDtypeType as pd_CategoricalDtypeType,
    17 )
    19 import cudf

ModuleNotFoundError: No module named 'pandas.core.arrays._arrow_utils'
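The last frame shows cuDF importing a private pandas module that the installed pandas does not provide. One way to probe for that module directly, without importing cuDF, is sketched below. The helper name `module_available` is hypothetical; `importlib.util.find_spec` is standard library. Note that `find_spec` imports the parent packages of a dotted name as a side effect.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package of the dotted name is itself missing.
        return False


# The module that cudf/core/dtypes.py tries to import in the traceback above.
print(module_available("pandas.core.arrays._arrow_utils"))
```

If this prints False while pandas itself imports fine, the installed pandas and cuDF versions are likely out of step with each other, which matches the symptom in the traceback.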

mtnt-2022 • Apr 25 '23 17:04

You can build an image that way, but we don't generally guarantee the stability of the nightly images. Are you seeing the same issue building

FROM nvcr.io/nvidia/merlin/merlin-pytorch:23.02

?

karlhigley • Apr 25 '23 17:04

@karlhigley, yes, I got the same error for

 FROM nvcr.io/nvidia/merlin/merlin-pytorch:23.02

mtnt-2022 • Apr 25 '23 18:04

@jperez999 Are there known version incompatibility issues between Pandas and cuDF that might explain this?

karlhigley • Apr 27 '23 14:04