[BUG] `Dataset` Avro support conflicts with Parquet support (via `cramjam` dep)

Open karlhigley opened this issue 3 years ago • 0 comments

Describe the bug

Several of the io tests depend on `uavro` and `fastavro`, and `uavro` depends on `cramjam`. Installing `cramjam` makes the Avro tests pass but breaks the tests that rely on reading Parquet metadata with cudf. The cause is not yet known.

Steps/Code to reproduce bug

In the CI container:

  1. Run the io tests and note that 8 tests are skipped.
  2. `pip install uavro fastavro` and note that `cramjam` is installed as a dependency.
  3. Run the io tests again, resulting in ~40 failing tests.
  4. `pip uninstall uavro fastavro` and run the tests again, producing the same result.
  5. `pip uninstall cramjam` and run the tests again. They now pass.
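Since the failures track whether `cramjam` is installed rather than whether `uavro` or `fastavro` are, it may help to record the state of all three packages before each test run. The following is a minimal sketch using only the standard library; the helper names `installed_packages` and `likely_affected` are hypothetical and not part of NVTabular, and the heuristic simply restates the observation from the steps above.

```python
import importlib.util

# Packages named in this report. The helper names below are hypothetical,
# introduced only for this sketch.
PACKAGES = ("cramjam", "uavro", "fastavro")


def installed_packages(names=PACKAGES):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def likely_affected(status):
    """Heuristic from this report: failures appeared whenever cramjam was installed."""
    return bool(status.get("cramjam", False))


if __name__ == "__main__":
    status = installed_packages()
    for name, present in status.items():
        print(f"{name}: {'installed' if present else 'missing'}")
    if likely_affected(status):
        print("cramjam is present; Parquet metadata tests may fail as described.")
```

Running this between the steps makes it easy to confirm that `cramjam` remains installed after `uavro` and `fastavro` are removed, which matches the unchanged result in that step.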

Expected behavior

Parquet and Avro support should be compatible with each other, resulting in no skipped or failing tests.

Environment details (please complete the following information):

  • Environment location: Docker
  • Method of NVTabular install: Docker
    • sudo docker pull nvcr.io/nvstaging/merlin/merlin_ci_runner:latest
    • sudo docker run --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 -p 3000:3000 --ipc=host --name nvt-tests -v /Projects/nvidia/NVTabular:/nvtabular -p 8000:8000 --gpus '"device=all"' --cap-add SYS_PTRACE --privileged nvcr.io/nvstaging/merlin/merlin_ci_runner:latest /bin/bash

Additional context

This issue surfaced while working to move `Dataset` and related IO functionality to merlin-core, and could be fixed before or after the move.

karlhigley, Feb 03 '22 21:02