[BUG] `Dataset` Avro support conflicts with Parquet support (via `cramjam` dep)
Describe the bug
Several of the io tests depend on uavro and fastavro; uavro depends on cramjam. Installing cramjam makes the avro tests pass, but breaks tests that rely on reading Parquet metadata with cudf (for unknown reasons.)
Steps/Code to reproduce bug
In the CI container: Run the io tests, note that 8 tests are skipped. pip install uavro fastavro, note that cramjam is installed as a dependency. Run the io tests again, resulting in ~40 failing tests. pip uninstall uavro fastavro and run the tests again, producing the same result. pip uninstall cramjam, run again, tests now pass.
Expected behavior Parquet and Avro support should be compatible with each other, resulting in no skipped or failing tests
Environment details (please complete the following information):
- Environment location: Docker
- Method of NVTabular install: Docker
sudo docker pull nvcr.io/nvstaging/merlin/merlin_ci_runner:latestsudo docker run --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 -p 3000:3000 --ipc=host --name nvt-tests -v /Projects/nvidia/NVTabular:/nvtabular -p 8000:8000 --gpus '"device=all"' --cap-add SYS_PTRACE --privileged nvcr.io/nvstaging/merlin/merlin_ci_runner:latest /bin/bash
Additional context
This issue surfaced while working to move Dataset and related IO functionality to merlin-core, and could be fixed before or after the move.