NVTabular
[BUG] Dtype discrepancy with pandas and groupby on CPU
Describe the bug
Steps/Code to reproduce bug
- Run notebook https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/v23.02.00/examples/getting-started-session-based/01-ETL-with-NVTabular.ipynb
- In a CPU-only environment
TypeError: Dtype discrepancy detected for column age_days-list: operator Groupby reported dtype `DType(name='float32', element_type=<ElementType.Float: 'float'>, element_size=32, element_unit=None, signed=True, shape=Shape(dims=None))` but returned dtype `DType(name='float64', element_type=<ElementType.Float: 'float'>, element_size=64, element_unit=None, signed=True, shape=Shape(dims=None))`.
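To help narrow down whether the float32 to float64 change already appears in plain pandas, a small check outside NVTabular may be useful. This is only a sketch under stated assumptions: the toy data and the `session_id` / `age_days` column names are hypothetical, and the list aggregation here is plain `pandas` `groupby(...).agg(list)`, not the NVTabular `Groupby` operator itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the names only loosely mirror the column in the error.
df = pd.DataFrame(
    {
        "session_id": [1, 1, 2],
        "age_days": np.array([0.5, 1.5, 2.5], dtype="float32"),
    }
)

# Dtype of the column before aggregation.
input_dtype = df["age_days"].dtype

# Collect the values into one Python list per session.
grouped = df.groupby("session_id")["age_days"].agg(list)

# Dtype NumPy infers when the first aggregated list is turned back into an array.
element_dtype = np.asarray(grouped.iloc[0]).dtype

print("input dtype:  ", input_dtype)
print("element dtype:", element_dtype)
if element_dtype != input_dtype:
    print("dtype changed during the list aggregation")
```

If the two dtypes differ in a CPU-only environment, that would suggest the discrepancy originates in the pandas code path rather than in the dtype reported by the operator.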
Expected behavior
No exception is raised, and the output matches the equivalent result when running on GPU with cudf.
Environment details:
- Environment location: Docker
- Method of NVTabular install: from source
Additional context
A similar issue was reported recently in #1767. However, that particular example now works following a change in core: https://github.com/NVIDIA-Merlin/core/pull/226
@oliverholworthy I ran the notebook in the 23.04 PyTorch container without a GPU and it completed without error. Does this error only appear when NVTabular is installed from source, or was it also fixed by the changes in core?