NVTabular
[Task] Performance Optimization: use parquet writer in dl_encoder
Issue by benfred
Monday May 04, 2020 at 18:43 GMT
Originally opened as https://github.com/rapidsai/recsys/issues/87
We are currently using to_pandas / from_pandas to spill to host memory in dl_encoder.py.
Using the parquet writer in cudf seems to be about 3x faster than using pandas, and since cudf now lets you write parquet files to memory, we can get a decent performance improvement from using that instead.