[Task] Performance Optimization: use parquet writer in dl_encoder

Open benfred opened this issue 5 years ago • 0 comments

Issue by benfred, Monday May 04, 2020 at 18:43 GMT. Originally opened as https://github.com/rapidsai/recsys/issues/87

We currently use to_pandas / from_pandas to spill to host memory in dl_encoder.py.

Using the parquet writer in cudf seems to be about 3x faster than going through pandas. Since cudf now lets you write parquet files to memory, we can get a decent performance improvement by using that instead.

Jun 04 '20 23:06 benfred