NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
**Issue by [benfred](https://github.com/benfred)** _Tuesday Apr 21, 2020 at 17:48 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/51_ ---- We're currently only shuffling each chunk, and then writing the shuffled values to partitions in...
**Issue by [benfred](https://github.com/benfred)** _Monday May 04, 2020 at 18:43 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/87_ ---- We are currently using `to_pandas / from_pandas` to spill to host memory in dl_encoder.py. Using...
**Issue by [benfred](https://github.com/benfred)** _Monday May 04, 2020 at 19:08 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/89_ ---- We should default to displaying progress information (using tqdm) so that users can view progress when...
**Issue by [alecgunny](https://github.com/alecgunny)** _Monday May 11, 2020 at 16:05 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/105_ ---- 1. Add `eps` kwarg to `LogOp` such that the math implemented is `log(x+eps)` instead of...
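The `log(x+eps)` form requested in this issue can be illustrated with a minimal sketch. The helper name `log_with_eps` and its list-based signature are hypothetical and used only for illustration; this is not NVTabular's `LogOp` implementation, which operates on dataframe columns.

```python
import math


def log_with_eps(values, eps=0.0):
    """Return log(x + eps) for each x in values.

    Hypothetical helper for illustration only; not NVTabular's LogOp.
    """
    return [math.log(x + eps) for x in values]


# With eps > 0, a zero input is shifted into the domain of log,
# so math.log no longer raises a ValueError for it.
shifted = log_with_eps([0.0, 1.0, 9.0], eps=1.0)
```

With the default `eps=0.0` the helper reduces to a plain `log(x)`, so a zero input would raise `ValueError`; a positive offset is one way to avoid that.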
**Issue by [benfred](https://github.com/benfred)** _Tuesday May 19, 2020 at 18:44 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/147_ ---- Add a sample notebook showing how to integrate NVTabular with XGBoost
**Issue by [rnyak](https://github.com/rnyak)** _Thursday May 21, 2020 at 17:35 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/163_ ---- **Is your feature request related to a problem? Please describe.** Currently, after applying a Groupby operation,...
**Issue by [benfred](https://github.com/benfred)** _Friday May 22, 2020 at 19:51 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/167_ ---- We should add support for reading datasets as ORC, in addition to the CSV/Parquet formats...
**Issue by [rnyak](https://github.com/rnyak)** _Tuesday May 26, 2020 at 18:00 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/172_ ---- **Is your operator request related to a problem? Please describe.** The dropDuplicates() method is used in...
**Issue by [oyilmaz-nvidia](https://github.com/oyilmaz-nvidia)** _Monday Jun 01, 2020 at 15:07 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/188_ ---- Right now in NVTabular, we go over all the columns of a dataframe one by...
**Is your feature request related to a problem? Please describe.** The multi-GPU Criteo benchmark hardcodes the best number of hash partitions for each categorical variable: https://github.com/NVIDIA/NVTabular/blob/2dd4cbc94e074d2a7a319dcf05ff249c7cdec3b3/examples/dask-nvtabular-criteo-benchmark.py#L45-L54 as well as...