NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Results: 172 NVTabular issues

**Issue by [benfred](https://github.com/benfred)** _Tuesday Apr 21, 2020 at 17:48 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/51_ ---- We're currently only shuffling each chunk, and then writing the shuffled values to partitions in...

enhancement

**Issue by [benfred](https://github.com/benfred)** _Monday May 04, 2020 at 18:43 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/87_ ---- We are currently using `to_pandas` / `from_pandas` to spill to host memory in `dl_encoder.py`. Using...

enhancement

**Issue by [benfred](https://github.com/benfred)** _Monday May 04, 2020 at 19:08 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/89_ ---- We should default to displaying progress information (using tqdm) so that users can view progress when...

**Issue by [alecgunny](https://github.com/alecgunny)** _Monday May 11, 2020 at 16:05 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/105_ ---- 1. Add `eps` kwarg to `LogOp` such that the math implemented is `log(x+eps)` instead of...
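As a rough illustration of the `log(x+eps)` formula named in this request, here is a minimal standalone sketch. The helper `log_with_eps` and its default value are hypothetical and are not NVTabular's `LogOp` API; only the `eps` name and the formula come from the issue text.

```python
import math


def log_with_eps(values, eps=1e-6):
    """Hypothetical helper (not part of NVTabular): elementwise log(x + eps).

    With eps > 0, an input of exactly zero still yields a finite result,
    whereas math.log(0.0) would raise a ValueError.
    """
    return [math.log(x + eps) for x in values]


# eps=1.0 is chosen here only to keep the arithmetic easy to follow:
# log(0 + 1) is exactly 0.0.
shifted = log_with_eps([0.0, 1.0, 10.0], eps=1.0)
```

The choice of `eps` changes the transformed values, so exposing it as a keyword argument leaves that decision to the caller.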

**Issue by [benfred](https://github.com/benfred)** _Tuesday May 19, 2020 at 18:44 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/147_ ---- Add a sample notebook showing how to integrate NVTabular with XGBoost

**Issue by [rnyak](https://github.com/rnyak)** _Thursday May 21, 2020 at 17:35 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/163_ ---- **Is your feature request related to a problem? Please describe.** Currently, after applying Groupby operation,...

**Issue by [benfred](https://github.com/benfred)** _Friday May 22, 2020 at 19:51 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/167_ ---- We should add support for reading datasets as ORC, in addition to the CSV/Parquet formats...

IO

**Issue by [rnyak](https://github.com/rnyak)** _Tuesday May 26, 2020 at 18:00 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/172_ ---- **Is your operator request related to a problem? Please describe.** dropDuplicates() method is used in...

Outbrains

**Issue by [oyilmaz-nvidia](https://github.com/oyilmaz-nvidia)** _Monday Jun 01, 2020 at 15:07 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/188_ ---- Right now in nvtabular, we go over all the columns of a dataframe one by...

**Is your feature request related to a problem? Please describe.** The multigpu criteo benchmark is hardcoding the best number of hash partitions for each categorical variable: https://github.com/NVIDIA/NVTabular/blob/2dd4cbc94e074d2a7a319dcf05ff249c7cdec3b3/examples/dask-nvtabular-criteo-benchmark.py#L45-L54 as well as...

MultiGPU