NVTabular issues

[FEA] Implement full shuffle

**Issue by [benfred](https://github.com/benfred)** _Tuesday Apr 21, 2020 at 17:48 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/51_ ---- We're currently only shuffling each chunk, and then writing the shuffled values to partitions in...

benfred

enhancement
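
For illustration, the difference between shuffling within each chunk and a full shuffle can be sketched in plain Python. This is a minimal sketch, not NVTabular code: the function names `shuffle_within_chunks` and `full_shuffle`, the `chunk_size` parameter, and the use of in-memory lists are all assumptions made for the example.

```python
import random


def shuffle_within_chunks(rows, chunk_size, seed=0):
    """Shuffle each chunk independently; rows never leave their chunk."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]  # slicing copies the chunk
        rng.shuffle(chunk)
        out.extend(chunk)
    return out


def full_shuffle(rows, seed=0):
    """Shuffle across the whole dataset; any row can land at any position."""
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the input is left untouched
    rng.shuffle(shuffled)
    return shuffled


rows = list(range(12))
chunked = shuffle_within_chunks(rows, chunk_size=4)
full = full_shuffle(rows)
```

With per-chunk shuffling, the first four output rows are always some ordering of the first four input rows, so ordering between chunks is preserved. A full shuffle removes that constraint, at the cost of needing access to the whole dataset (or an exchange step between partitions).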

[Task] Performance Optimization: use parquet writer in dl_encoder

**Issue by [benfred](https://github.com/benfred)** _Monday May 04, 2020 at 18:43 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/87_ ---- We are currently using `to_pandas` / `from_pandas` to spill to host memory in dl_encoder.py. Using...

benfred

enhancement

[FEA] Add progress bars

1

**Issue by [benfred](https://github.com/benfred)** _Monday May 04, 2020 at 19:08 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/89_ ---- We should default displaying progress information (using tqdm) so that users can view progress when...

benfred

[FEA] epsilon kwarg for LogOp and possible rename

2

**Issue by [alecgunny](https://github.com/alecgunny)** _Monday May 11, 2020 at 16:05 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/105_ ---- 1. Add `eps` kwarg to `LogOp` such that the math implemented is `log(x+eps)` instead of...

benfred
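
The `log(x+eps)` transform requested above can be sketched as follows. This is a standalone illustration, not the `LogOp` implementation: the function name `log_with_eps` and the default `eps=1.0` are assumptions chosen for the example, and the real operator works on dataframe columns rather than Python lists.

```python
import math


def log_with_eps(values, eps=1.0):
    """Return log(x + eps) for each x (hypothetical helper, not NVTabular API)."""
    return [math.log(x + eps) for x in values]


# With eps > 0, an input of zero no longer raises a math domain error.
transformed = log_with_eps([0.0, 9.0, 99.0], eps=1.0)
```

Exposing `eps` as a keyword argument lets the caller choose how zero-valued inputs are handled instead of fixing one offset in the operator.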

[FEA] XGBoost integration

**Issue by [benfred](https://github.com/benfred)** _Tuesday May 19, 2020 at 18:44 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/147_ ---- Add a sample notebook showing how to integrate NVTabular with XGBoost

benfred

[FEA] make merge operation optional after Groupby

2

**Issue by [rnyak](https://github.com/rnyak)** _Thursday May 21, 2020 at 17:35 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/163_ ---- **Is your feature request related to a problem? Please describe.** Currently, after applying Groupby operation,...

benfred

[FEA] ORC file format support

**Issue by [benfred](https://github.com/benfred)** _Friday May 22, 2020 at 19:51 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/167_ ---- We should add support for reading datasets as ORC, in addition to the CSV/Parquet formats...

benfred

IO

[OP] Add dropDuplicates() operator

2

**Issue by [rnyak](https://github.com/rnyak)** _Tuesday May 26, 2020 at 18:00 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/172_ ---- **Is your operator request related to a problem? Please describe.** dropDuplicates() method is used in...

benfred

Outbrains

[Task] Using DataFrame level sum, count, min, max, std, var functions in Ops

**Issue by [oyilmaz-nvidia](https://github.com/oyilmaz-nvidia)** _Monday Jun 01, 2020 at 15:07 GMT_ _Originally opened as https://github.com/rapidsai/recsys/issues/188_ ---- Right now in nvtabular, we go over all the columns of a dataframe one by...

benfred

[FEA] Automatically calculate appropriate number of hash partitions

1

**Is your feature request related to a problem? Please describe.** The multigpu criteo benchmark is hardcoding the best number of hash partitions for each categorical variable: https://github.com/NVIDIA/NVTabular/blob/2dd4cbc94e074d2a7a319dcf05ff249c7cdec3b3/examples/dask-nvtabular-criteo-benchmark.py#L45-L54 as well as...

benfred

MultiGPU
