NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate the terabyte-scale datasets used to train deep-learning-based recommender systems.

Results: 172 NVTabular issues

**Is your feature request related to a problem? Please describe.** `row_group_size` would be a useful argument to add to the `to_parquet()` method when we save processed files from an NVT workflow...

**Describe the bug** cuDF fails when processing the full Criteo dataset if the parquet files were exported from BigQuery. CSV and parquet files from GCS work. **Steps/Code to reproduce bug** https://gist.github.com/mengdong/d6a24fc266d9806ccd74cd9890b67c6a replace...

bug
P1

**Describe the bug** Several of the io tests depend on `uavro` and `fastavro`; `uavro` depends on `cramjam`. Installing `cramjam` makes the avro tests pass, but breaks tests that rely on...

bug
P1

Hi all, in tutorial 03-Training-with-TF (version 0.5.2), when I run the instruction `x_emb_output = emb_layer(inputs)`, I see that my GPU memory usage is ~7GB (~100% memory usage). I haven't...

bug
P1

**Describe the bug** When we jointly encode categorical columns, `nvt.ops.get_embedding_sizes(workflow)` does not generate the correct embedding table. **Steps/Code to reproduce bug**
```
df = cudf.DataFrame({'a_user_id': ["User_A","User_E","User_B","User_C","User_A","User_B","User_B","User_C","User_B","User_A"], 'b_user_id': ["User_B", "User_F", "User_D",...
```

bug
P1
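The expectation behind the joint-encoding report above can be illustrated with a minimal pure-Python sketch. It is not NVTabular's implementation: `joint_encode` and `joint_cardinality` are hypothetical helpers, the sample values are made up, and reserving index 0 for null or unseen values is an assumption. The point is only that columns encoded jointly share one vocabulary, so the embedding table size would presumably follow the union of their values rather than each column's own unique count.

```python
from typing import Dict, List, Tuple


def joint_encode(columns: Dict[str, List[str]]) -> Tuple[Dict[str, List[int]], Dict[str, int]]:
    """Encode several categorical columns against one shared vocabulary.

    Hypothetical helper: ids are assigned in order of first appearance,
    starting at 1 (index 0 is assumed to be reserved for null/unseen values).
    """
    vocab: Dict[str, int] = {}
    for values in columns.values():
        for value in values:
            if value not in vocab:
                vocab[value] = len(vocab) + 1
    encoded = {name: [vocab[v] for v in values] for name, values in columns.items()}
    return encoded, vocab


def joint_cardinality(vocab: Dict[str, int]) -> int:
    """Rows needed in a shared embedding table: one per value plus the reserved index 0."""
    return len(vocab) + 1


# Made-up data: the two columns overlap on "User_B".
columns = {
    "a_user_id": ["User_A", "User_B", "User_A"],
    "b_user_id": ["User_B", "User_C"],
}
encoded, vocab = joint_encode(columns)
print(encoded)
print(joint_cardinality(vocab))
```

In this toy case each column has two unique values, but the shared vocabulary has three, which is the kind of mismatch a per-column size computation would miss.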

- Added container info to the getting-started and scaling Criteo examples. We discussed cleaning up the NVTabular examples in the example meeting. I started to remove some of them in this...

## Report needed documentation Hi, I would like to include a vector feature (e.g., a pre-trained embedding for items or users) as one of the input features....

examples

## Report incorrect documentation The inline comment lists the supported ops as ["count", "sum", "mean", "std", "var"]. tree_width : dict or int, optional In the code, I can see the supported ops are...

**Describe the bug** The `ListSlice()` op is currently able to pad the sequences on the right only. But there are use cases where you need to pad on the left....

enhancement
P1
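The difference between right and left padding requested above can be sketched in plain Python. This is an illustration only, not the `ListSlice()` op: `pad_sequence`, its `side` parameter, and the choice to truncate from the end when a sequence is too long are all assumptions made for the example.

```python
from typing import List, Sequence


def pad_sequence(seq: Sequence[int], length: int, pad_value: int = 0, side: str = "right") -> List[int]:
    """Return `seq` as a list of exactly `length` items.

    Hypothetical helper: shorter sequences are padded with `pad_value` on the
    requested side; longer sequences keep their first `length` items.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if length <= 0:
        return []
    items = list(seq)[:length]
    padding = [pad_value] * (length - len(items))
    return padding + items if side == "left" else items + padding


print(pad_sequence([1, 2, 3], 5))               # padding appended after the items
print(pad_sequence([1, 2, 3], 5, side="left"))  # padding placed before the items
```

Left padding keeps the most recent interactions aligned at the end of every row, which is the layout some sequence models expect.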

Slicing a sequence of past interactions can vary between rows based on a specified rule (such as taking 80% of the original sequence, or taking interactions from the last hour). **Describe the...

research
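The two example rules mentioned above (a fraction of the sequence, or a time window) can be sketched as follows. Everything here is hypothetical: the `(item_id, timestamp)` tuple layout, the function names, the reading of "80%" as the first 80% of the items, and the reading of "last hour" as relative to the row's own most recent interaction are assumptions, not behavior taken from NVTabular.

```python
from typing import List, Sequence, Tuple

# Assumed layout: (item_id, unix timestamp in seconds), oldest first.
Interaction = Tuple[int, float]


def slice_by_fraction(seq: Sequence[Interaction], fraction: float = 0.8) -> List[Interaction]:
    """Keep the first `fraction` of the interactions, rounding down."""
    keep = int(len(seq) * fraction)
    return list(seq[:keep])


def slice_last_window(seq: Sequence[Interaction], window_seconds: float = 3600.0) -> List[Interaction]:
    """Keep interactions within `window_seconds` of the row's latest timestamp."""
    if not seq:
        return []
    latest = max(ts for _, ts in seq)
    return [(item, ts) for item, ts in seq if latest - ts <= window_seconds]


history = [(1, 0.0), (2, 1000.0), (3, 5000.0), (4, 7000.0), (5, 8000.0)]
print(slice_by_fraction(history))   # length depends on the row's own sequence length
print(slice_last_window(history))   # length depends on the row's own timestamps
```

Both rules produce a different slice length per row, which is what distinguishes them from a fixed start and end index.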