rnyak
**Describe the bug** I would like to jointly encode single- and multi-hot categorical columns, but I am getting the following error: ``` --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Input...
[BUG] Categorify(start_index) is not generating the mappings in the unique parquet files as expected
**Describe the bug** I noticed that when we use `start_index` in the Categorify op, the generated unique-category parquet files do not correctly map the original, encoded and null...
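For context on what the mapping should look like, `start_index` is meant to shift all encoded values upward so that low codes stay reserved (e.g., 0 for nulls). A pandas sketch of the *intended* mapping, hypothetical and independent of Categorify's actual implementation:

```python
import pandas as pd

start_index = 1  # reserve code 0, e.g. for nulls / out-of-vocabulary
s = pd.Series(["b", "a", None, "b"])

# factorize() gives -1 for nulls and 0..n-1 for known values
codes, uniques = pd.factorize(s)

# Shift known codes by start_index; send nulls to the reserved code 0
encoded = pd.Series(codes + start_index).where(codes >= 0, 0)

# The unique-mapping table the issue expects: original value -> shifted code
mapping = {val: i + start_index for i, val in enumerate(uniques)}
print(encoded.tolist())  # [1, 2, 0, 1]
print(mapping)           # {'b': 1, 'a': 2}
```

The bug report above says the parquet files do not reflect this shifted mapping between original, encoded and null values.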
**Is your feature request related to a problem? Please describe.** When DATA_FOLDER is set incorrectly in our NVT workflow, we get the following IndexError, but...
**Describe the bug** I use the Groupby op in my NVT pipeline. I was expecting the `list` tag to be added automatically to the schema file, but that's not the case. **Steps/Code...
**Describe the bug** I am converting float dtypes to `int64` with LambdaOp and then adding a Groupby op to generate list columns. However, I noticed that the final dtypes are...
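The expectation in the report can be illustrated with plain pandas (a sketch, independent of NVT internals): a cast applied before the groupby should carry through into the aggregated list elements.

```python
import pandas as pd

df = pd.DataFrame({"session": [1, 1, 2], "price": [1.0, 2.0, 3.0]})

# Cast float -> int64 first (the LambdaOp step in the report)...
df["price"] = df["price"].astype("int64")

# ...then aggregate into list columns (the Groupby step)
lists = df.groupby("session")["price"].agg(list)

# The elements inside each list keep the int64 cast
print(lists.loc[1])  # [1, 2]
```

The issue above is that the NVT pipeline apparently does not preserve the cast in the final list-column dtypes.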
## Report incorrect documentation **Location of incorrect documentation** We have a [troubleshooting](https://nvidia-merlin.github.io/NVTabular/main/resources/troubleshooting.html#setting-the-row-group-size-for-the-parquet-files) section in the NVT docs to guide users in avoiding memory issues when running NVT or even training a...
**Is your feature request related to a problem? Please describe.** `row_group_size` would be a useful argument to add to the `to_parquet()` method when we save processed files from an NVT workflow...
**Describe the bug** When we jointly encode categorical columns, `nvt.ops.get_embedding_sizes(workflow)` does not generate the correct embedding table. **Steps/Code to reproduce bug** ``` df = cudf.DataFrame({'a_user_id': ["User_A","User_E","User_B","User_C","User_A","User_B","User_B","User_C","User_B","User_A"], 'b_user_id': ["User_B", "User_F", "User_D",...
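For reference, joint encoding means both columns share one vocabulary, so the embedding table should be sized by the union of their unique values. A pandas sketch of that expected behavior (hypothetical, not NVT's implementation):

```python
import pandas as pd

df = pd.DataFrame({
    "a_user_id": ["User_A", "User_E", "User_B", "User_C", "User_A"],
    "b_user_id": ["User_B", "User_F", "User_D", "User_A", "User_C"],
})

# Joint encoding: build one shared vocabulary over both columns...
vocab = pd.Index(sorted(set(df["a_user_id"]) | set(df["b_user_id"])))
for col in ("a_user_id", "b_user_id"):
    df[col] = vocab.get_indexer(df[col])

# ...so the embedding table has one row per unique value across BOTH columns
print(len(vocab))  # 6 distinct users -> embedding cardinality 6
```

The bug report above is that `get_embedding_sizes(workflow)` does not return a table sized this way for jointly encoded columns.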
**Is your feature request related to a problem? Please describe.** Currently, we do not have a PyTorch inference example in `test_notebooks.py`, so we cannot capture PyTorch inference errors. We...
Currently the PoC notebook does not have a filtering step; this PR adds one with a hacky workaround. However, I get an error (see below) when I try to export...