
[QST] How to pad a list of lists column?

Open amulyahwr opened this issue 2 years ago • 1 comments

Thank you for the amazing work! I have the following use case:

  1. After applying the groupby operation of NVTabular, I have the following in my workflow:

          Col1        ; Col2
     Row1 [1,2,3]     ; [[4,5,6], [7,8,9]]
     Row2 [10,11,12]  ; [[13,14,15]]

  2. If I simply apply the ListSlice operation with a maximum length of 4, I receive the following:

          Col1          ; Col2
     Row1 [1,2,3,0]     ; [[4,5,6], [7,8,9],0,0]
     Row2 [10,11,12,0]  ; [[13,14,15],0,0,0]

     This is incorrect, since Col2 becomes a mixture of ints and lists. Ideally, it should look like the following:

          Col1          ; Col2
     Row1 [1,2,3,0]     ; [[4,5,6], [7,8,9],[0,0,0],[0,0,0]]
     Row2 [10,11,12,0]  ; [[13,14,15],[0,0,0],[0,0,0],[0,0,0]]

  3. I know I will have to apply a Lambda or custom operator, but I am not sure how. Any help will be really appreciated. Thanks.
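
For reference, the desired padding can be sketched in plain Python, independent of NVTabular. This is only an illustration of the target output, not an NVTabular operator: `pad_nested` and its parameters (`max_len`, `inner_len`, `pad_value`) are hypothetical names, and the sketch assumes every inner list should end up with the same fixed length.

```python
def pad_nested(rows, max_len, inner_len, pad_value=0):
    """Pad or truncate each row so it holds exactly `max_len` inner lists,
    each of length `inner_len`. Hypothetical helper, plain Python only."""
    padded_rows = []
    for row in rows:
        # Bring every existing inner list to exactly `inner_len` items.
        fixed = [
            inner[:inner_len] + [pad_value] * (inner_len - len(inner))
            for inner in row
        ]
        # Keep at most `max_len` inner lists.
        fixed = fixed[:max_len]
        # Fill the remainder with full padding lists rather than bare scalars,
        # so the column stays a uniform list of lists.
        fixed += [[pad_value] * inner_len for _ in range(max_len - len(fixed))]
        padded_rows.append(fixed)
    return padded_rows


# Col2 from the example above.
col2 = [[[4, 5, 6], [7, 8, 9]], [[13, 14, 15]]]
padded_col2 = pad_nested(col2, max_len=4, inner_len=3)
```

With the example data, `padded_col2` matches the "ideal" Col2 shown above. Whether and how such a function could be wrapped in a Lambda or custom operator depends on NVTabular's support for nested list columns.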

amulyahwr avatar Jul 29 '22 16:07 amulyahwr

@amulyahwr thanks for your question. Currently, we do not support list-of-lists columns in NVTabular.

If you can share a simple, reproducible synthetic dataset and your code snippet, we can better understand what you are doing and try to help you.

rnyak avatar Aug 02 '22 18:08 rnyak

@amulyahwr I am closing this issue since we have not heard from you for a while.

rnyak avatar Aug 30 '22 16:08 rnyak