
Add List Logical Type

Open gsheni opened this issue 3 years ago • 2 comments

  • When you have NaturalLanguage columns, you might want to calculate the average/median/total word length.
    • To calculate all of those you would need to get the word length (for each word in the row value).
  • We can therefore create a new List Logical Type, that will be a list of integer values, to handle this situation.

class List(LogicalType):
    primary_dtype = 'object'
    standard_tags = set()  # empty set of tags; a bare {} would create a dict

                nlp   word_length
1    having 3 words     [6, 1, 5]
5          having 2        [6, 1]
9  having 4 words .  [6, 1, 5, 1]
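
For illustration, here is a minimal sketch of how the word_length values in the table could be derived and then aggregated. It assumes plain whitespace tokenization, which the issue does not specify, and the helper names `word_lengths` and `summarize` are hypothetical, not part of woodwork.

```python
from statistics import mean, median


def word_lengths(text):
    """Return the length of each whitespace-separated token in text."""
    # Assumption: tokens are split on whitespace, so "." counts as a word.
    return [len(token) for token in text.split()]


def summarize(lengths):
    """Aggregate a list of word lengths into average, median, and total."""
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        "total": sum(lengths),
    }


rows = ["having 3 words", "having 2", "having 4 words ."]
lengths = [word_lengths(row) for row in rows]
```

Storing the per-word lengths once as a list lets all three statistics be computed from the same column without re-tokenizing the text.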

gsheni, Sep 09 '21 15:09

Would there ever be lists of floats or strings? Say you had something like "first letter of each word". Maybe we allow some form of stacking of logical types? For example, this List logical type could take in a second logical type. It wouldn't have an impact on the dtype, since it'd always be object anyway, but it would give us more info about the contents.
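
A rough sketch of what such stacking could look like, using stand-in classes rather than woodwork's real ones. Everything here (the `python_type` attribute, the `element_type` parameter, the `validate` method) is a hypothetical illustration, not an existing woodwork API.

```python
class LogicalType:
    """Stand-in base class for illustration; not woodwork's LogicalType."""
    primary_dtype = "object"


class Integer(LogicalType):
    python_type = int  # hypothetical attribute used only by this sketch


class Double(LogicalType):
    python_type = float


class String(LogicalType):
    python_type = str


class List(LogicalType):
    """A list whose elements are described by a second logical type."""
    primary_dtype = "object"  # the dtype stays object regardless of element type

    def __init__(self, element_type):
        self.element_type = element_type

    def validate(self, value):
        """Check that value is a list whose items match the element type."""
        expected = self.element_type.python_type
        return isinstance(value, list) and all(
            isinstance(item, expected) for item in value
        )


word_length = List(Integer)
first_letters = List(String)
```

With this shape, `List(Integer)` and `List(String)` share one class and the same dtype, while the element type carries the extra information about the contents.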

tamargrey, Sep 09 '21 15:09

  • That's a good point. Because the element type (integer/float/string) is a fundamental part of the logical type, I believe we should have 3 Logical Types: ListString, ListInteger, ListDouble.
  • The ListString seems a bit weird. It would be something like [s, d, b]. I'm not sure we would want to implement that until we find a compelling ML use case.
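
To make the three-type split concrete, here is a small sketch of how a value might be mapped to one of the proposed names. The function `infer_list_type` and its rules (booleans excluded, mixed int/float treated as ListDouble) are assumptions made for illustration only.

```python
def infer_list_type(values):
    """Return the proposed logical type name for a list, or None if none fits."""
    if not isinstance(values, list) or not values:
        return None

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "ListInteger"
    if all(is_int(v) or isinstance(v, float) for v in values):
        return "ListDouble"
    if all(isinstance(v, str) for v in values):
        return "ListString"
    return None
```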

gsheni, Sep 09 '21 15:09