datatable [FR] Option to add a name to grouping in ``by``, especially for boolean expressions

Open samukweku opened this issue 5 years ago • 2 comments

Instead of the default C0, it would be nice to be able to give the grouping column a meaningful name.

Example:

from datatable import dt, f, by

grades = [48, 99, 75, 80, 42, 80, 72, 68, 36, 78]
data = {'ID': ["x%d" % r for r in range(10)],
        'Gender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
        'ExamYear': [2007, 2007, 2007, 2008, 2008,
                     2008, 2008, 2009, 2009, 2009],
        'Class': ['algebra', 'stats', 'bio', 'algebra',
                  'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
        'Participated': ['yes', 'yes', 'yes', 'yes', 'no',
                         'yes', 'yes', 'yes', 'yes', 'yes'],
        'Passed': ['yes' if x > 50 else 'no' for x in grades],
        'Employed': [True, True, True, False,
                     False, False, False, True, True, False],
        'Grade': grades}

df = dt.Frame(data)
df[:, dt.mean(f.Grade), by(f.ExamYear < 2009)]

   | C0 | Grade
---+----+---------
 0 |  0 | 60.6667
 1 |  1 | 70.8571

Suggested form:

df[:, dt.mean(f.Grade), by(name = f.ExamYear < 2009)]

Jun 28 '20 06:06 samukweku

@samukweku what if a name is not provided (as is the case now)? Any suggestions for what column name to use then?

May 28 '21 20:05 pradkrish

@pradkrish if no name is provided, then we fall back to datatable's default names (C0, C1, ...), as in the example shared above.

May 29 '21 11:05 samukweku
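
For illustration only, the fallback rule discussed in this thread could be sketched in plain Python as below. The helper `resolve_group_names` is hypothetical and is not part of datatable; it also assumes that the auto-generated counter advances only for unnamed grouping columns, which may differ from how datatable actually numbers its default names.

```python
def resolve_group_names(names):
    """Hypothetical helper (not a datatable API).

    `names` holds one entry per grouping expression: a string if the
    user supplied a name, or None if no name was given.
    """
    resolved = []
    auto_index = 0  # assumption: counter advances only for unnamed columns
    for name in names:
        if name is None:
            # No name supplied: fall back to the C0, C1, ... convention.
            resolved.append(f"C{auto_index}")
            auto_index += 1
        else:
            # A user-supplied name is used as-is.
            resolved.append(name)
    return resolved


# Unnamed grouping keeps today's behaviour; a named one uses the given name.
print(resolve_group_names([None]))
print(resolve_group_names(["before_2009"]))
```

With a rule like this, existing code that passes an unnamed expression to ``by`` would keep producing C0, so the suggested keyword form would be purely additive.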
