
Loss of column Name

Open samukweku opened this issue 5 years ago • 8 comments

  • Did you find a bug in datatable, or maybe the bug found you? Loss of column names during some operations. What determines how a column name is changed? What operations will cause loss of column names?

  • How to reproduce the bug?

# sample data
from datatable import dt, f

data = {"id": [1, 1, 1, 1, 2, 2, 1, 2, 1],
        "code": range(10, 1, -1),
        "valA": range(1, 10),
        "valB": range(10, 19)}

DT = dt.Frame(data)

 id code valA valB
0 1 10 1 10
1 1 9 2 11
2 1 8 3 12
3 1 7 4 13
4 2 6 5 14
5 2 5 6 15
6 1 4 7 16
7 2 3 8 17
8 1 2 9 18

# apply an operation
DT[:, -1 * f[:]]


   C0 C1 C2 C3
0 −1 −10 −1 −10
1 −1 −9 −2 −11
2 −1 −8 −3 −12
3 −1 −7 −4 −13
4 −2 −6 −5 −14
5 −2 −5 −6 −15
6 −1 −4 −7 −16
7 −2 −3 −8 −17
8 −1 −2 −9 −18
9 rows × 4 columns
  • What was the expected behavior? Column names should not be lost, even when an operation is applied to the whole frame. Some clarity on why column names are lost, and under what conditions, would be helpful.

  • Your environment? Python version: '3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:55:52) \n[GCC 7.5.0]'; datatable version: '1.0.0a0+build.1601087962.sam'; operating system: Linux

Oct 01 '20 21:10 samukweku

The logic here is that any unary function/operator retains the name of the column. Thus, sum(f.X) or -f.X will produce a column named "X". On the other hand, binary functions/operators do not carry any name over to the result. For example, atan2(f.X, f.Y) or -1 * f.X will produce an unnamed column, which the frame then shows under an auto-generated name such as C0.

Oct 02 '20 21:10 st-pasha

Could you explain a bit more, @st-pasha? If I multiply a column by a number, I change the contents of that column; that doesn't mean I should lose the name. What's the reasoning behind unary vs binary?

Oct 02 '20 22:10 samukweku

A unary function operates on a single column, so it carries through the name of that column. For example, cos(f.A) produces column "A" because the argument of the function cos() is a column named "A".

A binary function, on the other hand, takes 2 columns as arguments. For example, f.X * f.Y. Since both of those columns can potentially have names, it is unclear what the name of the result should be. It can't be "X" or "Y" because that would be unfair to the other column. It can't be "X * Y", because applying this rule universally quickly produces bad results, like (f.X + f.Y)*f.Z -> "X+Y*Z". And that's not even taking into account columns with long complicated names.

Thus, the only choice is for f.X * f.Y to be unnamed. We could add a special check so that if one of the operand columns is unnamed, the outcome bears the name of the other column, but that would mean f.X * f.Y * f.Z is named "Z", which is very dubious.
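
To make the "X+Y*Z" problem concrete, here is a minimal pure-Python sketch. It is a toy model, not datatable code: the `Expr` class and its `_join` helper are hypothetical names invented for this illustration, and the only thing they track is the name a naive "glue the operand names together" rule would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a column expression; tracks only a name."""
    name: str

    def _join(self, op: str, other: "Expr") -> "Expr":
        # Naive rule: concatenate the operand names around the operator.
        return Expr(f"{self.name}{op}{other.name}")

    def __add__(self, other: "Expr") -> "Expr":
        return self._join("+", other)

    def __mul__(self, other: "Expr") -> "Expr":
        return self._join("*", other)


X, Y, Z = Expr("X"), Expr("Y"), Expr("Z")

grouped = (X + Y) * Z   # add first, then multiply
ungrouped = X + Y * Z   # multiply first, then add

# Two different computations end up with the same label.
print(grouped.name)     # X+Y*Z
print(ungrouped.name)   # X+Y*Z
```

Adding parentheses to the generated name would remove the ambiguity, but the names would then grow with every nested operation, which is the other objection raised above.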

I guess we could make a rule that if one of the arguments to a binary function is a scalar then the result is the name of the other column. This would mean that 2 * (f.X - 1) is still called "X".
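
A rough sketch of how that scalar rule could behave, again as a pure-Python toy model rather than datatable's implementation. The `Col` class is a hypothetical name used only here, and it models the proposal, not the behavior reported at the top of this issue (where -1 * f.X comes out unnamed).

```python
from typing import Optional


class Col:
    """Hypothetical column expression that tracks only the resulting name."""

    def __init__(self, name: Optional[str]):
        self.name = name

    def __neg__(self) -> "Col":
        # Unary operator: the single operand's name carries through.
        return Col(self.name)

    def _binary(self, other) -> "Col":
        if isinstance(other, Col):
            # Two column operands: no fair choice, so the result is unnamed.
            return Col(None)
        # Proposed rule: a scalar operand leaves the column's name intact.
        return Col(self.name)

    # The same naming rule applies to every binary operator modelled here.
    __add__ = __radd__ = _binary
    __sub__ = __rsub__ = _binary
    __mul__ = __rmul__ = _binary


X, Y = Col("X"), Col("Y")

print((-X).name)             # X
print((2 * (X - 1)).name)    # X
print((X * Y).name)          # None
print((X * Y * 2).name)      # None
```

Note that once a name has been dropped by a column-with-column operation, a later scalar operation does not bring it back: X * Y * 2 stays unnamed in this model.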

Oct 02 '20 22:10 st-pasha

@st-pasha much clearer now.

> I guess we could make a rule that if one of the arguments to a binary function is a scalar then the result is the name of the other column. This would mean that 2 * (f.X - 1) is still called "X".

I think this is a good idea and worth implementing.

Also, I feel these column name changes should be documented somewhere (not sure of the exact location), so users are aware of them. Although, on second thought, it might just be me, and not really an issue for the library's user base.

Oct 03 '20 00:10 samukweku

Yeah, it should be documented. But where?

Oct 03 '20 00:10 st-pasha

I think it should be included in the transformation documentation, which is no. 5 on #2604. Open to suggestions.

Oct 03 '20 02:10 samukweku

I guess this issue should be relabeled as documentation or something similar. After following the discussion, I understand that it's a feature, not a bug :)

Nov 04 '20 21:11 pradkrish

Yes, @pradkrish; I'm still thinking about which part of the documentation should mention this.

Nov 05 '20 00:11 samukweku

closing this; the alias function can help with renaming

Nov 23 '22 10:11 samukweku