Add transmute functionality into select

Open dodger487 opened this issue 8 years ago • 2 comments

transmute and select do very similar things: create a smaller dataframe that is just a few derived columns from the current one.

The difference between transmute and select is small: transmute creates new columns (basically a mutate followed by a select), while select only uses existing columns. Why not put all of this functionality inside select?

diamonds >> select(X.carat * 2, X.color, chair=X.table) >> head()
  # Out:
  # X["carat"] * 2  color  chair
  #  8.01              I1     61  
  #  8.01              I1     62  
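To make the proposal concrete, here is a minimal sketch of a select that accepts existing columns, derived expressions, and keyword-renamed columns in one call. It is a toy model only, not dplython's implementation: the "dataframe" is a plain dict of lists, and `Col`, `col`, and this `select` are hypothetical names standing in for `X.<name>` and the real verb.

```python
# Toy model: a "dataframe" is a dict mapping column name -> list of values.
# Col is a hypothetical stand-in for dplython's X.<name> expressions.

class Col:
    def __init__(self, name, fn):
        self.name = name  # label used when no keyword name is given
        self.fn = fn      # maps a table to a list of values

    def __mul__(self, k):
        # Build a new deferred expression whose label records the operation.
        return Col(f"{self.name} * {k}", lambda t: [v * k for v in self.fn(t)])


def col(name):
    """Refer to an existing column by name."""
    return Col(name, lambda t: list(t[name]))


def select(table, *args, **kwargs):
    """Keep only the requested columns, computing derived ones on the way."""
    out = {}
    for expr in args:
        out[expr.name] = expr.fn(table)   # positional: named after the expression
    for new_name, expr in kwargs.items():
        out[new_name] = expr.fn(table)    # keyword: named by the caller
    return out


table = {"carat": [0.5, 1.0], "color": ["E", "I"], "table": [55, 61]}
result = select(table, col("carat") * 2, col("color"), chair=col("table"))
print(result)
```

In this sketch a plain column reference and a derived expression go through the same code path, which is the point of folding transmute into select.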

One argument against this is that dplyr uses - to indicate "drop this", so diamonds %>% select(-carat) drops the carat column. That seems a little strange here, and it differs from SQL syntax, where a user might expect the negated value of carat. To keep this functionality, we could make a new drop verb which drops rows.

dodger487 • May 27 '16 14:05

+1. I agree that using - to drop columns in select is a little weird anyway. I might prefer a special function like except that works within select (much like matches and contains in dplyr), rather than adding a whole new verb.
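A rough sketch of what such a helper could look like, again on a toy dict-of-lists table rather than dplython's real internals. The names `Except`, `except_`, and this `select` are hypothetical; the trailing underscore is needed because `except` is a reserved word in Python and cannot be used as a function name.

```python
class Except:
    """Marker wrapping the column names to leave out (hypothetical helper)."""
    def __init__(self, *names):
        self.names = set(names)


def except_(*names):
    # Trailing underscore: `except` is a reserved word in Python.
    return Except(*names)


def select(table, *args):
    """table is a dict of column name -> list; args are names or Except markers."""
    excluded = set()
    wanted = []
    for arg in args:
        if isinstance(arg, Except):
            excluded |= arg.names
        else:
            wanted.append(arg)
    # With no explicit columns, start from every column, as select(-carat) does in dplyr.
    names = wanted or list(table)
    return {n: table[n] for n in names if n not in excluded}


table = {"carat": [0.5, 1.0], "color": ["E", "I"], "table": [55, 61]}
kept = select(table, except_("carat"))
print(list(kept))
```

Because the exclusion is an explicit marker object rather than a unary minus, `-X.carat` would stay free to mean arithmetic negation inside select.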

I want to hear input from @dgrtwo on this as well. We should make sure we wouldn't be preventing any other special select functionality. In particular, pattern-matching column names needs to remain simple. But I think there's an opportunity here for a unified interface for pattern-matching of column names, not just in select, but also in mutate and summarize (instead of dplyr's mutate_each and summarize_each).

In your last sentence, I assume you mean "a new drop verb which drops columns."

danrobinson • May 29 '16 18:05

Minor annoying thing here... If I do diamonds >> transmute(X.color) the resulting column will be named X.color. If I do diamonds >> select(X.color) the resulting column will be named color.

I basically want to replace select with transmute. So which behavior makes more sense?

I think if we keep the current select style, then we should drop all X. from the string rep of Laters, for consistency's sake. If we keep the transmute style, well, then it's terrible, because as soon as you do a select your column names are all screwed up, with X getting added to them all the time.

I'd like to hear community opinions, but probably we should just get rid of X in the string. It's also closer to dplyr results.
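For illustration, dropping X from the string could look roughly like the following. This `Later` is a hypothetical, stripped-down stand-in written for this comment; the real class in dplython does considerably more, and the sketch only shows the naming choice.

```python
class Later:
    """Hypothetical, stripped-down stand-in for dplython's deferred X expressions."""

    def __init__(self, name=None):
        self._name = name

    def __getattr__(self, attr):
        # Only called for attributes not found normally, so X.color lands here.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return Later(attr)

    def __str__(self):
        # Bare column name, without the "X." prefix, so that select and
        # transmute would label the resulting column the same way.
        return self._name if self._name is not None else "X"


X = Later()
print(str(X.color))
```

With a string rep like this, both verbs would name the column color, which matches the dplyr-like result described above.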

dodger487 • Aug 03 '16 05:08