datawizard icon indicating copy to clipboard operation
datawizard copied to clipboard

Getting rid of aliases for selection functions

Open IndrajeetPatil opened this issue 1 year ago • 8 comments

Screenshot 2022-09-14 at 20 50 05

  • get_columns() is confusing because, unless you read the docs, it is not clear whether the function returns column names, or a list of vectors, or a data frame. data_select() is much better, since we have established that data_*() functions always return a data frame. Keeping the get_columns() around for long is inviting confusion.

  • The same is true for data_find(). All other data_*() functions return data frames, while this one returns an atomic vector. This messes user's mental model. We should remove it in favour of find_columns().

library(datawizard)
data_find(iris, starts_with("Sepal"))
#> [1] "Sepal.Length" "Sepal.Width"

Created on 2022-09-14 with reprex v2.0.2

This is probably something we can do in 0.7 or later.

IndrajeetPatil avatar Sep 14 '22 18:09 IndrajeetPatil

I thought the rational was that all data_*() functions only work with data frames (not what they return). Returning a data frame also works for center() et al, so we focus on the input when choosing the function name. But of course, we could say that data_*() means input and output are data frames, and thus remove data_find() in favor of find_columns(). find_columns() sounds more natural, and that's why we added get_columns(), in the tradition of insight's get_*/find_* pairs.

strengejacke avatar Sep 14 '22 19:09 strengejacke

and thus remove data_find() in favor of find_columns()

Okay, that sounds reasonable to me.

find_columns() sounds more natural

Maybe, but it's also more confusing. find_column_names() or extract_column_names() might have been better. But I can make my peace with the current naming schema.

But I still think get_columns() should be removed in favour of data_select().

The fewer functions we have that the user needs to learn, the lower the cognitive load we are imposing on them.

Users are also likely to find it more familiar to work with.

Screenshot 2022-09-15 at 08 22 50

IndrajeetPatil avatar Sep 15 '22 06:09 IndrajeetPatil

find_/get_columns was indeed related to insight's logic, but I also agree that moving forward we should actually start deprecating and pruning aliases and focus on "the most easystatsy way" of doing things

DominiqueMakowski avatar Sep 15 '22 06:09 DominiqueMakowski

Maybe, but it's also more confusing. find_column_names() or extract_column_names() might have been better. But I can make my peace with the current naming schema.

Yeah, I like find_column_names() or extract_column_names(). So we would deprecate both find_columns() and data_find(), and use one of the mentioned suggestions?

strengejacke avatar Sep 15 '22 06:09 strengejacke

find_/get_columns was indeed related to insight's logic, but I also agree that moving forward we should actually start deprecating and pruning aliases and focus on "the most easystatsy way" of doing things

agree.

strengejacke avatar Sep 15 '22 06:09 strengejacke

Screenshot 2022-09-15 at 08 22 50

btw, data_filter() both works like dplyr::filter() and dplyr::slice().

strengejacke avatar Sep 15 '22 06:09 strengejacke

btw, data_filter() both works like dplyr::filter() and dplyr::slice().

@etiennebacher Should we update the translation vignette to also include this equivalence?

IndrajeetPatil avatar Sep 15 '22 06:09 IndrajeetPatil

@etiennebacher Should we update the translation vignette to also include this equivalence?

Already added by @strengejacke

etiennebacher avatar Sep 18 '22 16:09 etiennebacher