Programming with `datawizard`'s select helpers

Open rempsyc opened this issue 3 years ago • 3 comments

In the context of my easystats/performance#443 PR, I’ve had difficulty using datawizard’s select helpers (starts_with, ends_with, contains), so I am moving the discussion here.

Essentially, using the select helpers within functions triggers R CMD check warnings (e.g., check_outliers.data.frame: no visible global function definition for ‘contains’), because they seem not to be functions per se (I don’t actually know what they are). I thought maybe I could qualify the call with the package name, but this produces an error:

datawizard::starts_with
#> Error: 'starts_with' is not an exported object from 'namespace:datawizard'

I have tried to find examples of other easystats packages making use of the select helpers so I could mimic their use, but I could not find any. I understand that this is in part because the select helpers are rather new. In my own package, I have a file globalVariables.R to deal with such issues:

#' @export
utils::globalVariables(c("var1", "var2", "etc."))

But I’m not sure whether this is good practice when it can be avoided. So what is the recommended strategy to deal with this warning when writing a PR for easystats?

Created on 2022-08-13 by the reprex package (v2.0.1)

rempsyc, Aug 13 '22 20:08

Yeah, there are likely to be issues when select-helpers are used within other functions, see also 'Details' in ?data_select.

The suggestion is:

One workaround is to use the regex argument, which provides at least a bit more flexibility than exact matching. regex in its basic usage (as seen below) means that select behaves like the contains("") select-helper, but can also make the function more flexible by allowing to define complex regular expression pattern in select.

So basically, select is a string representing a regular expression pattern, and then you set regex = TRUE:

datawizard::find_columns(iris, select = "\\.", regex = TRUE)
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

strengejacke, Aug 13 '22 20:08

Ok, I see, thanks. So may I say that the general recommendation is to not use the select helpers internally for the easyverse, and just rely on the regex argument, always?

Because in the way that I wrote it for check_outliers, I think it works as it should, there's just the warning about global variables not defined being annoying. But are you saying that it isn't "safe" to use them, and that we shouldn't just ignore it and define a global variable manually to get rid of the warning?

rempsyc, Aug 13 '22 20:08

I'd say we should avoid defining global variables. And since select-helpers may not work as expected in some situations, I would recommend either using character vectors when the names are known (select = c("Sepal.Length", "Sepal.Width")) or using the regex argument (select = "^Sepal\\.", regex = TRUE).
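
For illustration, here is a minimal sketch of how both options might look inside a package function. The helper names select_sepal_columns() and select_known_columns() are hypothetical and not part of datawizard or performance; the sketch only assumes that data_select() accepts the select and regex arguments described in ?data_select.

```r
# Hypothetical internal helpers, shown only to illustrate the two options.
# Neither function exists in datawizard or performance.

# Option 1: regular expression. With regex = TRUE, `select` is treated as a
# pattern, so no select helper such as starts_with() appears in the code and
# R CMD check has no undefined global function to complain about.
select_sepal_columns <- function(data) {
  datawizard::data_select(data, select = "^Sepal\\.", regex = TRUE)
}

# Option 2: plain character vector, for when the column names are known.
select_known_columns <- function(data) {
  datawizard::data_select(data, select = c("Sepal.Length", "Sepal.Width"))
}

# Both calls should keep only the two Sepal columns of iris.
colnames(select_sepal_columns(iris))
colnames(select_known_columns(iris))
```

Both variants are fully namespace-qualified, so nothing has to be declared via utils::globalVariables().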

strengejacke, Aug 13 '22 22:08

Can this be closed?

strengejacke, Aug 24 '22 06:08