pecan
pecan copied to clipboard
Fix tidyverse "no visible binding for global variable" notes
Adapted from discussion with @infotroph in https://github.com/PecanProject/pecan/pull/2756
- an easy but tedious and inelegant fix for a problem that's mostly just annoying
- We are currently ignoring 571
no visible binding for global variable
NOTEs across all of PEcAn, and my wild guess is that half to two-thirds of these are Tidyverse calls.
Shortcu:t For anyone who runs into this error and doesn't want to implement this fix: just add the appropriate note causing the build to fail to the package's <pkg>/test/Rcheck_reference.log
, like this: https://github.com/PecanProject/pecan/pull/2756/commits/4a303b7fe3b32914c1f329be5efffc5f55030339
If you are in a good place to tackle this: we support you, get comfy! Bonus points if you fix all of the issues in the package you are working on. A special prize if you fix all (currently 563) of these Notes.
TODO:
- [ ] find list of violations in
<pkg>/test/Rcheck_reference.log
files-
find . -name Rcheck_reference.log | xargs grep "no visible binding for global variable"
- to count occurrences add
| wc -l
to the end of the line above
-
- [ ] prepend variables with
.data$
as done here: https://github.com/PecanProject/pecan/pull/2756/files#diff-296112e719b8455a8be63e9cac941111d4ae1700081aa05b3a9adfa8d5b9738aR210 - [ ] add rlang to Imports for relevant package to description
- [ ] add
##' @importFrom rlang .data
to<pkg>R/PEcAn.<pkgname>-package.R
. See PEcAn.DB for an example.- can also be done per function, but not sure when this would be preferred
- [ ] delete (relevant) lines starting
<function name>: no visible binding for global variable <variable>
in<pkg>/test/Rcheck_reference.log
- [ ] update documentation
Alternative solutions:
- change build checks to allow these notes
- or add notes as they appear to
<pkg>/test/Rcheck_reference.log
so that build ignores it - wait until a more elegant solution evolves
See also
- https://community.rstudio.com/t/how-to-solve-no-visible-binding-for-global-variable-note/28887
- https://www.r-bloggers.com/2019/08/no-visible-binding-for-global-variable/
- https://www.google.com/search?q=no+visible+binding+for+global+variable
- https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
Originally posted by @infotroph in https://github.com/PecanProject/pecan/pull/2756#discussion_r544964692
Two quick clarifications:
- each package has its own
tests/Rcheck_reference.log
, so only delete lines frombase/utils/test/Rcheck_reference.log
when they were fixed inside base/utils. - Yes, you can import once for the whole package by doing it in the package doc block. See PEcAn.DB for an example.
@infotroph updated the OP w/ your suggestions in https://github.com/PecanProject/pecan/issues/2758#issuecomment-748018564
@dlebauer hello , is this issue still open ? if yes I would like to work on this. thankyou
@moki1202 Yes, this issue is still open. We have hundreds of these notes that need to be addressed (e.g., for base/utils, base/db, modules/data.atmosphere).
For each package, follow @dlebauer 's detailed instructions above.
@ashiklom @infotroph We need to prepend only variables with .data$ that cause the run check error "no visible binding for global variable
@moki1202 More or less. TLDR: You'll want to do this for any variables referring to column names in dplyr
(and similar) functions (e.g., select
, mutate
, filter
, summarize
, etc.).
The root of the problem is that the dplyr
package provides the shortcut to refer to columns inside the target data.frame
in a more convenient way:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
mydata <- tribble(
~food, ~category, ~price,
"apple", "fruit", 1.25,
"tomato", "vegetable", 1.40,
"cucumber", "vegetable", 1.70,
"orange", "fruit", 0.85,
"banana", "fruit", 0.30
)
mydata %>%
select(food, price)
#> # A tibble: 5 x 2
#> food price
#> <chr> <dbl>
#> 1 apple 1.25
#> 2 tomato 1.4
#> 3 cucumber 1.7
#> 4 orange 0.85
#> 5 banana 0.3
mydata %>%
filter(category == "fruit")
#> # A tibble: 3 x 3
#> food category price
#> <chr> <chr> <dbl>
#> 1 apple fruit 1.25
#> 2 orange fruit 0.85
#> 3 banana fruit 0.3
mydata %>%
mutate(price_cents = price * 100)
#> # A tibble: 5 x 4
#> food category price price_cents
#> <chr> <chr> <dbl> <dbl>
#> 1 apple fruit 1.25 125
#> 2 tomato vegetable 1.4 140
#> 3 cucumber vegetable 1.7 170
#> 4 orange fruit 0.85 85
#> 5 banana fruit 0.3 30
However, this can create some ambiguity. For instance, in the second column below, how can you be sure what price
refers to?
price <- 1.00
cheap <- mydata %>%
filter(price > 0.9)
More insidiously, the following expression will silently succeed if price
is defined in the current environment, but return every row in the data frame (because 1.00 > 0.9
is always true):
mydata %>%
select(-price) %>%
filter(price > 0.9)
# # A tibble: 5 x 2
# food category
# <chr> <chr>
# 1 apple fruit
# 2 tomato vegetable
# 3 cucumber vegetable
# 4 orange fruit
# 5 banana fruit
...but if price
is not defined in the global environment, it will throw an error.
This ambiguity is exactly the source of the warning from the R CMD check -- price
, food
, and category
all look like variables that don't have definitions in the current environment. The way to make it crystal clear that these variables are coming from inside the data is to prefix these variables with .data$
, as in:
mydata %>%
filter(.data$price > 0.9)
#> # A tibble: 3 x 3
#> food category price
#> <chr> <chr> <dbl>
#> 1 apple fruit 1.25
#> 2 tomato vegetable 1.4
#> 3 cucumber vegetable 1.7
mydata %>%
select(.data$food, .data$price)
#> # A tibble: 5 x 2
#> food price
#> <chr> <dbl>
#> 1 apple 1.25
#> 2 tomato 1.4
#> 3 cucumber 1.7
#> 4 orange 0.85
#> 5 banana 0.3
mydata %>%
mutate(price_cents = .data$price * 100)
#> # A tibble: 5 x 4
#> food category price price_cents
#> <chr> <chr> <dbl> <dbl>
#> 1 apple fruit 1.25 125
#> 2 tomato vegetable 1.4 140
#> 3 cucumber vegetable 1.7 170
#> 4 orange fruit 0.85 85
#> 5 banana fruit 0.3 30
Created on 2021-02-17 by the reprex package (v1.0.0)
@ashiklom thank you for the help. I tried to do my best to update the files here. some critical review would be highly appreciated for further work in base/db, modules/data.atmosphere.
@dlebauer @ashiklom Hi, I am new to open source . Is this issue still open ? If would like to work on this if possible. Thanks
Thanks for your interest!
Yes, this issue is still open, though we've made some progress on it -- #2773, #2774.
In each PEcAn module, there is a file tests/Rcheck_reference.log
; for example, here is the one for the benchmark module: https://github.com/PecanProject/pecan/blob/ee38544e1c995f47073019bd31b3179e7e64991a/modules/benchmark/tests/Rcheck_reference.log#L59-L60
Inside those files, look for messages like "No visible binding for global variable". Some of those are for tidyverse
functions and use the .data$<variable>
solution described here. Some others are for data.table
, which uses similar functionality but requires a different solution. Some others may be genuine references to global variables, which is really bad but also requires a different and probably more complex solution.
@Sarthakaga15 Adding two details to @ashiklom's answer:
- There are a select few packages with no
Rcheck_reference.log
. These are modules that we've cleaned up well enough to eliminate every message from R CMD check, including any "no visible binding..."'s that were there before we deleted the reference log. - The reference logs are only occasionally updated, so if you see a
no visible binding...
in a given package'sRcheck_reference.log
but can't find any such variable in the function it references, it's possible that someone has fixed that one without updating the reference log. If in doubt, you can runrcmdcheck::rcmdcheck("path/to/package")
to see if R CMD check still sees a problem with the current code.
@Sarthakaga15 Feel free to ping me if you get stuck anywhere. I'm sure I can help you out in some places :)
Hey I am newbie in open source and want to work on this issue is it still open ? and some resources would be helpful for me :D
Hello @akshat-max! Glad you want to work on this. For now, you can refer to the conversation above for the description and the suggested solution. For starting, @Sarthakaga15 is working on fixing the no visible binding for global variables checks for PEcAn.workflow so pick up any package from modules and start working. If you get stuck somewhere feel free to ping me 😃.
Hi! @dlebauer @moki1202 @ashiklom . Can I work on this issue?
Yes it is! We have multiple PEcAn packages that need some fixing. I would suggest, you have a good look at the conversations above and start working! In case you get stuck somewhere feel free to mention me here.
Thank you. Will give it a look and start working ASAP.
@infotroph @dlebauer @moki1202 Hi, I am new to open source and would love to work on this issue if it's still open.
@vaishase yes it is! there are still a lot of packages that need work. Pick a package and check the rcheckreference.log
file in the tests folder. check out #2971 and #2965, these PRs solve similar notes and warning that you'll be picking up. I would suggest you have a good look at the conversations above and start working. Happy coding and feel free to ping me if you get stuck somewhere.
@vaishase yes it is! there are still a lot of packages that need work. Pick a package and check the
rcheckreference.log
file in the tests folder. check out #2971 and #2965, these PRs solve similar notes and warning that you'll be picking up. I would suggest you have a good look at the conversations above and start working. Happy coding and feel free to ping me if you get stuck somewhere.
Sure, thanks! I'll look into it.
Packages that still have this check warning:
- [ ] db
- [ ] qaqc
- [ ] settings
- [ ] utils
- [ ] visualization
- [ ] workflow
- [ ] ed
- [ ] linkages
- [ ] lpjguess
Code to get this list:
library(stringr)
library(purrr)
library(dplyr)
logs <- list.files(
pattern = "Rcheck_reference.log",
recursive = TRUE,
full.names = TRUE
)
logs |>
set_names(
dirname(logs) |> dirname() |> basename()
) |>
map_lgl(~{
readLines(.x) |>
str_detect("no visible binding for global variable") |>
any()
}) |> as_tibble(rownames = "package") |> filter(value)