pecan icon indicating copy to clipboard operation
pecan copied to clipboard

Fix tidyverse "no visible binding for global variable" notes

Open dlebauer opened this issue 4 years ago • 20 comments

Adapted from discussion with @infotroph in https://github.com/PecanProject/pecan/pull/2756

  • an easy but tedious and inelegant fix for a problem that's mostly just annoying
  • We are currently ignoring 571 no visible binding for global variable NOTEs across all of PEcAn, and my wild guess is that half to two-thirds of these are Tidyverse calls.

Shortcu:t For anyone who runs into this error and doesn't want to implement this fix: just add the appropriate note causing the build to fail to the package's <pkg>/test/Rcheck_reference.log, like this: https://github.com/PecanProject/pecan/pull/2756/commits/4a303b7fe3b32914c1f329be5efffc5f55030339

If you are in a good place to tackle this: we support you, get comfy! Bonus points if you fix all of the issues in the package you are working on. A special prize if you fix all (currently 563) of these Notes.

TODO:

  • [ ] find list of violations in <pkg>/test/Rcheck_reference.log files
    • find . -name Rcheck_reference.log | xargs grep "no visible binding for global variable"
    • to count occurrences add | wc -l to the end of the line above
  • [ ] prepend variables with .data$ as done here: https://github.com/PecanProject/pecan/pull/2756/files#diff-296112e719b8455a8be63e9cac941111d4ae1700081aa05b3a9adfa8d5b9738aR210
  • [ ] add rlang to Imports for relevant package to description
  • [ ] add ##' @importFrom rlang .data to <pkg>R/PEcAn.<pkgname>-package.R. See PEcAn.DB for an example.
    • can also be done per function, but not sure when this would be preferred
  • [ ] delete (relevant) lines starting <function name>: no visible binding for global variable <variable> in <pkg>/test/Rcheck_reference.log
  • [ ] update documentation

Alternative solutions:

  • change build checks to allow these notes
  • or add notes as they appear to <pkg>/test/Rcheck_reference.log so that build ignores it
  • wait until a more elegant solution evolves

See also

  • https://community.rstudio.com/t/how-to-solve-no-visible-binding-for-global-variable-note/28887
  • https://www.r-bloggers.com/2019/08/no-visible-binding-for-global-variable/
  • https://www.google.com/search?q=no+visible+binding+for+global+variable
  • https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html

Originally posted by @infotroph in https://github.com/PecanProject/pecan/pull/2756#discussion_r544964692

dlebauer avatar Dec 17 '20 22:12 dlebauer

Two quick clarifications:

  • each package has its own tests/Rcheck_reference.log, so only delete lines from base/utils/test/Rcheck_reference.log when they were fixed inside base/utils.
  • Yes, you can import once for the whole package by doing it in the package doc block. See PEcAn.DB for an example.

infotroph avatar Dec 18 '20 10:12 infotroph

@infotroph updated the OP w/ your suggestions in https://github.com/PecanProject/pecan/issues/2758#issuecomment-748018564

dlebauer avatar Dec 18 '20 21:12 dlebauer

@dlebauer hello , is this issue still open ? if yes I would like to work on this. thankyou

moki1202 avatar Feb 10 '21 15:02 moki1202

@moki1202 Yes, this issue is still open. We have hundreds of these notes that need to be addressed (e.g., for base/utils, base/db, modules/data.atmosphere).

For each package, follow @dlebauer 's detailed instructions above.

ashiklom avatar Feb 10 '21 16:02 ashiklom

@ashiklom @infotroph We need to prepend only variables with .data$ that cause the run check error "no visible binding for global variable " . please correct me if I'm wrong. Some help on this issue would be appreciated.

moki1202 avatar Feb 17 '21 21:02 moki1202

@moki1202 More or less. TLDR: You'll want to do this for any variables referring to column names in dplyr (and similar) functions (e.g., select, mutate, filter, summarize, etc.).

The root of the problem is that the dplyr package provides the shortcut to refer to columns inside the target data.frame in a more convenient way:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

mydata <- tribble(
  ~food, ~category, ~price,
  "apple", "fruit", 1.25,
  "tomato", "vegetable", 1.40,
  "cucumber", "vegetable", 1.70,
  "orange", "fruit", 0.85,
  "banana", "fruit", 0.30
)

mydata %>%
  select(food, price)
#> # A tibble: 5 x 2
#>   food     price
#>   <chr>    <dbl>
#> 1 apple     1.25
#> 2 tomato    1.4 
#> 3 cucumber  1.7 
#> 4 orange    0.85
#> 5 banana    0.3

mydata %>%
  filter(category == "fruit")
#> # A tibble: 3 x 3
#>   food   category price
#>   <chr>  <chr>    <dbl>
#> 1 apple  fruit     1.25
#> 2 orange fruit     0.85
#> 3 banana fruit     0.3

mydata %>%
  mutate(price_cents = price * 100)
#> # A tibble: 5 x 4
#>   food     category  price price_cents
#>   <chr>    <chr>     <dbl>       <dbl>
#> 1 apple    fruit      1.25         125
#> 2 tomato   vegetable  1.4          140
#> 3 cucumber vegetable  1.7          170
#> 4 orange   fruit      0.85          85
#> 5 banana   fruit      0.3           30

However, this can create some ambiguity. For instance, in the second column below, how can you be sure what price refers to?

price <- 1.00

cheap <- mydata %>%
  filter(price > 0.9)

More insidiously, the following expression will silently succeed if price is defined in the current environment, but return every row in the data frame (because 1.00 > 0.9 is always true):

mydata %>%
  select(-price) %>%
  filter(price > 0.9)

# # A tibble: 5 x 2
#   food     category 
#  <chr>    <chr>    
# 1 apple    fruit    
# 2 tomato   vegetable
# 3 cucumber vegetable
# 4 orange   fruit    
# 5 banana   fruit    

...but if price is not defined in the global environment, it will throw an error.

This ambiguity is exactly the source of the warning from the R CMD check -- price, food, and category all look like variables that don't have definitions in the current environment. The way to make it crystal clear that these variables are coming from inside the data is to prefix these variables with .data$, as in:

mydata %>%
  filter(.data$price > 0.9)
#> # A tibble: 3 x 3
#>   food     category  price
#>   <chr>    <chr>     <dbl>
#> 1 apple    fruit      1.25
#> 2 tomato   vegetable  1.4 
#> 3 cucumber vegetable  1.7

mydata %>%
  select(.data$food, .data$price)
#> # A tibble: 5 x 2
#>   food     price
#>   <chr>    <dbl>
#> 1 apple     1.25
#> 2 tomato    1.4 
#> 3 cucumber  1.7 
#> 4 orange    0.85
#> 5 banana    0.3

mydata %>%
  mutate(price_cents = .data$price * 100)
#> # A tibble: 5 x 4
#>   food     category  price price_cents
#>   <chr>    <chr>     <dbl>       <dbl>
#> 1 apple    fruit      1.25         125
#> 2 tomato   vegetable  1.4          140
#> 3 cucumber vegetable  1.7          170
#> 4 orange   fruit      0.85          85
#> 5 banana   fruit      0.3           30

Created on 2021-02-17 by the reprex package (v1.0.0)

ashiklom avatar Feb 17 '21 22:02 ashiklom

@ashiklom thank you for the help. I tried to do my best to update the files here. some critical review would be highly appreciated for further work in base/db, modules/data.atmosphere.

moki1202 avatar Feb 19 '21 00:02 moki1202

@dlebauer @ashiklom Hi, I am new to open source . Is this issue still open ? If would like to work on this if possible. Thanks

Sarthakaga15 avatar Jan 07 '22 14:01 Sarthakaga15

Thanks for your interest!

Yes, this issue is still open, though we've made some progress on it -- #2773, #2774.

In each PEcAn module, there is a file tests/Rcheck_reference.log; for example, here is the one for the benchmark module: https://github.com/PecanProject/pecan/blob/ee38544e1c995f47073019bd31b3179e7e64991a/modules/benchmark/tests/Rcheck_reference.log#L59-L60

Inside those files, look for messages like "No visible binding for global variable". Some of those are for tidyverse functions and use the .data$<variable> solution described here. Some others are for data.table, which uses similar functionality but requires a different solution. Some others may be genuine references to global variables, which is really bad but also requires a different and probably more complex solution.

ashiklom avatar Jan 07 '22 15:01 ashiklom

@Sarthakaga15 Adding two details to @ashiklom's answer:

  1. There are a select few packages with no Rcheck_reference.log. These are modules that we've cleaned up well enough to eliminate every message from R CMD check, including any "no visible binding..."'s that were there before we deleted the reference log.
  2. The reference logs are only occasionally updated, so if you see a no visible binding... in a given package's Rcheck_reference.log but can't find any such variable in the function it references, it's possible that someone has fixed that one without updating the reference log. If in doubt, you can run rcmdcheck::rcmdcheck("path/to/package") to see if R CMD check still sees a problem with the current code.

infotroph avatar Jan 07 '22 21:01 infotroph

@Sarthakaga15 Feel free to ping me if you get stuck anywhere. I'm sure I can help you out in some places :)

moki1202 avatar Jan 18 '22 08:01 moki1202

Hey I am newbie in open source and want to work on this issue is it still open ? and some resources would be helpful for me :D

akshat-max avatar Jan 27 '22 16:01 akshat-max

Hello @akshat-max! Glad you want to work on this. For now, you can refer to the conversation above for the description and the suggested solution. For starting, @Sarthakaga15 is working on fixing the no visible binding for global variables checks for PEcAn.workflow so pick up any package from modules and start working. If you get stuck somewhere feel free to ping me 😃.

moki1202 avatar Jan 27 '22 17:01 moki1202

Hi! @dlebauer @moki1202 @ashiklom . Can I work on this issue?

Its-Maniaco avatar Mar 31 '22 13:03 Its-Maniaco

Yes it is! We have multiple PEcAn packages that need some fixing. I would suggest, you have a good look at the conversations above and start working! In case you get stuck somewhere feel free to mention me here.

moki1202 avatar Mar 31 '22 13:03 moki1202

Thank you. Will give it a look and start working ASAP.

Its-Maniaco avatar Mar 31 '22 13:03 Its-Maniaco

@infotroph @dlebauer @moki1202 Hi, I am new to open source and would love to work on this issue if it's still open.

vaishase avatar Aug 05 '22 13:08 vaishase

@vaishase yes it is! there are still a lot of packages that need work. Pick a package and check the rcheckreference.log file in the tests folder. check out #2971 and #2965, these PRs solve similar notes and warning that you'll be picking up. I would suggest you have a good look at the conversations above and start working. Happy coding and feel free to ping me if you get stuck somewhere.

moki1202 avatar Aug 05 '22 13:08 moki1202

@vaishase yes it is! there are still a lot of packages that need work. Pick a package and check the rcheckreference.log file in the tests folder. check out #2971 and #2965, these PRs solve similar notes and warning that you'll be picking up. I would suggest you have a good look at the conversations above and start working. Happy coding and feel free to ping me if you get stuck somewhere.

Sure, thanks! I'll look into it.

vaishase avatar Aug 05 '22 13:08 vaishase

Packages that still have this check warning:

  • [ ] db
  • [ ] qaqc
  • [ ] settings
  • [ ] utils
  • [ ] visualization
  • [ ] workflow
  • [ ] ed
  • [ ] linkages
  • [ ] lpjguess
Code to get this list:
library(stringr)
library(purrr)
library(dplyr)

logs <- list.files(
  pattern = "Rcheck_reference.log",
  recursive = TRUE,
  full.names = TRUE
)

logs |>
  set_names(
    dirname(logs) |> dirname() |> basename()
  ) |>
  map_lgl(~{
    readLines(.x) |>
      str_detect("no visible binding for global variable") |> 
      any()
  }) |> as_tibble(rownames = "package") |> filter(value)

Aariq avatar Dec 12 '22 15:12 Aariq