design icon indicating copy to clipboard operation
design copied to clipboard

Initiatialising a variable on package load

Open jennybc opened this issue 3 years ago • 15 comments

Summarizing a recent Slack discussion that seems worth preserving.

The conversation was triggered by https://github.com/r-lib/gargle/issues/184, where someone following a pattern seen in several gargle-using packages was getting push back from CRAN for a new package submission.

Specifically, CRAN objected to the use of <<- in .onLoad(), interpreting that as modifying the global environment. As it turns out, this object was created elsewhere in the package, in the package's namespace, so the <<- was modifying that object.

But this use of <<- is not easily seen to be OK, so it's worth having better patterns for this.

Typical "before" code, that might get flagged by CRAN:

# in R/drive_auth.R
.auth <- NULL

# in R/zzz.R
.onLoad <- function(libname, pkgname) {

  .auth <<- gargle::init_AuthState(
    package     = "googledrive",
    auth_active = TRUE
  )
 
 ...
}

The objective in this case is to make sure that googledrive's internal .auth object is an instance of gargle's R6 class AuthState, according to the current, locally installed version of gargle, not the ambient gargle version when the googledrive binary was built.

Discussion with @jimhester @gaborcsardi @jeroen @DavisVaughan lead to these observations and alternatives:

  • Put the .auth <- NULL assignment as close as possible to the <<-, so it's easier to see what's going on and that the super assignment does not touch global env.

  • Use utils::assignInMyNamespace() to make it clear that the target object lives in my namespace. You still must create the target object elsewhere, as utils::assignInMyNamespace() can only modify, not create.

    # in R/drive_auth.R or R/zzz.R (each has some pros/cons)
    .auth <- NULL
    
    # in R/zzz.R
    .onLoad <- function(libname, pkgname) {
        utils::assignInMyNamespace(
          ".auth",
          gargle::init_AuthState(package = "googledrive", auth_active = TRUE)
        )
    }
    
  • Use assign() and specify the environment as environment(.onLoad), topenv(), or asNamespace("pkg"). Note that the target .auth object does not need to be pre-created in this case.

    .onLoad <- function(libname, pkgname) {
        assign(
          ".auth",
          gargle::init_AuthState(package = "googledrive", auth_active = TRUE),
          environment(.onLoad)
        )
    }
    

jennybc avatar Jul 01 '21 19:07 jennybc

Another way is to hook assignment like this:

on_load(
  var <- value
)

This uses the hook implemented in https://github.com/r-lib/rlang/blob/master/R/aaa.R (could be made a base-only compat file and I'm planning to export from rlang at some point). The big advantage besides readability is that you can put the assignment in context rather than in .onLoad(). Example: https://github.com/r-lib/rlang/blob/dc03e447/R/env-special.R#L389-L392. I also use it for things like lazy S3 registration: https://github.com/r-lib/rlang/blob/dc03e447/R/quo.R#L257-L267

lionel- avatar Jul 03 '21 05:07 lionel-

I think that we might now favour the here? e.g.

# in R/drive_auth.R
the <- new.env(parent = empty_env()

# in R/zzz.R
.onLoad <- function(libname, pkgname) {

  the$auth_state <- gargle::init_AuthState(
    package     = "googledrive",
    auth_active = TRUE
  )
 
 ...
}

hadley avatar Jul 20 '23 19:07 hadley

the is often used in combination with on_load() like in https://github.com/tidyverse/dplyr/blob/c963d4db08ab4b4e7f75e3b99b04ef5f6e903a25/R/slice.R#L583-L586

As @lionel- said, it is useful to have the on_load() call in the context of where it is actually used

DavisVaughan avatar Jul 21 '23 01:07 DavisVaughan

Ah yeah, you can't generally just do the$slice_by_arg <- ".by" unless you define the very early on. We should discuss whether it's better to deifne the in aaa.R or use on_load(), which I think is good for us, but I worry it's quite a bit of infrastructure for others.

hadley avatar Jul 21 '23 12:07 hadley

FWIW, after tracking down a bug that involved debugging rlang::.onLoad(), I am not such a big fan of on_load().

gaborcsardi avatar Jul 21 '23 13:07 gaborcsardi

If use and justification of the + on_load() were spelled out clearly, the infrastructure probably wouldn't be too bad. And/or maybe a usethis::use_the_env() or similar to get the aaa.R bit handled cleanly.

jonthegeek avatar Jul 21 '23 14:07 jonthegeek

FWIW, after tracking down a bug that involved debugging rlang::.onLoad(), I am not such a big fan of on_load().

If I'm not mistaken you're referring to having to grepp for on_load() to see the on-loaded expressions? This downside is the direct flip side of the benefit of having the on-loaded expressions being in context, since you no longer have them in a single place. The on-load hook often end up being messy so I think the benefits of having the expressions in context outweigh the downside though.

How about having an envvar that turns on printing of the entire set of on-load expressions that are about to be run, as well as some information between each invokation. I think this would improve the debugging experience compared to .onLoad() rather than degrade it. WDYT @gaborcsardi?

lionel- avatar Jul 24 '23 06:07 lionel-

We could reuse _R_TRACE_LOADNAMESPACE_ for that debugging mode.

lionel- avatar Jul 24 '23 07:07 lionel-

Grepping is fine, but I wanted to put a browser() in it to see what triggers the package_version() bug.

gaborcsardi avatar Jul 24 '23 07:07 gaborcsardi

I think you can do that, though that might require adding curly braces, e.g. on_load({ browser(); foo() }). Or am I missing something?

lionel- avatar Jul 24 '23 07:07 lionel-

I wanted to step over all the stuff that happens in .onLoad(), so the browser() goes in .onLoad(). But then you step into run_on_load() and then each callback() is

Browse[3]> callback
function ()
{
    do <- NULL
    do.call(delayedAssign, list("do", expr, env))
    do
}

and the actual code is in expr which is evaluated in do, and I can't seem to be able to debug the evaluation of do here. Unless I then go back to the source, put the browser() into that on_load() block after some grepping, reinstall the package, restart R, etc.

Something like this. There is probably a more efficient way of doing this....

gaborcsardi avatar Jul 24 '23 07:07 gaborcsardi

I see, thanks. It would probably take a morning or so to implement something like in coro where we generate browsable code. Using it would look like browsing around all the files that call on_load() as you hit next.

Edit: ah if you insert a browser() in an on_load() it already works just like this. However you need to call base::browser(), there seems to be a bad interaction with the rlang:::browser function that I use for debugging testthat snapshots (it diverts the output back to stdout while the browser is running so you can see the output of your debugging commands).

lionel- avatar Jul 24 '23 07:07 lionel-

Ah, no need, I'll just get better at this... :D

gaborcsardi avatar Jul 24 '23 07:07 gaborcsardi

A non-rlang way to do this is to define helper functions like https://github.com/RConsortium/S7/blob/main/R/zzz.R#L121-L128

For use in the broader community, I think I favour that technique because it's trivial to understand. I'll still mention rlang::on_load() in a callout.

hadley avatar Sep 16 '23 13:09 hadley

The callout might mention that the on_load() approach makes defining bindings at load time easier:

on_load(
  foo <- bar()
)

versus

foo <- NULL

on_load_my_file <- function() {
  foo <<- bar()
}

lionel- avatar Sep 18 '23 09:09 lionel-