box: Support re-running modified `box::use` declaration

When interactively developing a script, it’s common to go back, modify a top-level box::use declaration, and then rerun it.

Currently, this causes attached environments and aliases to accumulate. To illustrate, consider the following declaration:

box::use(
    dplyr[`%>%`, filter],
    tibble
)

Let’s say I execute the above, write some more code, and then realise that I also need mutate, and that a shorter alias for ‘tibble’ would be nice. So I modify the declaration as follows and rerun it:

box::use(
    dplyr[`%>%`, filter, mutate],
    tbl = tibble
)

This causes a second 'mod:dplyr' environment to be attached. It also causes the alias tbl to be created in addition to tibble, which continues to exist.

Ideally, neither should happen. Instead, rerunning a modified use declaration should:

- modify existing attached environments,
- remove attached environments that are no longer needed,
- modify existing aliases, and
- remove aliases that are no longer needed.
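To make the goal concrete, here is a minimal base R sketch. It assumes that base attach() is a fair stand-in for how ‘box’ attaches module environments; that may not hold, and ‘box’ may lock its environments, whereas attach() does not. The sketch first reproduces the accumulation described above, then uses a hypothetical reattach() helper (not part of ‘box’) that updates the single 'mod:dplyr' entry in place, removes stale names, and detaches the environment once nothing is needed from it. The exported objects are placeholders, not the real dplyr functions.

```r
# Placeholder "exports" standing in for the two versions of the dplyr import;
# these are not the real dplyr objects.
pipe <- function(lhs, rhs) rhs(lhs)
first_run  <- list(`%>%` = pipe, filter = Filter)
second_run <- list(`%>%` = pipe, filter = Filter, mutate = identity)

# Current behaviour, approximated with attach(): every run adds another environment.
attach(first_run, name = "mod:dplyr", warn.conflicts = FALSE)
attach(second_run, name = "mod:dplyr", warn.conflicts = FALSE)
n_naive <- sum(search() == "mod:dplyr")   # two copies on search()
while ("mod:dplyr" %in% search()) detach("mod:dplyr", character.only = TRUE)

# Hypothetical helper (not part of 'box'): bring the attached environment in line
# with the new declaration instead of attaching a second copy.
reattach <- function(exports, name) {
  attached <- name %in% search()
  if (length(exports) == 0L) {
    # The declaration no longer needs this module: drop its environment.
    if (attached) detach(name, character.only = TRUE)
    return(invisible(NULL))
  }
  if (!attached) {
    # First run: behave like a plain attach.
    return(invisible(attach(exports, name = name, warn.conflicts = FALSE)))
  }
  env <- as.environment(name)
  # Remove bindings the new declaration no longer asks for ...
  rm(list = setdiff(ls(env, all.names = TRUE), names(exports)), envir = env)
  # ... then add new bindings and overwrite existing ones in place.
  list2env(exports, envir = env)
  invisible(env)
}

reattach(first_run, "mod:dplyr")
reattach(second_run, "mod:dplyr")
n_reattach <- sum(search() == "mod:dplyr")              # still one environment
after_second <- ls(as.environment("mod:dplyr"), all.names = TRUE)
reattach(list(mutate = identity), "mod:dplyr")          # filter and %>% removed
after_third <- ls(as.environment("mod:dplyr"), all.names = TRUE)
reattach(list(), "mod:dplyr")                           # environment detached
```

The helper only covers attached environments. Aliases such as tbl are assigned in the calling environment, so removing a stale tibble alias would need the same kind of bookkeeping there, which the sketch leaves out. It also sidesteps the question of how to know that a call is a rerun of the same declaration: here the caller simply asserts it by passing the same name.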

It’s not obvious how this would be implemented, or if it’s even possible in general: how would the “identity” of a use declaration even be tracked?

A distinct use case occurs when interactively developing a module: in that case, the right solution is to run box::reload. This isn’t applicable here.

Feb 16 '21 14:02 klmr

This single issue is perhaps the biggest blocker to my adopting {box} more thoroughly. I end up falling back on library() with the intention of later replacing those statements with more carefully thought-out box::use() calls, but, alas, it doesn't always happen :-/

As a very brute-force and inelegant workaround for top-level scripts (only), would it be feasible to keep a 'parallel' list (e.g. privately in {box}'s namespace) of modules that have any elements attached to the global search() path? One could then walk through that path backwards, calling box::unload() along the way, and finally re-run the box::use() calls, effectively 'rebuilding' the module structure.

This can obviously go awry in a lot of ways, one of the largest being that rebuilding the module and attachment 'state' can lead to broken or invalid behavior in existing functions and objects. Presumably the whole purpose of such a procedure is to avoid re-running the script up to line X, i.e. to keep the current state. But an inconsistently behaving state might, in many instances, be worse than just paying the time cost of a re-run. (There are plenty of cases where re-running is quite costly in the development workflow, though, which I suppose is the point of this issue :-)
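The detach half of the brute-force rebuild suggested above might look roughly like this in base R. It is a sketch under two assumptions: that ‘box’ attachments can be recognised by the 'mod:' name prefix seen earlier in this issue, and that plain detaching (rather than box::unload(), which needs the module object) is enough before the box::use() calls are re-run. detach_box_mods() is a hypothetical name, and the attached lists are stand-ins.

```r
# Hypothetical helper: detach every search() entry named "mod:...", starting from
# the highest position so that the positions still to be visited do not shift.
detach_box_mods <- function() {
  positions <- grep("^mod:", search())
  for (pos in rev(positions)) detach(pos = pos)
  invisible(length(positions))
}

# Usage with stand-in attachments; afterwards the box::use() calls would be re-run.
attach(list(bar = 1), name = "mod:foo", warn.conflicts = FALSE)
attach(list(baz = 2), name = "mod:qux", warn.conflicts = FALSE)
n_detached <- detach_box_mods()
```

Detaching from the bottom up only keeps the position arithmetic simple; it does nothing about the state-consistency problems described above.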

I've looked into experimenting with this myself, but I found that box::unload() requiring an actual mod object makes cases like box::use(foo[bar]) tough: only the attachment happens, so we don't have an easy reference to the module to pass to box::unload(mod). This is the origin of my question/idea about {box} keeping an internal list of which search-path attachments correspond to which underlying {box} module ... one could then 'recapture' the module object with a function like box::search2mod(pos) that would yield the module reference (or NULL if the attachment is not a module).
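A minimal sketch of that internal list and of the proposed search2mod(pos), written in base R. Both register_attachment() and search2mod() are hypothetical here, not existing ‘box’ functions, and the "module object" is just a string stand-in. The registry is keyed by environment identity rather than by name, because several attachments can share a name such as "mod:foo".

```r
# Hypothetical private registry of (attached environment, module) pairs, keyed by
# environment identity because several entries can share a name like "mod:foo".
box_registry <- new.env()
box_registry$entries <- list()

register_attachment <- function(attached_env, mod) {
  entries <- box_registry$entries
  entries[[length(entries) + 1L]] <- list(env = attached_env, mod = mod)
  assign("entries", entries, envir = box_registry)
  invisible(attached_env)
}

# Hypothetical search2mod(): return the module behind search() position `pos`,
# or NULL if that position is not a registered module attachment.
search2mod <- function(pos) {
  target <- as.environment(pos)
  for (entry in box_registry$entries) {
    # identical() compares environments by reference.
    if (identical(entry$env, target)) return(entry$mod)
  }
  NULL
}

# Usage with a stand-in module object (a plain string here).
env <- attach(list(bar = function() "bar"), name = "mod:foo", warn.conflicts = FALSE)
register_attachment(env, "module object for foo")
found <- search2mod(match("mod:foo", search()))
not_found <- search2mod(1L)   # position 1 is the global environment
detach("mod:foo", character.only = TRUE)
```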

(Sorry if that appears to be a bit of rambling, but I hope my question/idea came across accurately and is useful for fostering discussion :-)

Oct 26 '21 19:10 mmuurr

Unfortunately I don’t think this is fixable, even in principle, except in very limited cases. There’s simply no way to distinguish between the user rerunning an “existing” box::use expression on the one hand, and executing a separate, second box::use expression on the other: in general, if a user runs box::use(foo[bar]) followed by box::use(foo[baz]), they want both names (bar and baz) to be attached, rather than the second expression superseding the first.

Figuring out that the user instead modified and reran a single box::use expression in a code file might be possible in RStudio (though even there I have no idea how), but not in general. Then again, since RStudio is by far the most widely used IDE, solving this case first would already be great. I will have to read the RStudio source code to figure out whether there are appropriate hooks (I’ve done this before for something else, and it’s … discouraging).

Incidentally, actually identifying the loaded modules and packages is trivial: ‘box’ already maintains this information anyway (in combination with the information contained in search()).

Lastly, for reference, the (unexported) function unload_mod_recursive.box$ns performs the actual unloading of a module (but doesn’t detach it). It requires the module namespace and the module info as arguments, both of which can be obtained from the attributes of a module environment (such as the one that’s attached).

For instance, to get the relevant data from the module attached at the second position:

mod_env = as.environment(2L)
mod_ns = attr(mod_env, 'namespace')
mod_info = attr(mod_env, 'info')

Since unload_mod_recursive is not exported, S3 lookup won’t work, so it can’t be called directly from outside ‘box’; instead, it would need to be invoked as

box:::`unload_mod_recursive.box$ns`(mod_ns, mod_info)

… but unfortunately that’s the easy part, as noted above.
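Extending the attribute lookup above to the whole search path gives a way to list every attached module. The sketch below assumes, as described in this comment, that attached module environments carry 'namespace' and 'info' attributes. find_module_attachments() is a hypothetical helper, and the stand-in attachment only mimics those attributes so that the sketch runs without ‘box’.

```r
# Hypothetical helper: named positions of search() entries whose environment
# carries both the 'namespace' and 'info' attributes.
find_module_attachments <- function() {
  path <- search()
  is_module <- vapply(seq_along(path), function(pos) {
    env <- as.environment(pos)
    !is.null(attr(env, "namespace")) && !is.null(attr(env, "info"))
  }, logical(1L))
  positions <- which(is_module)
  names(positions) <- path[positions]
  positions
}

# Usage with a stand-in attachment that mimics those attributes.
# Environments are never copied, so setting attributes on `fake` also sets them
# on the attached environment itself.
fake <- attach(list(bar = 1), name = "mod:fake", warn.conflicts = FALSE)
attr(fake, "namespace") <- new.env()
attr(fake, "info") <- list(name = "fake")
found <- find_module_attachments()
detach("mod:fake", character.only = TRUE)
```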

Nov 10 '21 23:11 klmr

in general, if a user runs box::use(foo[bar]) followed by box::use(foo[baz]), they want both of the names (bar and baz) to be attached, rather than the second expression superseding the first.

I agree, and in fact I think unattaching bar would be a pretty non-standard idea that should force the user to jump through hoops (or restart their script). I find it pretty rare for someone to call library(foo) and then detach() it from the search path ... typically I'd classify any such user as 'advanced' and any such practice as 'risky' :-)

But in your two-call example, what about an option that allows switching between (i) adding baz to a new environment at the top of the search path (the current behavior) and (ii) adding baz to the existing search-path environment that contains bar?

Then the two-call example, amended to reflect the typical coding practice that gives this issue its name, could become something like:

box::use(foo[bar])

... we do some coding, realize we also need baz, and update the {box} command to:

## I'm not actually proposing the name `.use_existing_env`, just using it as an example.
## Perhaps instead using an actual `options()` option.
## Or maybe even `box::reuse(...)` that works like `use` if there's no existing env, else it 're-uses' that env.

box::use(foo[bar,baz], .use_existing_env = TRUE)

... and then rerun that command with that option set, so that instead of adding a new environment to the search path, baz is injected into the existing bar-containing environment.

This would also avoid the challenges of unloading/detaching from the existing search path: order matters, and re-inserting a new environment at the correct spot in the search path seems painful to manage carefully.

I'm just thinking out loud (via the sounds of my typing) for the sake of conversation/brainstorming ... I still think {box} is a great addition to the R ecosystem :-)

Nov 15 '21 20:11 mmuurr

Is there a need to switch between the two semantics? Isn’t (ii) always what the user wants? I know that ‘box’ currently isn’t doing this, but moving to (ii) is generally the plan.

Nov 16 '21 00:11 klmr

I think (ii) is the natural and preferred behavior, yes. I only mentioned (i) in case you wanted to preserve backwards-compatibility, since technically it'd be a change from the current semantics. (But I struggle to come up with a good use case where someone would now depend on (i), so 🤷.)

Nov 16 '21 03:11 mmuurr

Ah, hm. I actually thought of a case. Consider the following.

box::use(a[...])
box::use(b[...])
box::use(a[f])

Assume that both a and b export two functions, f and g.

Whether we do (i) or (ii) changes which f we see. And in fact (ii) does the “wrong” thing here, since b$f continues to be visible even though the user expects a$f. And we can’t simply pull the existing a attachment to the front of the search path either, since that would cause a$g to be visible even though the user expects b$g.

Of course one might argue that the above isn’t great code, but I can totally see this being used in practice.
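The difference between (i), (ii), and moving the existing a attachment to the front can be checked without ‘box’ by simulating the search path as a chain of environments linked via parent.env(), which mirrors how name lookup walks search(). This is a base R sketch with hypothetical helper names. It models only lookup order, not how ‘box’ actually attaches modules.

```r
# Simulate the search path as a chain of environments (arrow = parent.env), so that
# lookup follows the same rules without touching the real search().
push_env <- function(values, below) list2env(values, parent = below)
lookup <- function(top, name) get(name, envir = top, mode = "function")()

a <- list(f = function() "a$f", g = function() "a$g")
b <- list(f = function() "b$f", g = function() "b$g")

# (i) Each box::use() call attaches a new environment on top:
#     a[f] -> b[...] -> a[...]
i_top <- push_env(a["f"], push_env(b, push_env(a, emptyenv())))

# (ii) The third call reuses the existing 'a' environment, which already holds f,
#      so nothing new is added above 'b':  b[...] -> a[...]
ii_top <- push_env(b, push_env(a, emptyenv()))

# Moving the existing 'a' environment to the front instead:  a[...] -> b[...]
front_top <- push_env(a, push_env(b, emptyenv()))

results <- c(
  i_f = lookup(i_top, "f"),         i_g = lookup(i_top, "g"),
  ii_f = lookup(ii_top, "f"),       ii_g = lookup(ii_top, "g"),
  front_f = lookup(front_top, "f"), front_g = lookup(front_top, "g")
)
print(results)
```

Only (i) yields the expected pair a$f and b$g. (ii) yields b$f and b$g, and moving a to the front yields a$f and a$g.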

However, maybe it’s time for ‘box’ to get rid of the search path hierarchy entirely and merge all ‘box’ import environments into one. In fact, there’s already an inconsistency, since this hierarchy isn’t used inside modules.

Here’s what the search() path for the code above currently looks like (arrow means parent.env(.)):

[Diagram: the current search() path for the code above]

The same is true for local imports (i.e. inside a function or other environment). By contrast, inside a module, all imports are attached to the module namespace as a single, merged import environment:

[Diagram: a module namespace with a single, merged import environment]

I think ‘box’ should probably unify the various ways of attaching names, and always use a single import environment. To wit, search() should look as follows (and equivalently for local environments):

[Diagram: the proposed search() path with a single ‘box’ import environment]

This would be a breaking change, but its consequences should be mostly invisible. In particular, the code above continues to work “as expected”, while otherwise the behaviour is that of (ii). Thoughts?
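A minimal sketch of the proposed single import environment, in base R with hypothetical names (import_into() is not a ‘box’ function). It replays the three box::use() calls from the example into one environment, where later imports simply overwrite earlier bindings. The visible f and g then match what the current search-path hierarchy yields, and re-running a declaration overwrites bindings instead of stacking another environment.

```r
# Hypothetical helper (not part of 'box'): copy the requested exports into one
# shared import environment; later imports overwrite earlier bindings.
import_into <- function(imports, exports, only = names(exports)) {
  for (nm in only) assign(nm, exports[[nm]], envir = imports)
  invisible(imports)
}

a <- list(f = function() "a$f", g = function() "a$g")
b <- list(f = function() "b$f", g = function() "b$g")

# One merged environment; parent = emptyenv() only keeps the sketch isolated.
imports <- new.env(parent = emptyenv())
import_into(imports, a)        # box::use(a[...])
import_into(imports, b)        # box::use(b[...])
import_into(imports, a, "f")   # box::use(a[f])
visible <- c(f = imports$f(), g = imports$g())

# Re-running a declaration overwrites bindings in place instead of stacking environments.
import_into(imports, a, "f")
n_bindings <- length(ls(imports))
```

Removing names that a modified declaration no longer requests would still require tracking which declaration contributed which binding; the sketch does not attempt that.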

Of course, this is a far cry from what this issue set out to address, but as noted I don’t believe a general fix for that exists (outside of potentially existing, complex editor-specific hacks). At least this unifies how ‘box’ handles attachment across different scenarios, and leads to a somewhat cleaner workspace during interactive development.

Nov 16 '21 10:11 klmr