CodeDepends: Improve/polish handling of library symbols

The API that I hacked in during my thesis work probably isn't the right one, and the handling could be more robust/useful.

I believe @jonocarroll had some interest in collaborating on this. This issue is just a stub for now, but if there is interest we'll flesh it out and figure out the details.

Jun 01 '17 16:06 gmbecker

My interest was originally in doing something that may have some application here: https://twitter.com/carroll_jono/status/809206596953772032 (https://gist.github.com/jonocarroll/55046430b23d88e9628ac6b4edf8bb52). I'm interested in helping out when I have the time (which is a bit of a varying quantity at the moment).

Jun 02 '17 00:06 jonocarroll

Right, that's what I was referring to: incorporating that functionality (in addition to, or instead of, the local/non-local boolean CodeDepends currently uses) and thinking about what can be done with that information.

Also, if we assume the script is run by itself in a clean session (e.g., in a batch setting), we can reason about the search path and actually disambiguate. A really nice thing that would come out of this, and that isn't currently easy to do, is determining which library() (and similar) calls belong in a dependency thread. Currently they're left off, I believe, which is not great.
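To make the clean-session idea concrete, here is a minimal sketch in Python, used only as a neutral modeling language (this is not CodeDepends' API). It assumes a batch run starting from a simplified default search path, encodes the script as a hypothetical ordered list of `("library", pkg)` / `("call", fun)` pairs, and uses a hand-written `EXPORTS` table in place of real package namespaces. Each `library(pkg)` attaches `pkg` just after `.GlobalEnv`, and each call resolves to the first package on the path that exports it, which also identifies the `library()` statement that belongs in that call's dependency thread. Packages attached through `Depends` are ignored.

```python
# Toy model of a clean R session (e.g., a batch Rscript run).
# ASSUMPTION: the export sets below are a tiny hand-picked subset for
# illustration only; a real tool would query installed package namespaces.
EXPORTS = {
    "stats": {"filter", "lm"},
    "base": {"print", "sum"},
    "dplyr": {"filter", "select", "mutate"},
    "MASS": {"select", "rlm"},
}
# Simplified default search path after .GlobalEnv; the real one has more entries.
DEFAULT_PATH = ["stats", "base"]


def library_providers(script):
    """Map each call index in `script` to (package, index of the library()
    statement that attached it), or None if no package on the path exports it.

    `script` is a list of (kind, name) pairs where kind is "library" or "call".
    Packages on the default path get a library index of None.
    """
    path = [(pkg, None) for pkg in DEFAULT_PATH]
    providers = {}
    for i, (kind, name) in enumerate(script):
        if kind == "library":
            # library() does not re-attach a package already on the search path
            if all(pkg != name for pkg, _ in path):
                # attached at position 2, i.e. just after .GlobalEnv
                path.insert(0, (name, i))
        else:
            providers[i] = next(
                ((pkg, lib_i) for pkg, lib_i in path if name in EXPORTS.get(pkg, set())),
                None,
            )
    return providers


script = [
    ("library", "dplyr"),
    ("call", "filter"),
    ("call", "lm"),
    ("library", "MASS"),
    ("call", "select"),
    ("call", "my_helper"),  # hypothetical user-defined function
]
print(library_providers(script))
```

Here `select` (statement 4) is supplied by the `library(MASS)` call at index 3, so that statement belongs in its dependency thread; `lm` needs no `library()` call at all, and `my_helper` stays unresolved (e.g., user-defined).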

Anyway, if you don't have time, or if this ranges too far from what you were interested in, no problem. Left to my own devices I'll get to it eventually, but not in the super-immediate future (i.e. probably not until after summer). Just figured I'd reach out and see, since I had some vague recollection of interest.


-- Gabriel Becker, PhD, Associate Scientist (Bioinformatics), Genentech Research

Jun 02 '17 01:06 gmbecker

Yeah, I'm interested, as it overlaps well with what I wanted my code to be able to do (CodeDepends is a much more thorough implementation of my goal). Disambiguating based on the search path at a given point in the script is surely complex; e.g., if dplyr is loaded and select() is called, then MASS is loaded and select() is called again, I'd like to be able to pick out the two separate dependencies.

Ideally this would allow someone to go through a script and correctly add package:: to every function call (e.g., when turning script code into a package function).
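A rough sketch of that rewriting, again modeled in Python rather than R. The `EXPORTS` table, the `(kind, name)` script encoding, and `qualify_calls` are hypothetical, not CodeDepends' representation. Because the search path is updated as the script is walked, the two `select()` calls from the dplyr/MASS example resolve to different packages.

```python
# ASSUMPTION: hand-written export subsets for illustration, not real metadata.
EXPORTS = {
    "stats": {"filter", "lm"},
    "base": {"print", "sum"},
    "dplyr": {"filter", "select"},
    "MASS": {"select", "rlm"},
}


def qualify_calls(script, exports=EXPORTS, default_path=("stats", "base")):
    """Return each call in `script` rewritten as pkg::fun, resolved against the
    search path as it stands at that point in the script.

    `script` is a list of (kind, name) pairs, kind being "library" or "call".
    Names no package exports (e.g. user-defined helpers) are left unqualified.
    """
    path = list(default_path)
    qualified = []
    for kind, name in script:
        if kind == "library":
            if name not in path:      # already-attached packages are not moved
                path.insert(0, name)  # newest attachment shadows older ones
            continue
        pkg = next((p for p in path if name in exports.get(p, set())), None)
        qualified.append(f"{pkg}::{name}" if pkg else name)
    return qualified


script = [
    ("library", "dplyr"),
    ("call", "select"),
    ("library", "MASS"),
    ("call", "select"),
    ("call", "filter"),
]
print(qualify_calls(script))
```

A real implementation would have to work on parsed R expressions and consult the actual namespaces of installed packages rather than a fixed table.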

I'll surely look through the codebase in the near future and will see what I can do to contribute.

Jun 02 '17 01:06 jonocarroll

Great. This identification of where in the search path a function is currently located can be done statically. It can also be done semi-statically, i.e., we can evaluate the library()/require() calls contained in the script while doing static analysis and not evaluate the other expressions. Then we look along the actual search path to find "select", etc. This avoids reimplementing a pseudo version of library/require along with all of its options, including pos, lib.loc and anything that could get added in the future. This wouldn't deal with conditional calls to library(), or with calls to library() inside functions, but that is possible.

Jun 02 '17 02:06 dsidavis

conditional calls to library(). Or within function calls to library()

These shouldn't appear in package functions, but I could see someone calling library() in a scripted function. It's perhaps poor practice to do so, but I'd consider this an edge case of what I had in mind.

Jun 02 '17 02:06 jonocarroll

Absolutely, entirely agree.

Jun 02 '17 02:06 dsidavis

It can also be done semi-statically, i.e., we can evaluate the library()/require() calls contained in the script while doing static analysis and not evaluate the other expressions. Then we look along the actual search path to find "select", etc. This avoids reimplementing a pseudo version of library/require along with all of its options, including pos, lib.loc and anything that could get added in the future.

Indeed, for an example of this approach see the data function handler I added recently. It does this to determine the output variables from a data() call so that they appear properly in the dependency chain.

EDIT: Actually, looking more closely, this is the approach I took for the current version of library symbol checking. That is currently only used when you tell CodeDepends to treat functions as inputs, though, so much more can be done.

This wouldn't deal with conditional calls to library(). Or within function calls to library() but that is possible.

The much harder (probably impossible to fully handle?) issue is require in this sense, since the call does not stop execution (it returns FALSE with a warning) and the script/function continues to run even when the package is not installed. The existing machinery assumes that a require call actually loads the library, which I think is the only feasible way of handling it. That assumption will matter more once we're actually doing things with that library information, though.
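The require() caveat can be illustrated with a small sketch (Python as a modeling language; the `EXPORTS` table, script encoding, and function names are hypothetical). It resolves calls twice, once under the current assumption that every `require()` attaches its package and once against a given set of installed packages, and reports the calls whose resolution changes. In R, `library()` errors on a missing package, while `require()` returns `FALSE` with a warning and lets the script continue, which is what makes the second case silent.

```python
# ASSUMPTION: toy export subsets for illustration, not real package metadata.
EXPORTS = {
    "stats": {"filter", "lm"},
    "base": {"print", "sum"},
    "dplyr": {"filter", "select"},
}


def resolve(script, installed=None):
    """Resolve each call in `script` to the package that would supply it.

    installed=None models the current assumption that every library()/require()
    call succeeds; otherwise only packages in `installed` can be attached.
    """
    path = ["stats", "base"]
    resolved = []
    for kind, name in script:
        if kind in ("library", "require"):
            if installed is not None and name not in installed:
                if kind == "library":
                    # library() stops the script when the package is missing
                    raise RuntimeError(f"package {name!r} is not installed")
                continue  # require() warns, returns FALSE, and execution goes on
            if name not in path:
                path.insert(0, name)
        else:
            resolved.append(next((p for p in path if name in EXPORTS.get(p, set())), None))
    return resolved


def require_sensitive_calls(script, installed):
    """List (call, assumed_pkg, actual_pkg) where the answer depends on require()."""
    calls = [name for kind, name in script if kind == "call"]
    assumed = resolve(script)
    actual = resolve(script, installed)
    return [(c, a, b) for c, a, b in zip(calls, assumed, actual) if a != b]


script = [("require", "dplyr"), ("call", "filter"), ("call", "lm")]
print(require_sensitive_calls(script, installed={"stats", "base"}))
```

Here `filter()` after a failed `require(dplyr)` silently falls through to `stats::filter`, which is exactly the kind of divergence the assume-it-loaded approach cannot see.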

This likely won't affect scripts much, but things will get a little more complicated when bringing CodeDepends to bear on package code. Though I guess require calls in package code are flagged by R CMD check these days, so it's probably not the problem it would have been in the past.

Jun 02 '17 19:06 gmbecker

#15 and #16 are relevant here as well.

Jun 09 '17 17:06 gmbecker
