funspotr
Find functions passed as arguments
This is the use case I mentioned at the RStudio conference. I'd like to find functions called as arguments in apply-family functions or in something like switch().
file_lines <- "
sapply(mtcars, min)
agg <- function(x, fun) {
fn <-
switch(
fun,
avg = mean,
total = sum
)
fn(x, na.rm = TRUE)
}
agg(1:3, 'total')
"
file_output <- tempfile(fileext = ".R")
writeLines(file_lines, file_output)
# A tibble: 4 x 2
funs pkgs
<chr> <chr>
1 sapply base
2 switch base
3 fn (unknown)
4 agg (unknown)
I am expecting to see min, mean, and sum in the mix.
Ahh, yes, now I understand better. Correct, currently it will not identify any functions passed in through apply(), map(), or anything that's passed in as an argument.
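To make the limitation concrete, here is a small base-R check (not funspotr itself): in the parse data, a function passed as an argument shows up as a plain SYMBOL token, indistinguishable at the token level from a data object, while the caller is a SYMBOL_FUNCTION_CALL:

```r
# sapply is tokenized as SYMBOL_FUNCTION_CALL, but mtcars and min are
# both plain SYMBOL tokens -- the parser can't tell data from functions
pd <- utils::getParseData(parse(text = "sapply(mtcars, min)", keep.source = TRUE))
pd[pd$text %in% c("sapply", "mtcars", "min"), c("token", "text")]
```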
I'm not sure of the way to have it capture this, but could look down a few potential paths...
- Review the output after parsing and see if there is something there that would be easier to filter to these, and have an option to keep them...
- Alternatively, have an argument where you can pass in functions or sets of functions, e.g. c("map*()", "*apply()"), and then have some custom parsers to extract the .f or FUN values inputted to these.
Feel free to open a PR or give a suggestion if you have an idea on how to handle it.
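As a rough sketch of the second path (hypothetical, not funspotr's API), one could walk getParseData()'s id/parent columns from a matched call like sapply() down to its bare SYMBOL arguments:

```r
# Find SYMBOL tokens that are arguments to a target call (here sapply()).
# Note this still can't tell min (a function) from mtcars (data) without
# a get0()/class() style lookup afterwards.
pd <- utils::getParseData(parse(text = "sapply(mtcars, min)", keep.source = TRUE))
fn_row <- pd$token == "SYMBOL_FUNCTION_CALL" & pd$text == "sapply"
call_id <- pd$parent[pd$id == pd$parent[fn_row]]  # id of the full call expr
arg_wrappers <- pd$id[pd$parent == call_id & pd$token == "expr"]
pd$text[pd$parent %in% arg_wrappers & pd$token == "SYMBOL"]
```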
I did some exploring and the functions get flagged as "symbol" when called this way. I tried a few approaches, but nothing was robust, and I wanted to jot down what I learned. Altogether, this was my best attempt:
library(tidyverse)
file_lines <- "
sapply(mtcars, min)
min(1:10) -> a
b = base::min(1:10)
d <- plot <- print <- NULL
agg <- function(x, fun) {
fn <-
switch(
fun,
avg = mean,
total = sum
)
fn(x, na.rm = TRUE)
}
agg(1:3, 'total')
"
exprs <- parse(text = file_lines)
parsed_df <- # standardize call, must be a better way to do this
exprs |>
as.character() |>
parse(text = _) |>
utils::getParseData(includeText = TRUE) |>
as_tibble() |>
print()
# github added all the 4-space tabs, I'm a 2-space guy 🚀
parsed_df |>
mutate(expr = cumsum(parent == 0)) |>
# not sure this fully drops formals
mutate(
is_formal = token %in% c("SYMBOL_FORMALS"),
is_new = lead(token, 2) %in% c("EQ_ASSIGN", "LEFT_ASSIGN")
) |>
filter(token %in% c("SYMBOL_FUNCTION_CALL", "SYMBOL")) |>
# not sure this is robust to drop new items used later
group_by(text) |>
filter(cumsum(is_new) == 0) |>
ungroup() |>
# shorten
distinct(text) |>
# figure out what things are
mutate(
from = map_chr(
text,
~find(.x) |> pluck(1) |> toString()
),
is = map_chr(
text,
~get0(.x) |> class() |> toString()
)
) |>
filter(is == "function" & from != ".GlobalEnv")
# text from is
# <chr> <chr> <chr>
# 1 sapply package:base function
# 2 min package:base function
# 3 switch package:base function
# 4 mean package:base function
# 5 sum package:base function
All the caveats & info I found
We could find all symbol objects in addition to symbol_function_call objects, then inspect them. This casts a very wide net and I think requires the libraries to be attached to know where everything is coming from.
https://github.com/brshallo/funspotr/blob/e529b00d4fb6a243e236b86b4283394e23f4feb8/R/spot-funs.R#L11-L17
purrr::map_chr(
c("min", "mtcars"),
~get0(.x) |> class()
)
# [1] "function" "data.frame"
Another problem with using the method above is formals in the body of a function coming through as symbols. They start as symbol_formals in getParseData() but then become symbol when called later. globals::findGlobals() is good for this use case, but it doesn't catch that agg was made in step 2:
exprs <- parse(text = file_lines)
map(exprs, globals::findGlobals)
# [[1]]
# [1] "sapply" "mtcars" "min"
#
# [[2]]
# [1] "<-" "{" "switch" "mean" "sum"
#
# [[3]]
# [1] "agg" ":"
Another gotcha: an assignment like 1:3 -> min is hard to follow in getParseData(), but when treated as an expression, it puts it in the right order:
parse(text = "1:3 -> min") |> as.character()
# [1] "min <- 1:3"
For this you could do something weird like below to standardize the call, then use getParseData(includeText = TRUE). This will mark each top-level expression with parent = 0 when it occurs in the global environment, and you can then use cumsum() to find each new time that occurs:
parse(text = "1 -> min; a = 23") |>
as.character() |>
parse(text = _) |>
getParseData(includeText = TRUE) |>
mutate(expr = cumsum(parent == 0))
I liked the FUN or .f idea, but there could be cases where more than one argument is passed (like passing fct_reorder, which then uses sum), and there are other ways it gets called: .funs, .fun, etc.
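For illustration, the argument that receives the function does vary across the common mappers; a quick base-R check of the apply family (purrr's .f and dplyr's .funs follow the same pattern but aren't attached here):

```r
# sapply()/lapply() take the function via FUN; purrr::map() uses .f,
# and dplyr's scoped verbs used .funs -- no single name to target
names(formals(sapply))
# [1] "X"         "FUN"       "..."       "simplify"  "USE.NAMES"
names(formals(lapply))
# [1] "X"   "FUN" "..."
```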
There is a lot of info here re: Abstract Syntax Trees (AST).
Some of these use cases are pretty niche; maybe there is some low-hanging fruit in the mix? I'll try to take a closer look later.
This is great, thanks for putting this together. A few questions / comments:
Do you need the whole exprs |> as.character() |> parse(...) chain, or would it just work using utils::getParseData() directly? I.e.
file_lines <- "
sapply(mtcars, min)
min(1:10) -> a
b = base::min(1:10)
d <- plot <- print <- NULL
agg <- function(x, fun) {
fn <-
switch(
fun,
avg = mean,
total = sum
)
fn(x, na.rm = TRUE)
}
agg(1:3, 'total')
"
file_output <- tempfile(fileext = ".R")
writeLines(file_lines, file_output)
parsed_df <- utils::getParseData(parse(file_output, keep.source = TRUE))
Currently in funspotr I just do the latter, and I don't see why you'd need the extra set-up in this case, even for your later filtering steps...?
I like your use of get0(.x) |> class(), though I don't see why you'd need toString(); does class() not always return a string?
One thing this makes me think more about is how I want to handle functions that aren't installed locally. Currently, if you have madeupfun(x) in your script, "madeupfun" will be returned in the funspotr output as a function and the package it comes from will just show up as "(unknown)" -- in the case of apply(x, madeupfun), though, "madeupfun" would almost certainly have to be dropped (as there's no easy way to differentiate it from x). This asymmetry is probably OK (a separate issue I should open, but I am wondering if I should set a warning in the case of "(unknown)" functions, or an option to drop these from the output).
(Also, I should consider how much computational overhead get0() adds... though probably not a ton compared to issues with my current method of loading/attaching packages for each file, #14.)
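On the "(unknown)"/madeupfun point, it may be worth noting that get0() simply returns NULL (its default ifnotfound) for names that don't resolve, so an undefined function symbol looks identical to any other unresolvable symbol (a base-R illustration; madeupfun is just the made-up name from the discussion above):

```r
# nothing named "madeupfun" is on the search path, so get0() falls back
# to NULL and class() reports "NULL" rather than "function"
get0("madeupfun")
class(get0("madeupfun"))
# [1] "NULL"
```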
> Do you need the whole exprs |> as.character |> parse ... or would it just work using utils::getParseData() directly?
Even though it is right-assigned, min(1:10) -> a, R interprets it as a <- min(1:10), and using as.character() |> parse() will put it in the right order when I then pass it to getParseData(). I'm not really sure how to interact with the id & parent columns. Maybe there is a simpler way to do this.
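To make the ordering point concrete, a small base-R check: the raw parse data keeps the right-assign token order, while deparsing via as.character() standardizes to a left assignment:

```r
# in the raw tokens, the target symbol min comes *after* RIGHT_ASSIGN
pd <- utils::getParseData(parse(text = "1:3 -> min", keep.source = TRUE))
pd$text[pd$token %in% c("RIGHT_ASSIGN", "SYMBOL")]
# deparsing standardizes the call before re-parsing
as.character(parse(text = "1:3 -> min"))
# [1] "min <- 1:3"
```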
> I like your use of get0(.x) |> class(), though don't see why you'd need toString(), does class() not always return a string?
Ah, you're right, I don't need it for class(). I was trying to deal with NULL values in find(), though, and it looks like ~find(.x)[1] works just as well.
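One nuance worth noting as a base-R aside: class() always returns a character vector, but it can have length greater than one, which is where a collapse like toString() would still matter inside map_chr():

```r
# an object with a two-element class attribute, e.g. a custom subclass
x <- structure(list(), class = c("special", "list"))
class(x)
# [1] "special" "list"
toString(class(x))  # collapses to a single length-1 string
# [1] "special, list"
```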
@rjake I did not get to this before pushing to CRAN.
Given the ambiguity here, one approach I was considering was making it so you could specify an approach via an argument. Say you want to do something like your approach above: you could pass this in as a function to a parser argument (or something like that). funspotr's value-add in this situation would be setting up the search space, while giving the user the flexibility to identify the symbols to check against the search space. Perhaps funspotr would have some in-built approaches available, but this also keeps me from actually having to set them up (or worrying about the ambiguities or partial solutions that may be "good enough" for someone's use case).
Actually setting this up though would require me to untangle a few steps in the current approach (that I may or may not get to).
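A minimal sketch of what that pluggable-parser idea might look like (hypothetical names, not funspotr's actual API):

```r
# The package would set up the parse data ("the search space"); the user
# supplies `parser` to decide which symbols to extract from it.
spot_with_parser <- function(path, parser = function(pd) {
  unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
}) {
  pd <- utils::getParseData(parse(path, keep.source = TRUE))
  parser(pd)
}

f <- tempfile(fileext = ".R")
writeLines("sapply(mtcars, min)", f)
spot_with_parser(f)  # default parser: explicit calls only
spot_with_parser(f, parser = function(pd) {
  # a custom parser can also keep bare SYMBOL tokens (min, mtcars, ...)
  unique(pd$text[pd$token %in% c("SYMBOL_FUNCTION_CALL", "SYMBOL")])
})
```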