covr icon indicating copy to clipboard operation
covr copied to clipboard

Functions Defined Inside Structures Not Tracked

Open brodieG opened this issue 8 years ago • 20 comments

I discovered this when the validity methods defined within setClass for my S4 classes as well as all methods defined for my RC classes were not tracked. I created a package to illustrate the issue (on github):

## R/pckgA.R
fun.list <- list(
  a=function(x) {
    y <- x + 1
    y
  },
  b=function(x) {
    y <- x + 1
    y
  }
)
#' @export

fun <- function() 1 + 1

## tests/test.R

fun.list$a(5)
fun.list$b(6)
fun()

Then, after running covr:::package_coverage:

pckgA Test Coverage: 100.00%
R/pckgA.R: 100.00%

even though I'm not running the functions inside the list. I'm using a list here because it is simpler, but the same thing happens with functions defined in:

setClass(
  ...,
  validity=function(object) ...
)

or in

setRefClass(
   ...,
   methods=list(
    a=function() ...
) )

brodieG avatar Sep 25 '15 01:09 brodieG

Actually, realized you can see this in TestRC in the existing package. From covr working directory, run setwd("tests/testthat/TestRC") then shine(package_coverage()) and you can see in the file tab that the RC methods are not tracked (i.e. neither green nor red).

brodieG avatar Sep 25 '15 02:09 brodieG

covr:::trace_calls(fun.list)
#> $a
#> function (x)
#> {
#>     {
#>         covr:::count("0x404e048:3:5:3:14:5:14:3:3")
#>         y <- x + 1
#>     }
#>     {
#>         covr:::count("0x404e048:4:5:4:5:5:5:4:4")
#>         y
#>     }
#> }
#> 
#> $b
#> function (x)
#> {
#>     {
#>         covr:::count("0x404e048:7:5:7:14:5:14:7:7")
#>         y <- x + 1
#>     }
#>     {
#>         covr:::count("0x404e048:8:5:8:5:5:5:8:8")
#>         y
#>     }
#> }

It will require some care to handle properly though, I will look into it further when I can.

jimhester avatar Sep 25 '15 12:09 jimhester

The issue here is that currently the way covr works is to run trace_calls() on every object defined as a function in the package namespace. So in order to get this to work we would have to search every object in a namespace for nested function calls. I worry this would be slow if there are large data objects in a package. But perhaps it cannot be helped, as you said this is an issue with reference classes.

jimhester avatar Oct 20 '15 12:10 jimhester

I imagine if you only allow one level of nesting this should be manageable as the application of "is.function" or some such should be fast, and would cover 95% of the cases (i.e. RC classes, etc.). List based data structures with very wide top levels are pretty rare (I think). You can even check length and bail out of truly large data structures. I'll try to have a look at this at some point and see if it is realistic if you don't get to it first, though it will not be high on my list of priorities in the near term.

brodieG avatar Oct 20 '15 19:10 brodieG

sorry, was trying to comment and hit wrong button...

brodieG avatar Jul 01 '16 12:07 brodieG

Looks like you resolved the RC issue, thanks.

Re the nested stuff, I started poking around and it seems like this is fairly feasible so long as the env and name elements of the replacement return list aren't used anywhere. From a cursory examination it doesn't seem like they actually are. In particular, the C funs in reassign.c don't appear to act on them.

My problem right now is that I would need to call replacement (or some variation thereof) on sub-elements of lists or on attributes, in which case the name/env pairing stops making sense. If I'm right about name/env values not being used, I can just pass dummy ones and move right along. If they are used, which I'm guessing they are in a way that escapes me currently, then the problem is likely more complicated.

brodieG avatar Aug 23 '16 01:08 brodieG

I think you are correct, name and env are mainly there for debugging purposes, perhaps they should just be removed entirely.

jimhester avatar Aug 31 '16 20:08 jimhester

Alright, so I took a very preliminary stab at this, and it seems to work in a simple test case:

screen shot 2016-09-03 at 3 46 07 pm

Re performance, I don't think this will be a problem since covr recursively traverses through the entire set of source expressions already and that will typically be a deep and expansive tree. For the life of me I cannot imagine a package that would come with a data source that has a degree of recursive complexity that approaches that of its source code. I guess someone might ship with some massive linked list or some such, but that seems far fetched. If this remains a concern we can make the "recursive" analysis an option that package developers can enable if they wish. In fact, I would probably recommend this recursive option to come disabled at first if you do decide you're interested in adding it.

Side note, at first blush it seems like we could just recurse through environments provided that we add a tracking list of visited environments to avoid getting stuck in a snake-eats-tail situation. This might also allow a more generalized handling of R6/RC/etc., although we probably can't just recurse through the return value of ls(all=T, namespace) given the system objects that are hidden there.

Let me know if you're interested in pursuing this further. I'd be happy to do a proper implementation with some guidance from you, although I have to admit that I do not have a full grasp of how all the pieces interact, and obviously I'm messing with a fairly sensitive part of the package.

brodieG avatar Sep 03 '16 20:09 brodieG

@jimhester, bump... Any interest in pursuing this further?

brodieG avatar Sep 26 '16 00:09 brodieG

I just found the same issue within the PKNCA package: https://codecov.io/gh/billdenney/pknca/src/bf902897a1c6d4777a4508efa4a6aa822ce36f55/R/interpolate.conc.R

I have a list of functions (with some extra decoration on them) at the end of the linked file. I'd like to track which of the functions are tested and which are not.

I agree with @brodieG that for most packages the search of the full package space shouldn't take too much longer for most packages. Perhaps the solution could be a feature to turn off full package searching if it is too time consuming?

billdenney avatar Dec 12 '16 17:12 billdenney

I have a list of functions. Even though they all have explicit tests they are being displayed as uncovered on codecove and also if I use package_coverage(). However if I use file_coverage() they show as covered.

elinw avatar Jun 16 '17 07:06 elinw

@elinw if you want this fixed you need to provide links to the source of exactly what functions you think should have coverage, exact lines in the codecov report that are not covered and exact commands you are running locally.

jimhester avatar Jun 16 '17 11:06 jimhester

Here is another example with ggproto objects:

image

Presumably rect_to_poly is called via the ggproto method on line 41.

As always I'm happy to help with a PR if you'd like.

brodieG avatar Jun 02 '18 15:06 brodieG

As an alternative that is generic, how about a marker that makes covr descent deeper? E.g. the list / env could have a covr_this = TRUE attribute.

gaborcsardi avatar May 28 '19 15:05 gaborcsardi

We could maybe do it with a special comment as well, ideally your R code would not have to change.

jimhester avatar May 28 '19 17:05 jimhester

Yeah, that does sound good, but it might be tricky to implement. E.g. in this case where would you put the comment?

api <- local({
  exports <- list()
  ...
  exports$fun1 <- function() { ... }
  ...
  exports$fun2 <- function() { ... }
  ...
  exports
})

gaborcsardi avatar May 28 '19 18:05 gaborcsardi

Is there reason to believe that allowing full traversal by default becomes prohibitive?

brodieG avatar May 28 '19 18:05 brodieG

@brodieG Maybe not, but it might be too late to change this now. But either a global or local opt-in might make sense. Or both. We can also make a global opt-in now, and see how it works, before changing the default.

Btw. with a full traversal you might get loops, although that's probably solvable.

gaborcsardi avatar May 28 '19 18:05 gaborcsardi

One case where full traversal would like be problematic would be a data package using LazyData, pretty sure a full traversal would require all data objects to be loaded into memory. Not a huge deal for CRAN packages, but there are Bioconductor packages where this would cause multi-gb objects in memory.

jimhester avatar May 28 '19 19:05 jimhester

So how about having a global opt-in full traversal for a start? Not entirely trivial, but probably not too bad to implement, e.g. I think debugme already has a full traversal. Then we could just turn it on for the packages that need it...

gaborcsardi avatar May 28 '19 19:05 gaborcsardi