
WISH: Rprof() Improvements

Open brodieG opened this issue 6 years ago • 0 comments

Background

Rprof() dumps the R call stack at timed intervals into a text file. The text file can then be processed to reconstruct/estimate how much time was spent under different call stacks.
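As a rough illustration of that processing step, the sketch below tallies self and total time from a few hand-written lines in the default Rprof() text layout: a `sample.interval=` header in microseconds, then one line per sample with quoted function names, innermost call first. The sample lines and the helper name `tally_rprof` are made up for illustration; `summaryRprof()` is what does this for real.

```r
# Hand-written stand-in for the contents of an Rprof.out file.
prof_lines <- c(
  "sample.interval=20000",
  "\"runif\" \"FUN\" \"lapply\" ",
  "\"runif\" \"FUN\" \"lapply\" ",
  "\"sample.int\" \"sample\" \"FUN\" \"lapply\" ",
  "\"sample\" \"FUN\" \"lapply\" "
)

# Hypothetical helper: estimate self and total time per function.
tally_rprof <- function(lines) {
  # Header gives the sampling interval in microseconds; convert to seconds.
  interval <- as.numeric(sub("^sample.interval=", "", lines[1])) / 1e6
  # Each remaining line is one sampled call stack.
  stacks <- lapply(lines[-1], function(l) {
    gsub("\"", "", strsplit(trimws(l), " ", fixed = TRUE)[[1]])
  })
  # Self time: the innermost frame is listed first on each line.
  self <- table(vapply(stacks, function(s) s[1], character(1)))
  # Total time: a function counts once per sample in which it appears.
  total <- table(unlist(lapply(stacks, unique)))
  list(self = self * interval, total = total * interval)
}

res <- tally_rprof(prof_lines)
```

Each sample contributes one interval to the self time of its innermost frame and one interval to the total time of every function on the stack, which is why the result is an estimate rather than an exact measurement.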

Possible Improvements

Provide Function Disambiguation

For example, in the following:

> my_fun2 <- function(x) {sample(x); x}
> my_fun <- function(x) {runif(x); x}
> Rprof()
> x <- lapply(rep(1e6, 20), my_fun)
> y <- lapply(rep(1e6, 20), my_fun2)
> Rprof(NULL)
> summaryRprof('Rprof.out')
$by.self
             self.time self.pct total.time total.pct
"runif"           0.96    59.26       0.96     59.26
"sample.int"      0.64    39.51       0.64     39.51
"sample"          0.02     1.23       0.66     40.74

$by.total
             total.time total.pct self.time self.pct
"FUN"              1.62    100.00      0.00     0.00
"lapply"           1.62    100.00      0.00     0.00
"runif"            0.96     59.26      0.96    59.26
"sample"           0.66     40.74      0.02     1.23
"sample.int"       0.64     39.51      0.64    39.51

$sample.interval
[1] 0.02

$sampling.time
[1] 1.62

The symbol FUN refers to both my_fun and my_fun2, so the time spent in each cannot be told apart. An option to record a uniquely identifying string would be very useful. For example, since byte-compilation is now enabled by default, we could reference those functions by bytecode address:

> my_fun2
function(x) {sample(x); x}
<bytecode: 0x7fcb8a2d4b10>
> my_fun
function(x) {runif(x); x}
<bytecode: 0x7fcb88491520>
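For illustration only, this identifier can already be scraped from the printed form of a closure. The sketch below forces compilation with `compiler::cmpfun()` (the JIT only compiles a closure after it has been called a couple of times) and pulls the address out of the `<bytecode: ...>` line. `bytecode_id` is a hypothetical helper, not something Rprof() provides, and the address is only meaningful within a single session.

```r
library(compiler)

# Hypothetical helper: return the bytecode address shown when printing `f`,
# or NA if `f` has not been byte-compiled.
bytecode_id <- function(f) {
  out <- capture.output(print(f))
  hit <- grep("^<bytecode: ", out, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  sub("^<bytecode: (.*)>$", "\\1", hit[1])
}

# Compile explicitly so the bytecode line is guaranteed to be present.
my_fun  <- cmpfun(function(x) {runif(x); x})
my_fun2 <- cmpfun(function(x) {sample(x); x})

id1 <- bytecode_id(my_fun)
id2 <- bytecode_id(my_fun2)
```

Both functions show up as FUN in the profile, yet `id1` and `id2` differ, which is the property the proposed option would rely on.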

For functions that are part of packages, ideally this identifier could be used to look up their original symbol (the method for doing so is TBD).

This should greatly improve the usefulness of graph-based profile views.

Better Support for Fast Function Looping

One strategy, as implemented in treeprof, is to loop an expression repeatedly so that, through random sampling, even very fast expressions can be comprehensively profiled.

It would be very useful to be able to mark the beginning or end of each loop iteration with some arbitrary text written to the file (e.g. "") that can then be used when parsing to better reconstruct the code evaluation. Taken to an extreme, if this feature could somehow be embedded in all looping constructs (e.g. for, *pply, etc.), then linear displays of profiles would become much more useful, as repeated code evaluation could be easily aggregated.

The main problem right now is that there is no way to guarantee that a particular stack state gets dumped, so there is no way to insert an artificial marker.
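To make the parsing side of this idea concrete, here is a sketch that groups samples by iteration, assuming the profiler could write a marker line between loop iterations. Rprof() has no such feature today; the marker text `#LOOP`, the helper `split_by_marker`, and the sample lines are all invented for illustration.

```r
# Hand-written profile lines with a hypothetical marker before each iteration.
prof_lines <- c(
  "#LOOP",
  "\"runif\" \"FUN\" ",
  "\"runif\" \"FUN\" ",
  "#LOOP",
  "\"sample.int\" \"sample\" \"FUN\" ",
  "#LOOP",
  "\"runif\" \"FUN\" "
)

# Hypothetical helper: group sample lines by the iteration they fall in.
split_by_marker <- function(lines, marker = "#LOOP") {
  is_marker <- lines == marker
  iter <- cumsum(is_marker)   # running iteration index for every line
  split(lines[!is_marker], iter[!is_marker])
}

by_iter <- split_by_marker(prof_lines)
samples_per_iter <- lengths(by_iter)
```

In this simple version an iteration that happens to receive no samples does not appear in the result at all, so a real parser would need to track the marker count separately.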

brodieG, Mar 10 '18 19:03