bench icon indicating copy to clipboard operation
bench copied to clipboard

Inaccurate memory benchmarks

Open Anirban166 opened this issue 3 years ago • 5 comments

Am using bench::bench_memory (extracting mem_alloc or total memory allocations) for computing the memory usage for an R algorithm passed as an expression to my function asymptoticMemoryUsage, which is then passed onto a complexity classifying function from my package which predicts the memory complexity class the algorithm belongs to as per runs against some data sizes.

I tested on some sorting algorithms and I found the results to be odd, with the initial complexity being shown different from the complexities for subsequent runs. Example: image

One probable reason for this occurrence that I figured is for the higher memory allocation value produced initially for the first run: (or for the first data size of an algorithm) image Any way to resolve that?

And am sure its not the classification process because I used the exact same process/method for calculating time complexity through benchmarks via microbenchmark::microbenchmark() for time cases, and it shows accurate (correct complexity) and consistent results. (does not change with subsequent runs)

Also, one point I observed from that comparison is that microbenchmark tests for execution time which gives different values at each run, however bench_bench_memory calculates the 'allocations' which are the same or fixed everytime. Is there any function or feasible way in bench via which we can measure the actual memory usage for an expression/function, instead of just the raw allocations?

Or are there any alternative ways to compute memory benchmarks?

Anirban166 avatar Jul 09 '20 08:07 Anirban166

It is not clear from the above what is 'inaccurate'. Can you provide a minimal reproducible example that illustrates the issue?

jimhester avatar Jul 09 '20 12:07 jimhester

It is not clear from the above what is 'inaccurate'. Can you provide a minimal reproducible example that illustrates the issue?

Sure, here it is:

library(bench)
library(boot)

asymptoticMemoryUsage <- function(e, data.sizes, max.bytes)
{
  ifelse(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)), stop("data.sizes must not contain any NA/NaN/Infinite value."), return)
  lang.obj <- substitute(e)
  fun.obj  <- function(data.sizes)
  {
    eval(lang.obj)
  }
  memory.size.limit <- if(missing(max.bytes)) 10^6 else max.bytes
  memory.metrics.list <- list()
  break.bool <- TRUE
  memory.metrics.list <- lapply(seq(along = data.sizes), function(i)
  {
    if(break.bool)
    {
      benchmarked.memory.size <- bench_memory(fun.obj(data.sizes[i]))$mem_alloc
      if(benchmarked.memory.size > memory.size.limit) break.bool <<- FALSE
      data.size <- data.sizes[i]
      return(data.frame(c(benchmarked.memory.size), c(data.size)))
    }
  })
  resultant.df <- do.call(rbind, memory.metrics.list)
  colnames(resultant.df) <- c("Memory usage", "Data sizes")
  return(resultant.df)
}

asymptoticMemoryComplexityClass = function(model.df)
{
  if(class(model.df) == "data.frame" & "Memory usage" %in% colnames(model.df) & "Data sizes" %in% colnames(model.df))
  {
    constant   <- glm(`Memory usage`~1,                               data = model.df); model.df['constant'] = fitted(constant)
    linear     <- glm(`Memory usage`~`Data sizes`,                    data = model.df); model.df['linear'] = fitted(linear)
    log        <- glm(`Memory usage`~log(`Data sizes`),               data = model.df); model.df['log'] = fitted(log)
    loglinear  <- glm(`Memory usage`~`Data sizes`*log(`Data sizes`),  data = model.df); model.df['loglinear'] = fitted(loglinear)
    quadratic  <- glm(`Memory usage`~I(`Data sizes`^2),               data = model.df); model.df['quadratic'] = fitted(quadratic)

    model.list <- list()
    for(complexity.class in c('constant', 'log', 'linear', 'loglinear', 'quadratic'))
    {
      model.list[[complexity.class]] = eval(as.name(complexity.class))
    }
    cross.validated.errors <- lapply(model.list, function(x) cv.glm(model.df, x)$delta[2])
    best.model <- names(which.min(cross.validated.errors))
    print(best.model)
  }
  else stop("Input parameter must be a data frame with columns 'Memory usage' and 'Data sizes'")
}

# test 1: first shows constant, next shows linear
bubble.sort <- function (x)
{ n <- length(x)
for(i in 1:(n-1))
{
  for(j in 1:(n-i))
  {
    if(x[j+1] < x[j])
    {   tmp <- x [j]
    x[j] <- x[j+1]
    x[j+1] <- tmp
    }
  }
}
return(x)
}

asymptoticMemoryComplexityClass(asymptoticMemoryUsage(bubble.sort(sample(1:100, data.sizes, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5)))
#> [1] "constant"
asymptoticMemoryComplexityClass(asymptoticMemoryUsage(bubble.sort(sample(1:100, data.sizes, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5)))
#> [1] "linear"

# test 2: same case as test 1
insertion.sort <- function(A)
{
  for (j in 2:length(A))
  {
    key = A[j]
    i = j - 1
    while (i > 0 && A[i] > key)
    {
      A[(i + 1)] = A[i]
      i = i - 1
    }
    A[(i + 1)] = key
  }
  return(A)
}

asymptoticMemoryComplexityClass(asymptoticMemoryUsage(insertion.sort(sample(1:100, data.sizes, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5)))
#> [1] "constant"
asymptoticMemoryComplexityClass(asymptoticMemoryUsage(insertion.sort(sample(1:100, data.sizes, replace = TRUE)), data.sizes = 10^seq(1, 3, by = 0.5)))
#> [1] "linear"

# test 3: shows inaccurate complexity (should be constant but shows quadratic)
heap.build <- function(vec)
{
  l = length(vec)
  heap = vec
  for (k in l:1)
  {
    heap = modify.heap(heap, k)
  }
  return(heap)
}
modify.heap<-function(heap, rooti)
{
  l = length(heap)
  flag = 0
  while (rooti*2 <= l && flag==1)
  {
    left.i = rooti*2
    right.i = rooti*2+2
    flag = 1
    child = c(heap[left.i], heap[right.i])
    child = child[!is.na(child)]
    min.ind = which.min(child)
    if (heap[rooti] > child[min.ind])
    {
      flag = 1
      heap.ind = c(left.i, right.i)[min.ind]
      temp = heap[heap.ind]
      heap[heap.ind] = heap[rooti]
      heap[rooti] = temp
      rooti = heap.ind
    }
  }
  return(heap)
}
heap.sort<-function(heap)
{
  sorted.heap = NULL
  l = length(heap)
  while(l > 0)
  {
    sorted.heap = c(sorted.heap, heap[1])
    l = length(heap)
    heap[1] = heap[l]
    heap = heap[1:(l-1)]
    heap = modify.heap(heap, rooti = 1)
    l = l-1
  }
  return(sorted.heap)
}

asymptoticMemoryComplexityClass(asymptoticMemoryUsage(heap.sort(heap.build(sample(1:100, data.sizes, replace = TRUE))), data.sizes = 10^seq(1, 3, by = 0.5)))
#> [1] "quadratic"

Created on 2020-07-09 by the reprex package (v0.3.0)

Out of those 3 tests, the first two describe my first point, (a higher initial memory allocated value from asymptoticMemoryUsage function using bench::bench_memory, for which it shows constant complexity, and then on subsequent re-runs (did 1, can do more - same result) it shows a different complexity (linear), which is not expected. the third test is an example of inaccurate complexity, which I think is probably the result of bench_memory accounting for memory allocations (might be wrong)

Also, are there any alternatives within bench to compute memory use (apart from bench::bench_memory)? or anything else you are familiar with? (I had one case with use of memory.size() calls, but it is restricted to Windows only, and poses a drawback for cross-platform functioning)

Anirban166 avatar Jul 09 '20 15:07 Anirban166

Thank you for the reproducible example, however it is certainly not minimal.

bench::bench_memory() is a very minimal wrapper around utils::Rprofmem() I doubt the issue lies there.

I think likely lazy evaluation is causing additional allocations due to setup code in your first run.

Unless you can provide a simpler example that illustrates a problem, ideally 1 or 2 setup lines and then a call to bench::bench_memory() I will not be able to assist you.

jimhester avatar Jul 09 '20 20:07 jimhester

bench::bench_memory() is a very minimal wrapper around utils::Rprofmem() I doubt the issue lies there.

Ah okay, that makes sense since Rprofmem retrieves the memory allocations..however, is there a function in R which ideally measures the memory usage, and not the allocations? (something similar to being bound by two memory.size() calls?)

I think likely lazy evaluation is causing additional allocations due to setup code in your first run.

Is there any way to avoid the lazy evaluation for the initial run?

Unless you can provide a simpler example that illustrates a problem, ideally 1 or 2 setup lines and then a call to bench::bench_memory() I will not be able to assist you.

Sorry for not producing a minimal one before, here is one which consist of very few setup lines as preferable: First please install the package using either:

remotes::install_github("Anirban166/testComplexity")
# or
devtools::install_github("Anirban166/testComplexity")

Then try this reproducible example:

# library(testComplexity)

df <- testComplexity::asymptoticMemoryUsage(sort(sample(1:100, data.sizes, replace = TRUE), method = "shell" , index.return = TRUE), data.sizes = 10^seq(1, 3, by = 0.5))
df
#>   Memory usage Data sizes
#> 1        19456   10.00000
#> 2         2552   31.62278
#> 3         4792  100.00000
#> 4         9112  316.22777
#> 5        22792 1000.00000
asymptoticMemoryComplexityClass(df)
#> [1] "constant"

df <- testComplexity::asymptoticMemoryUsage(sort(sample(1:100, data.sizes, replace = TRUE), method = "shell" , index.return = TRUE), data.sizes = 10^seq(1, 3, by = 0.5))
df
#>   Memory usage Data sizes
#> 1         2552   10.00000
#> 2         2552   31.62278
#> 3         4792  100.00000
#> 4         9112  316.22777
#> 5        22792 1000.00000
asymptoticMemoryComplexityClass(df)
#> [1] "linear"

Created on 2020-07-10 by the reprex package (v0.3.0)

Anirban166 avatar Jul 10 '20 04:07 Anirban166

bench::bench_memory() is a very minimal wrapper around utils::Rprofmem() I doubt the issue lies there.

Ah okay, that makes sense since Rprofmem retrieves the memory allocations..however, is there a function in R which ideally measures the memory usage, and not the allocations? (something similar to being bound by two memory.size() calls?)

I think likely lazy evaluation is causing additional allocations due to setup code in your first run.

Is there any way to avoid the lazy evaluation for the initial run?

Unless you can provide a simpler example that illustrates a problem, ideally 1 or 2 setup lines and then a call to bench::bench_memory() I will not be able to assist you.

Sorry for not producing a minimal one before, here is one which consist of very few setup lines as preferable: First please install the package using either:

remotes::install_github("Anirban166/testComplexity")
# or
devtools::install_github("Anirban166/testComplexity")

Then try this reproducible example:

# library(testComplexity)

df <- testComplexity::asymptoticMemoryUsage(sort(sample(1:100, data.sizes, replace = TRUE), method = "shell" , index.return = TRUE), data.sizes = 10^seq(1, 3, by = 0.5))
df
#>   Memory usage Data sizes
#> 1        19456   10.00000
#> 2         2552   31.62278
#> 3         4792  100.00000
#> 4         9112  316.22777
#> 5        22792 1000.00000
asymptoticMemoryComplexityClass(df)
#> [1] "constant"

df <- testComplexity::asymptoticMemoryUsage(sort(sample(1:100, data.sizes, replace = TRUE), method = "shell" , index.return = TRUE), data.sizes = 10^seq(1, 3, by = 0.5))
df
#>   Memory usage Data sizes
#> 1         2552   10.00000
#> 2         2552   31.62278
#> 3         4792  100.00000
#> 4         9112  316.22777
#> 5        22792 1000.00000
asymptoticMemoryComplexityClass(df)
#> [1] "linear"

Created on 2020-07-10 by the reprex package (v0.3.0)

I also met a similar problem, Did you have solved this problem? Could you tell me how to obtain accurate memory usage in R?

cquzys avatar Feb 12 '22 08:02 cquzys

Thanks for filing this bug report! Unfortunately because it's so hard to reproduce or we think it's unlikely to affect many people, we don't have the development resources to fix at this time. It's our policy to close such issues to help stay focussed on the biggest problems, but the issue is still indexed by google, so if other people hit it, they'll be able to find it, and we can consider reopen it if it turns out to be a common problem. Thanks for reporting and I'm sorry we couldn't help more 😞.

DavisVaughan avatar May 03 '23 14:05 DavisVaughan