matchingR icon indicating copy to clipboard operation
matchingR copied to clipboard

`roommate.checkPreferences` overhead

Open jessexknight opened this issue 5 months ago • 2 comments

Thanks very much for this package!

I am running roommate(utils=utils) 1000s of times with changing utils but small population size (N < 1000). So, any overhead adds up.

For small N, the overhead is actually substantial, and I traced it to roommate.checkPreferences. I wonder if this check could be skipped using a flag to roommate. In fact, when pref is generated from utils by sortIndexOneSided(as.matrix(utils)), I wonder if checkPreferences can always be skipped? Since it seems sortIndexOneSided returns prefs as needed in cpp.

I thought the bottleneck in checkPreferences was the loop to create comp, but it seems not (I guess it is the sorting). Still, I think comp can be generated more easily by: comp <- array(seq_len(NROW(pref)), dim = dim(pref)) - upper.tri(pref)

MWE & benchmarking

library('matchingR')
library('microbenchmark')
irving    = utils::getFromNamespace(ns='matchingR','cpp_wrapper_irving')
validate  = utils::getFromNamespace(ns='matchingR','roommate.validate')
checkPref = utils::getFromNamespace(ns='matchingR','roommate.checkPreferences')
gen.utils = function(n,seed=NULL){
  set.seed(seed)
  x = rnorm(n)
  u = dnorm(outer(x,x,`-`))
}
my.validate = function(u){
  n = ncol(u)
  sortIndexOneSided(matrix(u[-seq(1,n^2,n+1)],n-1,n))
}
my.checkpref <- function(pref) {
  # check if pref is using R instead of C++ indexing
  if (all(apply(rbind(pref, seq_len(NCOL(pref))), 2, sort) ==
    matrix(seq_len(NCOL(pref)), nrow = NCOL(pref), ncol = NCOL(pref)))) {
    return(pref - 1)
  }
  comp <- array(seq_len(NROW(pref)), dim = dim(pref)) - upper.tri(pref)
  # check if pref has a complete listing otherwise given an error
  if (all(apply(pref, 2, sort) == comp)) {
    return(pref)
  }
  return(NULL)
}
microbenchmark(
  # validate(u),my.validate(u),my.checkpref(my.validate(u)), # VALIDATION
  roommate(u=u),irving(validate(u))+1,irving(my.validate(u))+1, # FULL MODEL
setup={u=gen.utils(100,seed=0)},
check='equal',
times=10)

output

Loading required package: Rcpp
Unit: microseconds
                       expr      min       lq     mean    median       uq      max neval
            roommate(u = u) 5937.956 6142.166 6575.959 6349.9525 6501.501 8439.542    10
    irving(validate(u)) + 1 6113.019 6245.212 6938.835 6516.9050 7663.952 8583.290    10
 irving(my.validate(u)) + 1  855.716  866.679 1161.646  884.0885  958.542 3303.109    10

jessexknight avatar Aug 07 '25 11:08 jessexknight

Thanks for opening this issue!

In fact, when pref is generated from utils by sortIndexOneSided(as.matrix(utils)), I wonder if checkPreferences can always be skipped?

I think this makes sense.

In addition, I'm also fine with adding a validate argument to roommate(...) that defaults to TRUE.

Would you be interested in opening a PR?

jtilly avatar Aug 07 '25 18:08 jtilly

Great - thanks! Yes, I will open a PR, though I'm a bit busy just now so might not be right away.

jessexknight avatar Aug 07 '25 21:08 jessexknight