SimDesign icon indicating copy to clipboard operation
SimDesign copied to clipboard

Implement MCSE for performance measure functions

Open bsiepe opened this issue 1 year ago • 2 comments

Hello,

as outlined in #27 , Samuel Pawel, František Bartoš, and I would be willing to help with the implementation of some MCSE-related functionality for SimDesign.

Our suggestions: Allow users to obtain CLT-based Monte Carlo Standard Errors for the performance measures in the SimDesign summary functions based on the formulas provided in Siepe et al. (2023).

Sketch of what such an extension could look like:

(building on the existing EDR function):

mcse_prop <- function(x, n){
  sqrt((1/n) * (x * (1 - x)))
}

EDR <- function(p,
            	alpha = .05,
            	unname = FALSE,
            	MCSE = TRUE){
 
  browser()
  if(is.data.frame(p)) p <- as.matrix(p)
  stopifnot(all(p <= 1 & p >= 0))
  stopifnot(length(alpha) == 1L)
  stopifnot(alpha <= 1 && alpha >= 0)
  if(is.vector(p)) p <- matrix(p)
  ret <- colMeans(p <= alpha)
 
  if(MCSE){
	mcse <- mcse_prop(ret, n = nrow(p))
	if(!is.null(names(ret))){
  	names(mcse) <- paste0(names(ret), ".MCSE")
	} else {
  	names(ret) <- paste0("V", 1:length(ret))
  	names(mcse) <- paste0("MCSE.V", 1:length(ret))
	}
	ret <- c(ret, mcse)
  }
 
  if(unname) ret <- unname(ret)
  ret
}

Open Questions/Ideas:

  • Should the additional MCSE argument only allow for the CLT-based approach, or should this somehow also be tied to bootstrap-based MCSE estimation as available in reSummarize() or runSimulation()?
  • The named vector output could potentially lead to confusion; another possibility would be to provide the MCSE as an attribute

Depending on your input, we will open a pull request suggesting the functions soon.

bsiepe avatar Feb 07 '24 09:02 bsiepe

This is nice, though in general I would suggest leaving MSCE = FALSE by default so as to not break the existing codebase. As a compromise, a family of wrappers could be included such as

EDR_MCSE <- function(...) bias(..., MCSE = TRUE)

so that it's clear both the EDR and its respective SE are being returned. Note that lines like

} else {
  	names(ret) <- paste0("V", 1:length(ret))
  	names(mcse) <- paste0("MCSE.V", 1:length(ret))
	}

generally go against the package philosophy (I would much rather have the workflow throw an error message that vectors have missing names than to name objects automatically for users with the hopes that they remember what was being applied). Naming objects is hard, particularly on this side of the workflow fence, so simulation designers should be forced to remove the ambiguity themselves.

Open Questions/Ideas:

  • Should the additional MCSE argument only allow for the CLT-based approach, or should this somehow also be tied to bootstrap-based MCSE estimation as available in reSummarize() or runSimulation()?

Interesting point. I would likely keep these separate as they may only be of interest for some statistical summaries rather than others. The default boot_method bootstraps all statistics reported in Summarise(), but this may be considered overkill.

  • The named vector output could potentially lead to confusion; another possibility would be to provide the MCSE as an attribute

Agreed; I'm generally not a fan of this either. One thought would be to have the function like EDR() return the usual behaviour when MCSE=FALSE, but when set to TRUE will return the MC SEs instead (but omitting the detection rates themselves). This way all the information would carry though, and if users want the explicit MCSE information it would amount to making calls such as

rates <- EDR(...)
SE.rates <- EDR(..., MCSE=TRUE)

at which point they can manipulate the objects independently in different contexts. That would still play well with the initial EDR_MSCE() definition above, where it's clear from context that both estimates and their SEs are returned simultaneously.

philchalmers avatar Feb 10 '24 03:02 philchalmers

Thank you for your response and your comments. Some brief responses:

  1. We agree, MCSE = FALSE should be the default.
  2. Good point. We will not name variables automatically, but rather add an error in case the variables are unnamed and an MCSE is computed
  3. Agreed, we will keep the MCSE separate from the bootstrap-based MCSE.
  4. We like your idea of creating a family of wrappers, such as EDR_MCSE(), and letting the function EDR() only return the MCSE (and omitting the EDR itself) when calling EDR(..., MCSE = TRUE). We will follow this route.

We will prepare a pull request incorporating this functionality.

bsiepe avatar Feb 21 '24 13:02 bsiepe