
Predicting components for new data?

Open mschubert opened this issue 4 years ago • 8 comments

Hi,

Thank you for this useful package!

I'm using mixtools to fit a Normal mixture model similar to the vignette example:

library(mixtools)
data(faithful)
attach(faithful)

wait1 = normalmixEM(waiting, mu=c(50, 70))

head(wait1$posterior, 2)
#              comp.1       comp.2
#   [1,] 1.023874e-04 9.998976e-01
#   [2,] 9.999089e-01 9.109251e-05

Is there a way to compute the posterior component probabilities for new data (analogous to, e.g., predict on an lm model)?

predict(wait1, newdata=data.frame(waiting=c(42,44,61)))
# Error in UseMethod("predict") :
#   no applicable method for 'predict' applied to an object of class "mixEM"
# For comparison, predict() works on an lm fit:
mod = lm(eruptions ~ waiting, data=faithful)
predict(mod, newdata=data.frame(waiting=c(45,77)))

mschubert • Jul 13 '21 18:07

I am happy to hear that mixtools has been useful for your research, @mschubert!

Unfortunately, we have not yet written an S3 method for prediction on an object of class mixEM. With that said, the following will accomplish what you are asking, at least for the example that you provided:

library(mixtools)
data(faithful)
attach(faithful)

set.seed(1)

wait1 <- normalmixEM(waiting, mu = c(50, 70))

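# Posterior component probabilities for new observations.
# Assumes newdata is a numeric vector.
pred.fn <- function(EMout, newdata){
  # One row per new observation: lambda_j * N(x_i; mu_j, sigma_j) for each component j
  out <- t(sapply(seq_along(newdata), function(i) EMout$lambda*dnorm(newdata[i], mean = EMout$mu, sd = EMout$sigma)))
  # Normalize each row so that the probabilities sum to 1
  out <- out/rowSums(out)
  rownames(out) <- newdata
  colnames(out) <- paste0("comp.", seq_along(EMout$lambda))
  return(out)
}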

pred.fn(wait1, newdata = c(42, 44, 61))

#      comp.1       comp.2
#42 1.0000000 1.259483e-08
#44 0.9999999 5.536425e-08
#61 0.9841608 1.583920e-02

Note in the above that I have assumed that the input for newdata is a numeric vector and not a data frame.
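
For reference, the quantity computed for each new observation x is the posterior probability of component j under the fitted mixture, obtained from Bayes' rule. In the notation below (introduced here for illustration, not taken from the package), Z is the unobserved component label, \phi is the normal density, k is the number of components, and \lambda_j, \mu_j, \sigma_j correspond to EMout$lambda, EMout$mu, and EMout$sigma:

```latex
P(Z = j \mid x)
  = \frac{\lambda_j \, \phi(x;\, \mu_j, \sigma_j)}
         {\sum_{l=1}^{k} \lambda_l \, \phi(x;\, \mu_l, \sigma_l)},
\qquad j = 1, \dots, k
```

The numerator is what the sapply call evaluates, and the division by the row sums supplies the denominator.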

dsy109 • Jul 15 '21 20:07

Thank you very much for your quick answer! Your code example helps me solve my immediate task.

For myself (and probably others) it would be great if you could also implement a predict function in mixtools eventually.

mschubert • Jul 16 '21 09:07

Thanks @mschubert for the question and suggestion. @dsy109, maybe it would be good to add some S3 methods to the package as suggested. Something to keep in mind!

drh20drh20 • Jul 16 '21 12:07

Glad it worked for your immediate task, @mschubert.

@drh20drh20, it would be good to add such an S3 method for predict. I will add it to my list. Note that there are currently some S3 methods, namely for density, print, and summary.
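
For illustration, here is a minimal sketch of what such a predict method could look like. It is not part of mixtools: the method name predict.mixEM, the handling of a data frame by taking its first column, and the stand-in fit object with made-up parameters are all assumptions made so that the example runs with base R alone. It only covers univariate normal mixtures with lambda, mu, and sigma elements.

```r
# Hypothetical S3 method; NOT part of mixtools.
# Assumes `object` is a univariate normal-mixture fit with elements
# lambda, mu and sigma (the elements used by pred.fn in this thread).
predict.mixEM <- function(object, newdata, ...) {
  # Accept a numeric vector or a data frame (first column is used; assumed convention)
  if (is.data.frame(newdata)) newdata <- newdata[[1]]
  newdata <- as.numeric(newdata)
  k <- length(object$lambda)
  # Recycle sigma if it has length one (e.g. a common standard deviation)
  sigma <- rep_len(object$sigma, k)
  # Column j holds lambda_j * N(x_i; mu_j, sigma_j) for every new observation x_i
  dens <- vapply(seq_len(k), function(j) {
    object$lambda[j] * dnorm(newdata, mean = object$mu[j], sd = sigma[j])
  }, numeric(length(newdata)))
  # Force an n x k matrix even when there is a single new observation
  dens <- matrix(dens, nrow = length(newdata), ncol = k)
  # Normalize each row to obtain posterior component probabilities
  post <- dens / rowSums(dens)
  dimnames(post) <- list(as.character(newdata), paste0("comp.", seq_len(k)))
  post
}

# Stand-in object with made-up parameters so the sketch runs without mixtools;
# with the package loaded, the result of normalmixEM() would be passed instead.
fit <- structure(list(lambda = c(0.4, 0.6), mu = c(55, 80), sigma = c(6, 6)),
                 class = "mixEM")
pred <- predict(fit, newdata = data.frame(waiting = c(42, 44, 61)))
print(pred)
```

Because predict is a generic in the stats package, defining a function named predict.mixEM is enough for predict(fit, ...) to dispatch to it. A method shipped in the package would also need to handle the other model types that share the mixEM class, which this sketch does not attempt.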

dsy109 • Jul 16 '21 13:07

@dsy109 I may have access to an undergraduate researcher. Do you want me to put them on it? I'm also happy to defer to you; your call.

drh20drh20 • Jul 16 '21 13:07

@drh20drh20 I am fine with that arrangement. @Kedai-Cheng is working on a laundry list of items for mixtools updates, but he is in the midst of a major overhaul of the graphics. So bringing in an undergraduate researcher for this would be helpful.

dsy109 • Jul 16 '21 15:07

Wonderful. I will try to get the ball rolling on it and let you know if I have any issues.

drh20drh20 • Jul 16 '21 16:07

Status update: I have an undergraduate working on this; we don't have an estimated completion date.

drh20drh20 • Sep 19 '21 14:09