Taking too much time to run the code

Open sangheek16 opened this issue 3 years ago • 6 comments

Hi,

I've extended the model along participant number and used powerCurve with nsim=100. It has been running for more than 72 hours and still hasn't finished. The same code took 5 min with nsim=10 and 11 h 25 min with nsim=20.

Is this normal? Does the calculation time increase exponentially with nsim?

Below is the model I used. logRT is a continuous variable. Grammaticality.f, Distractor.f, and Clause.f are each categorical with two levels. RT2 and WordLength are continuous variables.

model <- lmer(logRT ~ Grammaticality.f*Distractor.f*Clause.f+RT2+WordLength+ (Distractor.f+Clause.f+Grammaticality.f|Participant) + (Distractor.f+Clause.f+Grammaticality.f|Item), data=so1)

I've extended the model along the number of participants.

model_ext <- extend(model, along="Participant", n=1000)

I then used the powerCurve with nsim=100.

pc <- powerCurve(model_ext, test=fixed("Grammaticality.f1:Distractor.f1:Clause.f1", method="z"), along="Participant", nsim=100, breaks=c(100,200,300,400,500,600,700,800,900,1000))

I wasn't sure if this is an issue with my code, or whether I should expect the running time to increase exponentially as I increase the value of nsim.

Thanks for your help!

sangheek16 avatar Aug 05 '21 12:08 sangheek16

I would normally expect it to increase linearly.

Does powerSim for a single sample size scale properly for you?

powerCurve does all the simulations at the start so it's possible this is a memory issue. How many observations per participant in your data?

pitakakariki avatar Aug 06 '21 00:08 pitakakariki

Thanks for your help!

Just to clarify your first question, could you explain more what you mean by "a single sample size" and "scale properly"?

If your second question is about the number of rows per participant, there are 46 rows in the data frame.

sangheek16 avatar Aug 06 '21 12:08 sangheek16

If you use powerSim instead of powerCurve, can you increase the number of simulations without the time increasing unreasonably?

pitakakariki avatar Aug 06 '21 23:08 pitakakariki
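
A minimal sketch of the scaling check suggested above: time powerSim at one fixed sample size for a few values of nsim and compare the seconds per simulation. It uses simr's bundled `simdata` and a small stand-in model, not the model from this issue; the names `fit`, `fit_n`, `nsims`, and `timing` are illustrative. If the per-simulation time stays roughly flat, powerSim is scaling linearly.

```r
library(lme4)
library(simr)

# Stand-in model on simr's bundled example data (an assumption for this
# sketch); substitute your own fitted model, e.g. `model` from the post.
fit <- lmer(y ~ x + (1 | g), data = simdata)

# Fix a single sample size: 10 levels of the grouping factor `g`.
# For the model in the post this would be along = "Participant".
fit_n <- extend(fit, along = "g", n = 10)

# Time powerSim for increasing numbers of simulations.
nsims <- c(5, 10, 20)
elapsed <- sapply(nsims, function(k) {
  system.time(
    powerSim(fit_n, test = fixed("x", method = "z"),
             nsim = k, progress = FALSE)
  )[["elapsed"]]
})

# Roughly constant seconds_per_sim means the run time grows linearly in nsim.
timing <- data.frame(nsim = nsims,
                     seconds = elapsed,
                     seconds_per_sim = elapsed / nsims)
print(timing)
```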

I would say the time increases quite reasonably if I use powerSim: nsim=20 took 1 h 28 m 13 s, and nsim=100 took 7 h 46 m 54 s. For reference, powerCurve on the same model took 11 h with nsim=20 and had not finished after more than 72 h with nsim=100. Could something be going on with powerCurve?

sangheek16 avatar Aug 10 '21 02:08 sangheek16

Sounds like it's a memory thing - the package was designed for ecologists so I didn't expect people to run such large models. You'll need to run each sample size with powerSim I think.

pitakakariki avatar Aug 10 '21 02:08 pitakakariki

I'll work with powerSim with each sample size. Thanks a lot for your help, really appreciate it!

sangheek16 avatar Aug 10 '21 02:08 sangheek16
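
For anyone landing here later, a minimal sketch of the workaround discussed above: call powerSim once per sample size and collect the estimates into one table, instead of a single powerCurve call. It runs on simr's bundled `simdata` with a small stand-in model, so the model, the `sizes` vector, and the `results` table are assumptions for illustration rather than the code from this issue.

```r
library(lme4)
library(simr)

# Stand-in model on simr's bundled example data (an assumption for this
# sketch); substitute your own fitted model, e.g. `model` from the post.
fit <- lmer(y ~ x + (1 | g), data = simdata)

# Sample sizes to evaluate, playing the role of `breaks` in powerCurve.
sizes <- c(5, 10, 15)

results <- do.call(rbind, lapply(sizes, function(n) {
  # Extend to exactly n levels of the grouping factor
  # (along = "Participant" for the model in the post).
  fit_n <- extend(fit, along = "g", n = n)

  # One power simulation per sample size; keep nsim small while testing.
  ps <- powerSim(fit_n, test = fixed("x", method = "z"),
                 nsim = 20, seed = 123, progress = FALSE)

  # summary() gives the estimated power and its confidence interval.
  s <- summary(ps)
  data.frame(n = n, power = s$mean, lower = s$lower, upper = s$upper)
}))

print(results)
```

Because each sample size is a separate call, the runs can also be started in separate R sessions and the rows combined afterwards.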