simr
simr copied to clipboard
Influence of one NA cell and available options
Hello,
I would like to report a possible bug, and I would be very grateful for advice to work around it. The issue regards missing data, and I have not found it addressed in the present way previously (#172, #143, #117, #71).
I've noticed that a minimal presence of NA cells (indeed, a single cell) has a large influence on powerCurve()
and powerSim()
. Please find a minimal, reproducible example below, using the cbpp
data set.
For the examples with both functions, I firstly introduced one NA
in a predictor (i.e., period
). Then, I ran either powerCurve
or powerSim
using another predictor, fixed('incidence')
. In both cases, the results were extremely influenced by the NA cell, especially as the number of rows was unidentified. Please note that the results are similar (especially lacking the number of rows) if fixed('period')
is used instead.
On the next step, the NA
was replaced with a valid value. In contrast to the above models, the functions now took longer to run, the number of rows was identified on the results, and the power determined was higher.
My sessionInfo
is available at the end.
Question
Besides replacing or removing all the missing data within the predictors, could there be any other workarounds?
Thank you very much for your attention
powerCurve
> cbpp[1, 'period'] = NA
>
>
> cbpp
herd incidence size period
1 1 2 14 <NA>
2 1 3 12 2
3 1 4 9 3
4 1 0 5 4
5 2 3 22 1
6 2 1 18 2
7 2 1 21 3
8 3 8 22 1
9 3 2 16 2
10 3 0 16 3
11 3 2 20 4
12 4 2 10 1
13 4 0 10 2
14 4 2 9 3
15 4 0 6 4
16 5 5 18 1
17 5 0 25 2
18 5 0 24 3
19 5 1 4 4
20 6 3 17 1
21 6 0 17 2
22 6 0 18 3
23 6 1 20 4
24 7 8 16 1
25 7 1 10 2
26 7 3 9 3
27 7 0 5 4
28 8 12 34 1
29 9 2 9 1
30 9 0 6 2
31 9 0 8 3
32 9 0 6 4
33 10 1 22 1
34 10 1 22 2
35 10 0 18 3
36 10 2 22 4
37 11 0 25 1
38 11 5 27 2
39 11 3 22 3
40 11 1 22 4
41 12 2 10 1
42 12 1 8 2
43 12 0 6 3
44 12 0 5 4
45 13 1 21 1
46 13 2 24 2
47 13 0 19 3
48 13 0 23 4
49 14 11 19 1
50 14 0 2 2
51 14 0 3 3
52 14 0 2 4
53 15 1 19 1
54 15 1 15 2
55 15 1 15 3
56 15 0 15 4
>
>
> gm1 <- glmer(size ~ period*incidence + (1 | herd),
+ data = cbpp, family = poisson
+ )
>
>
> test_powerCurve = powerCurve(gm1, fixed('incidence'), along = 'herd', nsim=100)
Calculating power at 10 sample sizes along herd
Warning message:ng: |=========================================================================================================================|
In observedPowerWarning(sim) :
This appears to be an "observed power" calculation
>
>
> test_powerCurve
Power for predictor 'incidence', (95% confidence interval),
by number of levels in herd:
3: 0.00% ( 0.00, 3.62) - 0 rows
4: 0.00% ( 0.00, 3.62) - 0 rows
6: 0.00% ( 0.00, 3.62) - 0 rows
7: 0.00% ( 0.00, 3.62) - 0 rows
8: 0.00% ( 0.00, 3.62) - 0 rows
10: 0.00% ( 0.00, 3.62) - 0 rows
11: 0.00% ( 0.00, 3.62) - 0 rows
12: 0.00% ( 0.00, 3.62) - 0 rows
14: 0.00% ( 0.00, 3.62) - 0 rows
15: 0.00% ( 0.00, 3.62) - 0 rows
Time elapsed: 0 h 0 m 12 s
>
>
> # Now, replace the NA with a valid value
> cbpp[1, 'period'] = 2
>
>
> test_powerCurve = powerCurve(gm1, fixed('incidence'), along = 'herd', nsim=100)
Calculating power at 10 sample sizes along herd
( 1/10) Simulating: | ( 1/10) Simulating: |= ( 1/10) Simulating: |=== ( 1/10) Simulating: |==== ( 1/10) Simulating: |====== ( 1/10) Simulating: |======= ( 1/10) Simulating: |======== ( 1/10) Simulating: |========== ( 1/10) Simulating: |=============== ( 1/10) Simulating: |================ ( 1/10) Simulating: |================== ( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 1/10) Simulating: |====================( 2/10) Simulating: |= ( 2/10) Simulating: |==== ( 2/10) Simulating: |========== ( 2/10) Simulating: |=============== ( 2/10) Simulating: |================ ( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 2/10) Simulating: |====================( 3/10) Simulating: |====================( 3/10) Simulating: |====================Warning message:ng: |=========================================================================================================================|
In observedPowerWarning(sim) :
This appears to be an "observed power" calculation
>
>
> test_powerCurve
Power for predictor 'incidence', (95% confidence interval),
by number of levels in herd:
3: 31.00% (22.13, 41.03) - 11 rows
4: 34.00% (24.82, 44.15) - 15 rows
6: 32.00% (23.02, 42.08) - 23 rows
7: 44.00% (34.08, 54.28) - 27 rows
8: 56.00% (45.72, 65.92) - 28 rows
10: 71.00% (61.07, 79.64) - 36 rows
11: 82.00% (73.05, 88.97) - 40 rows
12: 82.00% (73.05, 88.97) - 44 rows
14: 98.00% (92.96, 99.76) - 52 rows
15: 98.00% (92.96, 99.76) - 56 rows
Time elapsed: 0 h 2 m 46 s
powerSim
> cbpp[1, 'period'] = NA
>
>
> cbpp
herd incidence size period
1 1 2 14 <NA>
2 1 3 12 2
3 1 4 9 3
4 1 0 5 4
5 2 3 22 1
6 2 1 18 2
7 2 1 21 3
8 3 8 22 1
9 3 2 16 2
10 3 0 16 3
11 3 2 20 4
12 4 2 10 1
13 4 0 10 2
14 4 2 9 3
15 4 0 6 4
16 5 5 18 1
17 5 0 25 2
18 5 0 24 3
19 5 1 4 4
20 6 3 17 1
21 6 0 17 2
22 6 0 18 3
23 6 1 20 4
24 7 8 16 1
25 7 1 10 2
26 7 3 9 3
27 7 0 5 4
28 8 12 34 1
29 9 2 9 1
30 9 0 6 2
31 9 0 8 3
32 9 0 6 4
33 10 1 22 1
34 10 1 22 2
35 10 0 18 3
36 10 2 22 4
37 11 0 25 1
38 11 5 27 2
39 11 3 22 3
40 11 1 22 4
41 12 2 10 1
42 12 1 8 2
43 12 0 6 3
44 12 0 5 4
45 13 1 21 1
46 13 2 24 2
47 13 0 19 3
48 13 0 23 4
49 14 11 19 1
50 14 0 2 2
51 14 0 3 3
52 14 0 2 4
53 15 1 19 1
54 15 1 15 2
55 15 1 15 3
56 15 0 15 4
>
>
> test_powerSim = powerSim(gm1, fixed('incidence'), nsim=100)
Warning message:====================================================================================================================================================|
In observedPowerWarning(sim) :
This appears to be an "observed power" calculation
>
>
> test_powerSim
Power for predictor 'incidence', (95% confidence interval):
0.00% ( 0.00, 3.62)
Test: z-test
Effect size for incidence is 0.079
Based on 100 simulations, (100 warnings, 100 errors)
alpha = 0.05, nrow = NA
Time elapsed: 0 h 0 m 4 s
nb: result might be an observed power calculation
>
>
> # Now, replace the NA with a valid value
> cbpp[1, 'period'] = 2
>
>
> test_powerSim = powerSim(gm1, fixed('incidence'), nsim=100)
Warning message:====================================================================================================================================================|
In observedPowerWarning(sim) :
This appears to be an "observed power" calculation
>
>
> test_powerSim
Power for predictor 'incidence', (95% confidence interval):
95.00% (88.72, 98.36)
Test: z-test
Effect size for incidence is 0.079
Based on 100 simulations, (5 warnings, 0 errors)
alpha = 0.05, nrow = 56
Time elapsed: 0 h 0 m 24 s
nb: result might be an observed power calculation
Session info
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] simr_1.0.5 lme4_1.1-26 Matrix_1.3-2
loaded via a namespace (and not attached):
[1] statmod_1.4.35 tidyselect_1.1.0 purrr_0.3.4 splines_4.0.4 haven_2.3.1 lattice_0.20-41 carData_3.0-4 vctrs_0.3.6
[9] generics_0.1.0 mgcv_1.8-34 utf8_1.2.1 rlang_0.4.10 nloptr_1.2.2.2 pillar_1.5.1 foreign_0.8-81 glue_1.4.2
[17] DBI_1.1.1 readxl_1.3.1 plyr_1.8.6 binom_1.1-1 lifecycle_1.0.0 stringr_1.4.0 cellranger_1.1.0 zip_2.1.1
[25] rio_0.5.26 forcats_0.5.1 RLRsim_3.1-6 parallel_4.0.4 pbkrtest_0.5.1 curl_4.3 fansi_0.4.2 broom_0.7.5
[33] Rcpp_1.0.6 backports_1.2.1 plotrix_3.8-1 abind_1.4-5 hms_1.0.0 stringi_1.5.3 openxlsx_4.2.3 dplyr_1.0.5
[41] grid_4.0.4 tools_4.0.4 magrittr_2.0.1 tibble_3.1.0 tidyr_1.1.3 crayon_1.4.1 car_3.0-10 pkgconfig_2.0.3
[49] MASS_7.3-53.1 ellipsis_0.3.1 data.table_1.14.0 assertthat_0.2.1 minqa_1.2.4 iterators_1.0.13 R6_2.5.0 boot_1.3-27
[57] nlme_3.1-152 compiler_4.0.4
At this point na.omit
is about all you can do. Thanks for the detailed report though, it will make it easier to fix when I do find the time.
Thank you -- that's alright.
Best regards