dabestr icon indicating copy to clipboard operation
dabestr copied to clipboard

Issue in mean_diff(p) - wrong degrees of freedom & the CI

Open Generalized opened this issue 3 years ago • 5 comments

It seems the mean_diff takes incorrect degrees of freedom: n rather than n-1 to calculate the quantile of t.

Data:

d <- structure(list(change = c(-3, -3, 3, 3, -3, -3, 25, 25, -8, -8, 
2, 2, 24, 24, 2, 2, 10, 10, 24, 24, 0, 0, 37, 37), Timepoint = c("T1", 
"T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", 
"T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", "T2", "T1", 
"T2"), Result = c(35, 32, 81, 84, 69, 66, 82, 107, 84, 76, 108, 
110, 75, 99, 57, 59, 102, 112, 68, 92, 76, 76, 75, 112), Id = c(1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 
9L, 10L, 10L, 11L, 11L, 12L, 12L)), row.names = c(NA, -24L), class = c("tbl_df", 
"tbl", "data.frame"))

d_wide <- structure(list(Id = 1:12, T1 = c(35, 81, 69, 82, 84, 108, 75, 
57, 102, 68, 76, 75), T2 = c(32, 84, 66, 107, 76, 110, 99, 59, 
112, 92, 76, 112), change = c(-3, 3, -3, 25, -8, 2, 24, 2, 10, 
24, 0, 37)), row.names = c(NA, -12L), class = c("tbl_df", "tbl", 
"data.frame"))

There are 12 rows.

Paired t: 11 rows (n-1)

> t.test(Result ~ Timepoint, d, paired=TRUE)

	Paired t-test

data:  Result by Timepoint
t = -2.2653, df = 11, p-value = 0.04467
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -18.5659033  -0.2674301
sample estimates:
mean of the differences 
              -9.416667 

t on change (equivalent to the above)

> t.test(d_wide$change)

	One Sample t-test

data:  d_wide$change
t = 2.2653, df = 11, p-value = 0.04467
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  0.2674301 18.5659033
sample estimates:
mean of x 
 9.416667 

obraz

Now, dabestr: n=12 (that's fine for the mean itself, but the qt needs 11), the confidence interval does not match.

....
Dataset    :  d
X Variable :  Timepoint
Y Variable :  Result

Paired mean difference of T2 (n = 12) minus T1 (n = 12)
 9.42 [95CI  -8.25; 25.2]
.....

And the plot is wrong: obraz

Also I checked the bootstrapped version, because you use bootstrap, and the CI differs and definitely does not cover zero. I used the same seed: 12345. It's understandable, that results of those calculation may fluctuate a bit, but the current one goes too far and it will be rather difficult to justify it against statistical reviewers, when publishing it.

> set.seed(12345)
> boot.ci(boot(d_wide %>% data.frame, 
                              statistic = function(data, indices) 
                                                return(mean(data[indices, "T2"]) - mean(data[indices, "T1"])), R=10000))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot(d_wide %>% data.frame, statistic = function(data, 
    indices) return(mean(data[indices, "T2"]) - mean(data[indices, 
    "T1"])), R = 10000))

Intervals : 
Level      Normal              Basic         
95%   ( 1.646, 17.262 )   ( 1.335, 16.831 )  

Level     Percentile            BCa          
95%   ( 2.002, 17.498 )   ( 2.583, 18.417 )  
Calculations and Intervals on Original Scale
Warning message:

I use version 0.3.0.

Generalized avatar May 25 '21 07:05 Generalized

Paired computations bug is reported previously in #94 , thanks for bumping this up.

At first pass, I don't think this is a degree of freedom problem, but how the paired bootstrap is computed.

josesho avatar May 25 '21 08:05 josesho

Yes. Initially I thought it's calculated traditionally. Then I found it's done via bootstrap, so it's not the DF fault. Today morning my work was rejected by a statistical reviewer at my client due to this issue, as they used SAS to validate the outcome.

Generalized avatar May 25 '21 09:05 Generalized

That's a bummer! I apologise. We do have an intern who will be working on this.

josesho avatar May 25 '21 10:05 josesho

@josesho Hello, is the corrected version published on the CRAN? I'd like to propose dabestr to my friend, a researcher, but I'm not sure if the error in computing correct CI and df was finally solved and released?

Generalized avatar Sep 16 '21 12:09 Generalized

Hi @Generalized ,

It's not on CRAN yet but you can install the dev branch from Github with

devtools::install_github("ACCLAB/[email protected]")

josesho avatar Sep 17 '21 03:09 josesho