oddsratio Issue with or

Issue with or_gam

Open rpouillot opened this issue 3 years ago • 1 comments

Hi,

I had to estimate some odds-ratios from gam models and found your package. After some tries, I found that the confidence intervals were unexpectedly small. I checked the code of your or_gam function and found that the standard errors (se) of the odds-ratios (in the "link" scale) were estimated as the difference between the se of the estimates (since pred_gam2_ci_low <- pred_gam1[1] - (2 * pred_gam1[2]) - pred_gam2[1] - (2 * pred_gam2[2]) which is equivalent to pred_gam2_ci_low <- (pred_gam1[1] - pred_gam2[1]) - (2 * (pred_gam1[2] - pred_gam2[2])) ) and I think it is not statistically sound. The variance of the difference in the estimates is actually equal to the sum of the variances minus twice their covariances.

I can illustrates the issue as following.

library(oddsratio)
library(mgcv)
library(MASS)

#####################################################
# CHECK GLM

fit_glm <- glm(admit ~ gre + gpa + rank,
               data = data_glm,
               family = "binomial"
) # fit model

or_glm(data = data_glm, model = fit_glm, incr = list(gre = 380, gpa = 4))

#  predictor oddsratio ci_low (2.5) ci_high (97.5)          increment
#1       gre     2.364        1.054          5.396                380
#2       gpa    24.932        1.899        349.524                  4
#3     rank2     0.509        0.272          0.945 Indicator variable
#4     rank3     0.262        0.132          0.512 Indicator variable
#5     rank4     0.212        0.091          0.471 Indicator variable

# MASS equivalence
data_glm$greScaled <- data_glm$gre / 380  
data_glm$gpaScaled <- data_glm$gpa  / 4  

fit_glmScaled <- glm(admit ~ greScaled + gpaScaled + rank,
               data = data_glm,
               family = "binomial"
) # fit model Scaled

# USING MASS function: we got the same results
exp(cbind(coef(fit_glmScaled), confint(fit_glmScaled)))

#Waiting for profiling to be done...
#                             2.5 %      97.5 %
#(Intercept)  0.0185001 0.001889165   0.1665354
#greScaled    2.3642995 1.053676012   5.3958610
#gpaScaled   24.9319521 1.898727216 349.5235388
#rank2        0.5089310 0.272289674   0.9448343
#rank3        0.2617923 0.131641717   0.5115181
#rank4        0.2119375 0.090715546   0.4706961

# No issue here.

#####################################################
# CHECK GAM, without smoothing: this is "similar" to a classical glm

fit_gamAsGLM <- gam(admit ~ gre + gpa + rank,
               data = data_glm,
               family = "binomial"
) 

or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gre", values = c(0,380))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gre      0    380    2.3643      1.146592         4.87524

or_gam(data = data_glm, model = fit_gamAsGLM, pred = "gpa", values = c(0,4))
#  predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa      0      4  24.93195      4.042141        153.7804

# We should get results similar as `or_glm`. The CIs from or_gam are (sometime largely) underestimated.

Here is a proposal to solve the issue. It is based on help('predict.gam') example to estimate the variance of sum of predictions using lpmatrix. Here, we want the variance of a difference but it is straightforward.

# Evaluate "other parameters"
meanGre <- mean(data_glm$gre)
meanGpa <- mean(data_glm$gpa)
# Any rank is valid, because it is not smoothed (additive effect)
AnyRank <- data_glm$rank[1]

# Build a test data with values
testdatagre <- data.frame(gre = c(0, 380),
                          gpa = meanGpa, rank = AnyRank)

testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(0, 4), rank = AnyRank)


# This function provides Or and ci for odds ratio.
# Note: it could easily be vectorised for your 'slice' option
# (just increase the size of `testdata`. The results will be relatively to the first line)

or_gam2 <- function(model, testdata){
  n <- nrow(testdata)
  Xp <- predict(model, testdata, type="lpmatrix")
  a <- cbind(-1, diag(n-1))
  Xs <- a %*% Xp
  # log Odds-Ratios
  LnOR <- Xs %*% coef(model)
  OR <- exp(LnOR) # gives OR
  # variance of the prediction
  var.LnOR <- Xs %*% model$Vp %*% t(Xs)
  sd.LnOR <- sqrt(diag(var.LnOR))
  return(c(OR=OR, 
           lower= exp(LnOR-1.96*sd.LnOR), 
           upper= exp(LnOR+1.96*sd.LnOR)))
}

or_gam2(fit_gamAsGLM, testdatagre)

#      OR    lower    upper 
# 2.364300 1.046731 5.340350 

or_gam2(fit_gamAsGLM, testdatagpa)
#        OR      lower      upper 
#24.931952   1.849078 336.168826

We get much closer CIs. Note that they are not strictly the same because the se estimations in the gam function are completely different than in the glm function. Here is an example with real gam

fit_gam <- gam(admit ~ s(gre) + s(gpa) + rank,
               data = data_glm,
               family = "binomial"
) # fit model

or_gam(data = data_glm, model = fit_gam, pred = "gpa", values=c(3.4,3.5))

# predictor value1 value2 oddsratio CI_low (2.5%) CI_high (97.5%)
#1       gpa    3.4    3.5  1.219184      1.211133        1.227289

# The CIs are extremely tights.  

testdatagpa <- data.frame(gre = meanGre,
                          gpa = c(3.4,3.5), rank = AnyRank)

or_gam2(fit_gam, testdatagpa)
#      OR    lower    upper 
#1.219184 1.026426 1.448141

This function could simply be adapted to your framework. Hope it helps. Let me know what you think.

Mar 27 '21 14:03 rpouillot

oddsratio oddsratio copied to clipboard

Issue with or_gam

oddsratio
oddsratio copied to clipboard