survminer
ggforest missing values
Expected behavior
For coxph, if a predictor has missing values, those subjects are excluded by default. Thus, for ggforest, I would expect the sample size to show the number of non-missing observations used in the model.
Actual behavior
The sample size in the ggforest plot includes patients with missing values, even though these patients are excluded by coxph by default.
Steps to reproduce the problem
library(survival)
library(survminer)
data(cancer, package="survival")
# subset to variables of interest
colon = subset(colon, select = c(sex, adhere, time, status))
# add missing values to sex
colon$sex[1:10] = NA
# add missing values to adhere
colon$adhere[15] = NA
# N = 1858 when including missing values
nrow(colon)
# N = 1847 when excluding missing values (10 missing sex, 1 missing adhere)
nrow(na.omit(colon))
model <- coxph( Surv(time, status) ~ sex + adhere, data = colon )
# "n= 1847...11 observations deleted due to missingness"
print(model)
# shows N=1858 despite only N=1847 being used in model
ggforest(model)
Note that ggforest shows N=1858 despite only N=1847 being used in the model.
session_info()
R version 4.2.0 (2022-04-22) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.5.1
survminer v0.4.9
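An easy solution I found was to set model=TRUE in coxph, which stores the model frame (complete cases only) in the fitted object. Then, when calling ggforest, use data = fit$model, where fit is the fitted coxph object (model$model for the object in the example above). The sample sizes will then be the number of non-missing values.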
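For reference, here is a minimal sketch of that workaround applied to the reproduction example. The names fit and n_used are mine, and the requireNamespace guard is only there so the snippet still runs if survminer is not installed. I have not checked this against survminer versions other than the one reported above.

```r
library(survival)

# Same setup as in the reproduction steps:
# 10 missing values in sex, 1 missing value in adhere
data(cancer, package = "survival")
colon <- subset(colon, select = c(sex, adhere, time, status))
colon$sex[1:10] <- NA
colon$adhere[15] <- NA

# model = TRUE keeps the model frame (rows actually used in the fit) in fit$model
fit <- coxph(Surv(time, status) ~ sex + adhere, data = colon, model = TRUE)

# The stored model frame should have as many rows as observations used by coxph
n_used <- nrow(fit$model)
n_used == fit$n

# Pass the model frame as data so ggforest counts only the rows used in the fit
if (requireNamespace("survminer", quietly = TRUE)) {
  survminer::ggforest(fit, data = fit$model)
}
```

The point of passing fit$model is that ggforest takes its sample sizes from whatever data frame it is given, so handing it the model frame avoids counting the rows that coxph dropped.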