
ggforest missing values

Open · jarbet opened this issue 2 years ago · 1 comment

Expected behavior

For coxph, subjects with a missing value in any predictor are excluded from the fit by default. I would therefore expect ggforest to report the sample size as the number of non-missing observations actually used in the model.
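A minimal sketch of that default behavior on a small made-up data set (the names toy and x are hypothetical, chosen only for illustration): coxph drops the one incomplete row, and the fitted object's n and na.action components record what was used and what was dropped.

```r
library(survival)

# Hypothetical toy data: 8 subjects, one with a missing predictor value
toy <- data.frame(
  time   = c(5, 8, 12, 3, 9, 15, 7, 11),
  status = c(1, 0, 1, 1, 0, 1, 1, 0),
  x      = c(0, 1, 1, NA, 0, 1, 0, 1)
)

# Default na.action drops the incomplete row before fitting
fit <- coxph(Surv(time, status) ~ x, data = toy)

fit$n                      # observations used in the fit
sum(complete.cases(toy))   # complete cases in the input data
length(fit$na.action)      # rows deleted due to missingness
```

Both counts agree, which is the number one would expect a forest plot of this fit to report.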

Actual behavior

The sample size in the ggforest plot includes patients with missing values, even though coxph excludes these patients by default.

Steps to reproduce the problem

library(survival)
library(survminer)

data(cancer, package="survival")

# subset to variables of interest
colon = subset(colon, select = c(sex, adhere, time, status))

# add missing values to sex
colon$sex[1:10] = NA

# add missing values to adhere
colon$adhere[15] = NA

# N = 1858 when including missing values
nrow(colon)

# N = 1847 when excluding missing values (10 missing sex, 1 missing adhere)
nrow(na.omit(colon))

model <- coxph( Surv(time, status) ~ sex + adhere, data = colon )

# "n= 1847...11 observations deleted due to missingness"
print(model)

# shows N=1858 despite only N=1847 being used in model
ggforest(model)

Note that ggforest shows N=1858 even though only 1847 observations are used in the model:

[Screenshot: ggforest output reporting N=1858]

session_info()

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5.1

survminer v0.4.9

jarbet · Sep 28 '22 17:09

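An easy workaround I found is to pass model = TRUE to coxph and then call ggforest with data = fit$model, where fit is the coxph object (model$model in the example above). The stored model frame contains only the rows used in the fit, so the sample sizes shown are the numbers of non-missing observations.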
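Applied to the reproducible example above, the workaround might look like the following sketch. It refits the same model under the name fit (chosen here for illustration) with model = TRUE and passes the stored model frame to ggforest through its data argument; it assumes, as the comment above reports, that ggforest takes its per-variable counts from that data.

```r
library(survival)
library(survminer)

data(cancer, package = "survival")

# Same setup as the reproducible example
colon <- subset(colon, select = c(sex, adhere, time, status))
colon$sex[1:10] <- NA
colon$adhere[15] <- NA

# model = TRUE keeps the model frame (complete cases only) in fit$model
fit <- coxph(Surv(time, status) ~ sex + adhere, data = colon, model = TRUE)

nrow(fit$model)   # rows actually used in the fit

# Counting from the model frame makes ggforest's N match the fitted model
p <- ggforest(fit, data = fit$model)
```

Because the model frame is built after na.action is applied, this keeps the plotted N in sync with whatever rows coxph actually used, without maintaining a separately filtered copy of the data.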

jarbet · Nov 28 '22 21:11