xrf

Gracefully handle NAs in predictors

Open · holub008 opened this issue 5 years ago · 0 comments

Currently, if the response contains an NA, a clear error message is thrown:

```r
library(xrf)
# the last value of the response y is NA
data <- data.frame(x = rnorm(50), y = c(rnorm(49), NA))
m <- xrf(y ~ x, data, family = 'gaussian', xgb_control = list(nrounds = 1, max_depth = 2))
```

```
Error in xrf_preconditions(family, xgb_control, glm_control, data, response_var,  : 
  Response variable contains missing values which is not allowed
```

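However, if any predictor contains an NA, the `*model.matrix` implementation silently drops that row. The response vector still holds all 50 values while the design matrix has only 49 rows, so the failure surfaces later as a confusing length-mismatch error from xgboost: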

```r
library(xrf)
# the last value of the predictor x is NA
data <- data.frame(y = rnorm(50), x = c(rnorm(49), NA))
m <- xrf(y ~ x, data, family = 'gaussian', xgb_control = list(nrounds = 1, max_depth = 2))
```

```
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : 
  The length of labels must equal to the number of rows in the input data
```
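The row-count mismatch can be reproduced with base R alone, without xrf or xgboost. This is a minimal sketch that assumes the `*model.matrix` step behaves like base `model.matrix()` under the default `na.action` option (`"na.omit"`):

```r
set.seed(1)
# Same shape as the failing example: the last predictor value is NA
data <- data.frame(y = rnorm(50), x = c(rnorm(49), NA))

# model.matrix() builds a model frame first; with the default
# na.action ("na.omit") the incomplete row is silently dropped
X <- model.matrix(y ~ x, data)

nrow(X)          # 49 rows in the design matrix
length(data$y)   # 50 labels when taken straight from the data frame
```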

Several fixes may make sense:

  • Fail fast & clearly with a preconditions check
  • Offer several (configurable) remediation methods, like dropping offending rows or mean/mode imputation.
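Both fixes could share a single entry point. The sketch below uses a hypothetical helper, `handle_predictor_nas()` (not part of xrf), with an assumed `na_method` argument: `"fail"` stops with a clear message, `"drop"` removes incomplete rows, and `"impute"` fills numeric columns with the mean and other columns with the most frequent value. It also assumes every non-response column is a predictor; a real implementation would more likely use the formula's right-hand side.

```r
# Hypothetical helpers for illustration; not part of the xrf package.

# Most frequent non-missing value, keeping the column's type
# (factor, character, logical). Ties go to the value seen first.
most_frequent <- function(v) {
  vals <- v[!is.na(v)]
  u <- unique(vals)
  u[which.max(tabulate(match(vals, u)))]
}

handle_predictor_nas <- function(data, response_var,
                                 na_method = c("fail", "drop", "impute")) {
  na_method <- match.arg(na_method)
  # Assumption: every column except the response is a predictor
  predictors <- setdiff(names(data), response_var)
  has_na <- vapply(data[predictors], anyNA, logical(1))
  if (!any(has_na)) return(data)

  if (na_method == "fail") {
    # Fail fast, mirroring the existing response-variable check
    stop("Predictor(s) contain missing values, which is not allowed: ",
         paste(predictors[has_na], collapse = ", "), call. = FALSE)
  }

  if (na_method == "drop") {
    # Remove every row with a missing predictor value
    return(data[complete.cases(data[predictors]), , drop = FALSE])
  }

  # na_method == "impute": mean for numeric columns, mode otherwise
  for (col in predictors[has_na]) {
    v <- data[[col]]
    if (all(is.na(v))) {
      stop("Cannot impute '", col, "': all values are missing", call. = FALSE)
    }
    v[is.na(v)] <- if (is.numeric(v)) mean(v, na.rm = TRUE) else most_frequent(v)
    data[[col]] <- v
  }
  data
}

# Usage on a small frame with one NA in each predictor
data <- data.frame(y = rnorm(6),
                   x = c(1, 2, NA, 4, 5, 6),
                   g = c("a", "b", "b", NA, "a", "b"),
                   stringsAsFactors = FALSE)

nrow(handle_predictor_nas(data, "y", "drop"))   # 4
imputed <- handle_predictor_nas(data, "y", "impute")
imputed$x[3]                                    # 3.6
imputed$g[4]                                    # "b"
```

Run on the data frame before `xrf()` is called, a helper like this would keep the labels and the design-matrix rows aligned; the `"fail"` branch could equally live alongside the existing response check in `xrf_preconditions`.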

holub008 · May 04 '19 03:05