BMA
BMA copied to clipboard
Fails with missing data
The main function fails when there is missing data. The error message is not informative. I suggest you automatically subset to complete cases, as many other packages do. You can throw a warning when missing data was removed. Check out how the functions in rms package does this.
library(tidyverse)
library(BMA)
#> Loading required package: survival
#> Loading required package: leaps
#> Loading required package: robustbase
#>
#> Attaching package: 'robustbase'
#> The following object is masked from 'package:survival':
#>
#> heart
#> Loading required package: inline
#> Loading required package: rrcov
#> Scalable Robust Estimators with High Breakdown Point (version 1.5-5)
#versions
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19.3
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] BMA_3.18.14 rrcov_1.5-5 inline_0.3.17 robustbase_0.93-7
#> [5] leaps_3.1 survival_3.2-7 forcats_0.5.0 stringr_1.4.0
#> [9] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
#> [13] tibble_3.0.4 ggplot2_3.3.3 tidyverse_1.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.5 lubridate_1.7.9.2 mvtnorm_1.1-1 lattice_0.20-41
#> [5] assertthat_0.2.1 digest_0.6.27 R6_2.5.0 cellranger_1.1.0
#> [9] backports_1.2.1 pcaPP_1.9-73 reprex_0.3.0.9001 stats4_4.0.3
#> [13] evaluate_0.14 httr_1.4.2 highr_0.8 pillar_1.4.7
#> [17] rlang_0.4.10 readxl_1.3.1 rstudioapi_0.13 Matrix_1.3-2
#> [21] rmarkdown_2.6 splines_4.0.3 munsell_0.5.0 broom_0.5.6
#> [25] compiler_4.0.3 modelr_0.1.8 xfun_0.20 pkgconfig_2.0.3
#> [29] htmltools_0.5.1 tidyselect_1.1.0 fansi_0.4.2 crayon_1.3.4
#> [33] dbplyr_2.0.0 withr_2.4.0 grid_4.0.3 nlme_3.1-151
#> [37] jsonlite_1.7.2 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0
#> [41] magrittr_2.0.1 scales_1.1.1 cli_2.2.0 stringi_1.5.3
#> [45] fs_1.5.0 xml2_1.3.2 ellipsis_0.3.1 generics_0.1.0
#> [49] vctrs_0.3.6 tools_4.0.3 glue_1.4.2 DEoptimR_1.0-8
#> [53] hms_0.5.3 yaml_2.2.1 colorspace_2.0-0 rvest_0.3.6
#> [57] knitr_1.30 haven_2.3.1
#model formula
model_form = "displ ~ manufacturer + year + cyl + drv + cty + hwy + fl + class" %>% as.formula()
#fit OK
BMA::bic.glm(model_form, data = mpg, glm.family = "gaussian")
#>
#> Call:
#> bic.glm.formula(f = model_form, data = mpg, glm.family = "gaussian")
#>
#>
#> Posterior probabilities(%):
#> manufacturerchevrolet manufacturerdodge manufacturerford
#> 100.0 100.0 94.9
#> manufacturerhonda manufacturerhyundai manufacturerjeep
#> 12.4 37.4 53.1
#> manufacturerland rover manufacturerlincoln manufacturermercury
#> 1.5 33.4 32.3
#> manufacturernissan manufacturerpontiac manufacturersubaru
#> 57.9 100.0 35.1
#> manufacturertoyota manufacturervolkswagen year
#> 53.1 11.8 100.0
#> drvf drvr cty
#> 9.5 94.2 96.6
#> hwy fld fle
#> 58.7 100.0 100.0
#> flp flr classcompact
#> 9.1 7.3 95.4
#> classmidsize classminivan classpickup
#> 95.4 100.0 95.4
#> classsubcompact classsuv
#> 95.4 95.4
#>
#> Coefficient posterior expected values:
#> (Intercept) manufacturerchevrolet manufacturerdodge
#> -73.293637 0.842160 0.782625
#> manufacturerford manufacturerhonda manufacturerhyundai
#> 0.580392 0.049055 -0.156565
#> manufacturerjeep manufacturerland rover manufacturerlincoln
#> 0.334886 0.003003 0.305475
#> manufacturermercury manufacturernissan manufacturerpontiac
#> 0.208277 0.299841 1.080805
#> manufacturersubaru manufacturertoyota manufacturervolkswagen
#> -0.131918 0.191814 -0.028317
#> year drvf drvr
#> 0.040527 0.033635 0.639857
#> cty hwy fld
#> -0.135739 -0.047947 1.518490
#> fle flp flr
#> -0.837974 -0.017056 0.012708
#> classcompact classmidsize classminivan
#> -1.571618 -1.446988 -2.026317
#> classpickup classsubcompact classsuv
#> -1.464263 -1.501324 -1.271144
#add missing data
mpg[1, 5] = NA
#fit fail
BMA::bic.glm(model_form, data = mpg, glm.family = "gaussian")
#> Error in model.frame.default(formula = y ~ ., data = x.df, weights = wt, : variable lengths differ (found for 'x.manufacturerchevrolet')
Created on 2021-01-18 by the reprex package (v0.3.0.9001)
Thank you for the suggestion @Deleetdk . For now I added an argument na.action to bic.glm.formula which is set to na.omit by default. Will do the same for the data.frame and matrix classes.