plspm
get_boots() and handling of missing values
version: 78810865a29fae600e5518615ba26f2df5c93747
I ran into a bug when using models that include missing values in the input data. The bug only appears when `boot.val = T`, so I looked into the `get_boots()` function.
To see the bug, simply run the toy data example with `boot.val = TRUE`:
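```r
library(plspm)
data(russa)
# rus_path, rus_blocks, rus_scaling and rus_modes come from the plspm
# 'russa' example (definitions not shown here)

# let's add missing values to russa
russNA = russa
russNA[1,1] = NA
russNA[4,4] = NA
russNA[6,6] = NA

# PLS-PM using the data set 'russNA', with bootstrap validation
rus_pls6 = plspm(russNA, rus_path, rus_blocks, scaling = rus_scaling,
                 modes = rus_modes, scheme = "centroid", plscomp = c(1,1,1),
                 boot.val = TRUE)
```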
I found that the problem appears when the cross-loadings are computed. These are obtained with the function `cor`:

```r
xloads = cor(X, Y.lvs)
```
Incidentally, the default behaviour of `cor` is not great: with its default `use = "everything"` it propagates missing values, so `xloads` gets a row full of NAs whenever a column in `X` contains NAs.
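A minimal sketch with made-up toy matrices shows the effect. Here `X` and `Y.lvs` are hypothetical stand-ins, not the objects built inside `get_boots()`: a single `NA` in the first column of `X` turns the whole first row of the cross-loadings into `NA`, while the other rows are unaffected.

```r
# Hypothetical toy data: 6 observations, 3 manifest variables (X)
# and 2 latent variable scores (Y.lvs)
set.seed(1)
X <- matrix(rnorm(18), nrow = 6,
            dimnames = list(NULL, c("x1", "x2", "x3")))
Y.lvs <- matrix(rnorm(12), nrow = 6,
                dimnames = list(NULL, c("LV1", "LV2")))

# a single missing value in manifest variable x1
X[1, 1] <- NA

# default use = "everything": NAs propagate into the result
xloads <- cor(X, Y.lvs)
print(xloads)
# row "x1" is entirely NA; rows "x2" and "x3" are finite
```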
Solution:

```r
xloads = cor(X, Y.lvs, use="complete.obs")
```
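The same hypothetical toy setup, rerun with `use = "complete.obs"`, gives a fully finite matrix. Note that `"complete.obs"` performs casewise deletion: any observation with an `NA` in `X` or `Y.lvs` is dropped for every correlation, so all cross-loadings below come from observations 2 to 6. For comparison, the sketch also shows `use = "pairwise.complete.obs"`, which drops the incomplete observation only for the pairs involving `x1`. Which of the two suits the bootstrap best is a design choice this report does not settle.

```r
# Hypothetical toy data, as before
set.seed(1)
X <- matrix(rnorm(18), nrow = 6,
            dimnames = list(NULL, c("x1", "x2", "x3")))
Y.lvs <- matrix(rnorm(12), nrow = 6,
                dimnames = list(NULL, c("LV1", "LV2")))
X[1, 1] <- NA

# casewise deletion: observation 1 is dropped for every pair
xloads_complete <- cor(X, Y.lvs, use = "complete.obs")

# same result as correlating only the complete observations by hand
ok <- complete.cases(X, Y.lvs)
stopifnot(isTRUE(all.equal(xloads_complete, cor(X[ok, ], Y.lvs[ok, ]))))

# pairwise deletion: observation 1 is dropped only for pairs involving x1
xloads_pairwise <- cor(X, Y.lvs, use = "pairwise.complete.obs")

print(xloads_complete)
print(xloads_pairwise)
```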
This change is needed in two places in the `get_boots.r` source file: where all values are initialized (line 54) and inside the `while` loop (line 105).
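To illustrate why the fix matters inside a resampling loop, here is a much-simplified, hypothetical sketch. The function `boot_xloads()` and its arguments are illustrative only and do not reproduce `get_boots()`; in particular, it resamples fixed latent scores instead of re-estimating the model on each resample. Each iteration resamples observations with replacement, so with the default `use = "everything"`, every resample that happens to contain the incomplete observation produces `NA` cross-loadings.

```r
# Hypothetical helper, NOT the plspm implementation: bootstraps the
# cross-loadings of fixed latent variable scores
boot_xloads <- function(X, Y.lvs, br = 100, use = "complete.obs") {
  n <- nrow(X)
  # one row per bootstrap sample, one column per (manifest, latent) pair
  out <- matrix(NA_real_, nrow = br, ncol = ncol(X) * ncol(Y.lvs))
  for (b in seq_len(br)) {
    idx <- sample.int(n, n, replace = TRUE)  # resample observations
    out[b, ] <- as.vector(cor(X[idx, , drop = FALSE],
                              Y.lvs[idx, , drop = FALSE], use = use))
  }
  out
}

# hypothetical toy data with one missing value
set.seed(2)
X <- matrix(rnorm(60), nrow = 20)
Y.lvs <- matrix(rnorm(40), nrow = 20)
X[3, 1] <- NA

with_fix <- boot_xloads(X, Y.lvs, br = 50)
without_fix <- boot_xloads(X, Y.lvs, br = 50, use = "everything")

# bootstrap means and standard errors of each cross-loading
print(round(colMeans(with_fix), 3))
print(round(apply(with_fix, 2, sd), 3))
```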