
Dealing with weak networks

Open kauedesousa opened this issue 5 years ago • 1 comments

Dear Heather,

Here is an issue that may be related to issue https://github.com/hturner/PlackettLuce/issues/25. We now have a better clue about where the problem is: it arises mostly when we are performing cross-validation and pltree() is exposed to a set of data with a weak network.

Here is an example:

library("PlackettLuce")
source("https://raw.githubusercontent.com/AgrDataSci/ClimMob-analysis/master/R/functions.R")

R <- matrix(c(1, 2, 0, 0, 3,
              4, 1, 0, 0, 2,
              2, 1, 0, 0, 3,
              1, 2, 0, 4, 3,
              2, 1, 0, 3, 4,
              4, 1, 0, 0, 2,
              2, 1, 0, 0, 3,
              1, 2, 0, 1, 3,
              2, 0, 0, 0, 1,
              0, 0, 0, 1, 2), nrow = 10, byrow = TRUE)

colnames(R) <- c("apple", "banana", "orange", "pear", "grape")

R <- as.rankings(R)

# drop rows 9 and 10, supposing that they belong to a different fold in a
# cross-validation
R <- R[-c(9:10), ]

G <- group(R, index = 1:length(R))
p <- data.frame(p = rep(1, length(G)))
dt <- cbind(G, p)

pl <- pltree(G ~ p, data = dt)

# these calls fail, as shown in issue #25
predict(pl, newdata = dt)
AIC(pl, newdata = dt)

# but works with vcov = FALSE for predict()
predict(pl, newdata = dt, vcov = FALSE)

# but AIC() still does not work
AIC(pl, newdata = dt, vcov = FALSE)

# this is because orange dropped out of the network when we sampled the folds
a <- adjacency(R)

plot(network(a))

# the issue still persists even if we increase npseudo 
pl2 <- pltree(G ~ p, data = dt, npseudo = 0.8)
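
An item that never appears in a fold can also be spotted before fitting. The following is a minimal base-R sketch, not part of PlackettLuce: it works on the raw ranking matrix from the example (before as.rankings()), and the names R_mat, train, n_ranked and absent are illustrative. It only detects items that are completely absent; it does not test for weak connectivity in general.

```r
# Ranking matrix from the example above (0 = item not ranked).
R_mat <- matrix(c(1, 2, 0, 0, 3,
                  4, 1, 0, 0, 2,
                  2, 1, 0, 0, 3,
                  1, 2, 0, 4, 3,
                  2, 1, 0, 3, 4,
                  4, 1, 0, 0, 2,
                  2, 1, 0, 0, 3,
                  1, 2, 0, 1, 3,
                  2, 0, 0, 0, 1,
                  0, 0, 0, 1, 2), nrow = 10, byrow = TRUE)
colnames(R_mat) <- c("apple", "banana", "orange", "pear", "grape")

# Training fold: rows 9 and 10 are held out.
train <- R_mat[-c(9, 10), ]

# Number of rankings in which each item appears.
n_ranked <- colSums(train != 0)

# Items that never appear in the training fold.
absent <- names(n_ranked)[n_ranked == 0]
print(n_ranked)
print(absent)
```

If absent is non-empty, one option is to fall back to vcov = FALSE for that fold, as in the predict() call above.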


The question is: do you think this problem can eventually be solved with npseudo, or should we deal with it by passing vcov = FALSE to the predict() method?

Thanks in advance

kauedesousa Apr 07 '21 12:04

Thanks for digging down to find the cause of this issue.

The addition of pseudo-rankings allows the worth to be estimated, but these pseudo-rankings are removed before estimating the variance-covariance matrix. If an item is then completely missing from the rankings, this leads to zero rows and columns in the information matrix, which makes it non-invertible, so the variance can't be estimated. I am not sure what the appropriate fix should be here, but I will follow this up (it may be a few months before I get to it, as I am prioritising work on PLADMM in May/June).
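
The effect of a zero row and column can be illustrated with a toy matrix in base R. This is only a sketch under stated assumptions: the matrix info is made up for illustration and is not the information matrix PlackettLuce actually computes, and the names info, inv_full and vc_sub are hypothetical.

```r
# Toy "information matrix" for three items; orange never appears in the
# rankings, so its row and column are entirely zero (values are made up).
items <- c("apple", "banana", "orange")
info <- matrix(c( 2, -1, 0,
                 -1,  2, 0,
                  0,  0, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(items, items))

# The matrix is rank-deficient: rank 2 for a 3 x 3 matrix.
print(qr(info)$rank)

# Inverting it fails; capture the error message instead of stopping.
inv_full <- tryCatch(solve(info), error = function(e) conditionMessage(e))
print(inv_full)

# Dropping the absent item leaves an invertible sub-matrix.
vc_sub <- solve(info[-3, -3])
print(vc_sub)
```

Dropping the absent item is shown here only to make the singularity visible; whether that is the appropriate fix inside the package is the open question.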

AIC.pltree() doesn't need to compute the variance-covariance matrix; the error came from a call to itempar(), which defaults to vcov = TRUE. I have replaced this call and made a PR to the master branch; once that's merged in, AIC(pl, newdata = dt) should work if you install the package from GitHub. However, as newdata is actually the original data used in the fit here, it would be better to simply call AIC(pl), which avoids even more unnecessary computation and should work with the current PlackettLuce release (0.4.0). (This also goes for the call to predict(): better not to specify newdata unless you are specifying data that is different from the data used in the fit!)

hturner May 05 '21 15:05