rmsb Estimation of negative probabilities with predict

I send a reproducible example showing negative probabilities. This occurs when one of the 'y' levels is rare, and the predictor has an extreme value in its range, so I have been slow to notice the problem.

set.seed(836)
data<- data.frame(HE6=sample(1:10, 200, replace = TRUE, prob=c(rep(0.1,6),0.01,0.002,0.19,0.198) ), 
                  Age = sample(1:85, 200, replace = TRUE), EORTC = sample(1:100, 200, replace = TRUE), 
                  linf=rbinom(200, 1,.5),
                  cir=rbinom(200, 1,.5),esquema=rbinom(200, 1,.5), riesgo=factor(rbinom(200, 2,.5)), estadio=factor(rbinom(200, 2,.5)))
head(data)
table(data$HE6)
dd<-datadist(data)
options(datadist='dd')

bsx <- blrm(HE6  ~ cir*rcs(Age,3)+ linf+ pol(EORTC)+esquema+estadio+riesgo, 
            ~ rcs(Age,3)+ pol(EORTC), cppo=function(y) y, data=data) 

newdata <- data.frame(cir=0, Age=85, EORTC= 10, linf=0, riesgo=0, esquema=1, estadio=1)
predict(bsx, newdata, type='fitted.ind') #

Oct 22 '20 13:10 albertocarm

Thanks for the reproducible example. I tried

for(s in 4 : 10) {
  cat('s:', s, '\n')
f <- blrm(HE6  ~ cir*rcs(Age,3)+ linf+ pol(EORTC)+esquema+estadio+riesgo, 
            ~ rcs(Age,3)+ pol(EORTC), cppo=function(y) y, data=data, seed=s) 

  print(predict(f, newdata, type='fitted.ind'))
}

and could not get any negative point estimates although I saw a few lower uncertainty intervals that were slightly negative.

Oct 22 '20 19:10 harrelfe

That is because it is a subtle issue, difficult to reproduce. It only seems to occur if some levels of the dependent variable are rare, and use extreme values for continuous predictors. To reproduce the error please use set.seed(836) to obtain exactly the same dataset. Notice the low frequency for y= 7 and y = 8. With my exact code you get this output, with an estimated negative probability for these same levels:

> set.seed(836)
> data<- data.frame(HE6=sample(1:10, 200, replace = TRUE, prob=c(rep(0.1,6),0.01,0.002,0.19,0.198) ), 
+                   Age = sample(1:85, 200, replace = TRUE), EORTC = sample(1:100, 200, replace = TRUE), 
+                   linf=rbinom(200, 1,.5),
+                   cir=rbinom(200, 1,.5),esquema=rbinom(200, 1,.5), riesgo=factor(rbinom(200, 2,.5)), estadio=factor(rbinom(200, 2,.5)))
> head(data)
  HE6 Age EORTC linf cir esquema riesgo estadio
1   6  85    42    0   0       1      1       0
2   5  49    49    1   0       0      1       2
3   6  31    58    1   0       0      1       2
4   6  73    81    1   1       1      0       0
5  10  81     6    1   0       0      1       1
6   9  28    28    0   0       0      0       1
> table(data$HE6)

 1  2  3  4  5  6  7  8  9 10 
18 16 22 13 27 22  3  1 46 32 
> dd<-datadist(data)
> options(datadist='dd')
> 
> bsx <- blrm(HE6  ~ cir*rcs(Age,3)+ linf+ pol(EORTC)+esquema+estadio+riesgo, 
+             ~ rcs(Age,3)+ pol(EORTC), cppo=function(y) y, data=data) 
Warning messages:
1: There were 920 divergent transitions after warmup. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
2: Examine the pairs() plot to diagnose sampling problems
 
3: Some Pareto k diagnostic values are slightly high. See help('pareto-k-diagnostic') for details.
 
> #plot(summary (bsx, ycut=5))
> #plot(summary(bsx)) 
> 
> newdata <- data.frame(cir=0, Age=85, EORTC= 10, linf=0, riesgo=0, esquema=1, estadio=1) # aquí está la interfaz con .net
> 
> predict(bsx, newdata, type='fitted.ind') # 
        y x   Mean  Lower Upper
1   HE6=1 1  0.078  0.006 0.189
2   HE6=2 1  0.064  0.009 0.132
3   HE6=3 1  0.088  0.025 0.156
4   HE6=4 1  0.058  0.011 0.107
5   HE6=5 1  0.119  0.048 0.180
6   HE6=6 1  0.101  0.045 0.159
7   HE6=7 1 -0.006 -0.044 0.033
8   HE6=8 1 -0.010 -0.050 0.028
9   HE6=9 1  0.262  0.154 0.366
10 HE6=10 1  0.245  0.052 0.469

Oct 22 '20 20:10 albertocarm

Therefore, when I try to plot Pr (y>= j) I get a zigzag instead of a downward curve

> predict(bsx, newdata, type='fitted') # 
      y x  Mean Lower Upper
1  y>=2 1 0.922 0.811 0.994
2  y>=3 1 0.858 0.690 0.986
3  y>=4 1 0.770 0.549 0.948
4  y>=5 1 0.712 0.476 0.918
5  y>=6 1 0.593 0.318 0.833
6  y>=7 1 0.492 0.227 0.752
7  y>=8 1 0.497 0.221 0.753
8  y>=9 1 0.507 0.241 0.785
9 y>=10 1 0.245 0.052 0.469

Oct 22 '20 20:10 albertocarm

Since ordinality is relaxed with respect to two of the variables, it may be reasonable to expect this to occur (as long as values are just slightly negative) for a few posterior sampling paths. We may want to override negatives with zeros. It would be interested to learn if this ever happens for large sample sizes.

Oct 22 '20 21:10 harrelfe

I have repeated it with 2000 simulated observations. The problem is minor, but it is still happening. Then, for the probability Pr (y >= j) you suggest adding up all the individual probabilities up to j, changing the negatives to zero, but he sum of point probabilities will no longer be 1. And wouldn't a positive prior be better?

Oct 22 '20 21:10 albertocarm