A couple of questions (issues?) on the vignette `longintro`
First, in section 4.1 (page 13 of the PDF), it is said:
Using the first result, we can uniquely define $T_\alpha$ as the smallest tree $T$ for which $R_\alpha (T) $ is minimized.
My question is whether it shouldn't be the biggest tree instead of the smallest, since, as I understand it, this is about a subtree of the full model. Am I wrong?
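For context, the quantity being minimized is, as far as I can tell from the same section of the vignette, the cost-complexity measure below, where $|T|$ is the number of terminal nodes of $T$ and $R(T)$ is its risk. I am restating it from memory, so please treat the exact notation as my assumption:

\begin{eqnarray*}
R_\alpha (T) &=& R(T) + \alpha |T|
\end{eqnarray*}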
Secondly, just after that the intervals are printed as:
\begin{eqnarray*}
I_1 &=& [0, \alpha_1 ] \\
I_2 &=& ( \alpha_1 , \alpha_2 ] \\
\vdots \\
I_m &=& ( \alpha_{m-1} , \infty]
\end{eqnarray*}
However, brackets seem to be reversed. Shouldn't they be as follows?
\begin{eqnarray*}
I_1 &=& [0, \alpha_1 ) \\
I_2 &=& [ \alpha_1 , \alpha_2 ) \\
\vdots \\
I_m &=& [ \alpha_{m-1} , \infty)
\end{eqnarray*}
Finally, in section 4.3 it is found that
Looking at the table, we see that the best tree has 10 terminal nodes (9 splits), based on cross-validation.
And then it is claimed that
This sub tree is extracted with a call to `prune` and saved in `fit9`.
However, the pruned tree `fit9` has 10 splits (and 11 terminal nodes, as in Figure 4), since it uses `cp = 0.02 < 0.022222`, and so, in the interval notation above (with my proposed correction), this `cp = 0.02` belongs to $I_5 = [0.0166667, 0.0222222)$.
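In case it helps, this is roughly how I would check which end of each interval is closed. It is only a sketch: it uses the `kyphosis` data shipped with rpart rather than the digit example of section 4.3 (my assumption, just to keep it self-contained), the control values are arbitrary, and `n_leaves` is a helper name I made up.

```r
library(rpart)

# Illustrative data only: kyphosis ships with rpart and is NOT the digit
# example from section 4.3 of the vignette.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.001, minsplit = 5)

# Complexity thresholds from the cp table (largest first).
cps <- unname(fit$cptable[, "CP"])

# Hypothetical helper: number of terminal nodes after pruning at `cp`.
n_leaves <- function(cp) {
  pruned <- prune(fit, cp = cp)
  sum(pruned$frame$var == "<leaf>")
}

eps <- 1e-8  # small offset to step just off each threshold
check <- data.frame(
  cp           = cps,
  nsplit_table = unname(fit$cptable[, "nsplit"]),
  leaves_below = sapply(pmax(cps - eps, 0), n_leaves),
  leaves_at    = sapply(cps, n_leaves),
  leaves_above = sapply(cps + eps, n_leaves)
)
print(check)
```

Comparing `leaves_at` with the two neighbouring columns should show on which side of each interval the threshold itself falls, although exact equality may be affected by floating-point rounding.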
Thanks!