iml
Question: Why are the zeroth and first intervals grouped together in calculate.ale.num?
Hi @christophM, I am trying to implement a simpler function that calculates ALE and lets me resample (bootstrap or jackknife) to construct confidence intervals around the estimated ALE.
After two days of failing to understand the reasoning for grouping the first two intervals together in line 18 (shown below), I hope it is OK to ask you here for the rationale behind it.
18: interval.index[interval.index == 0] <- 1
With that, the merged first group is twice as big as the others (given a roughly uniformly distributed variable). I was getting an uptick at the very start of the distribution, as in the following image, across several variables, and I was worried that this could be a side effect of the bigger group.
Maybe related: in line 38, 0 is prepended to the cumulative sum vector. Is that relevant, or is it just there to calculate the midpoint of each interval, as indicated in the comment?
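To make the size of the merged group concrete, here is a minimal sketch on simulated data. It assumes the interval index comes from `findInterval()` with `left.open = TRUE` on quantile-based grid points, which is my reading of the linked function and may not match it exactly; `x`, `grid`, `idx` and `n_zero` are made-up names.

```r
set.seed(1)
x <- runif(1000)  # simulated, roughly uniform feature

# Assumption: grid points are quantiles of the feature (11 borders, 10 intervals)
grid <- quantile(x, probs = seq(0, 1, length.out = 11), names = FALSE)

# Intervals are (grid[i], grid[i + 1]], so a value equal to grid[1] gets index 0
idx <- findInterval(x, grid, left.open = TRUE)
n_zero <- sum(idx == 0)  # how many observations land in the "zeroth" interval

# The assignment from line 18: move the zeroth interval into the first
idx[idx == 0] <- 1L

table(idx)  # group sizes after the merge
```

Under these assumptions, only observations equal to the smallest grid point receive index 0, so comparing `n_zero` with `table(idx)` shows how much the merge actually changes the size of the first group.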
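A small sketch of how I currently read the prepended 0, in case it helps to pin down the question. The numbers and the names `local_effects`, `acc` and `mid` are hypothetical and not taken from iml.

```r
# Hypothetical mean prediction differences for 3 intervals
local_effects <- c(0.2, -0.1, 0.4)

# Prepending 0 gives one accumulated value per grid point (4 borders)
# instead of one per interval (3 intervals)
acc <- c(0, cumsum(local_effects))  # 0.0 0.2 0.1 0.5

# Averaging neighbouring borders gives the value at each interval midpoint
mid <- (acc[-1] + acc[-length(acc)]) / 2  # 0.10 0.15 0.30
```

If this reading is right, the 0 only anchors the accumulated effect at the left-most border, and any constant offset would disappear once the ALE is centered.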
I appreciate your time and any comment on this matter. (And really appreciate your book as well!)
The rest of the function:
https://github.com/christophM/iml/blob/2d18ff4ad87769da8b24a364127181447b5d3791/R/FeatureEffect-ale.R#L6-L60
I should note that I started out following the implementation in the ALEPlot package, and the same effect occurs in line 83 of ALEPlot.R, when calling cut() with include.lowest=TRUE. At first I thought it was a bug, but then I saw that in your implementation the assignment of the zeroth group to the first is intentional. The plots I showed above are based on ALEPlot's implementation and may differ from yours. Please let me know if you consider this off-topic here, and I can try to contact ALEPlot's author directly.
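For comparison, a toy sketch of why the two approaches appear to do the same thing. The data and the names `x`, `breaks`, `without`, `with_lowest` and `via_find` are made up, and the `findInterval()` call is my assumption about how the index is built in iml.

```r
x <- c(1, 2, 3, 4, 5)
breaks <- c(1, 3, 5)

# Default cut(): intervals are (1,3] and (3,5], so the minimum falls outside
without <- as.integer(cut(x, breaks))  # NA 1 1 2 2

# include.lowest = TRUE closes the first interval on the left: [1,3], (3,5]
with_lowest <- as.integer(cut(x, breaks, include.lowest = TRUE))  # 1 1 1 2 2

# findInterval() plus the reassignment of index 0 gives the same grouping
via_find <- findInterval(x, breaks, left.open = TRUE)  # 0 1 1 2 2
via_find[via_find == 0] <- 1L                          # 1 1 1 2 2
```

In both cases the minimum ends up in the first interval instead of being dropped.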