ALE plots: How does argument `grid.size` affect the results?
Why does the number of rows in the resulting data frame differ so much per feature when setting `grid.size = 99`?
I was not able to relate the setting to the actual differences in outcome by reading `?FeatureEffect`.
library(iml)
library(rpart)
data("Boston", package = "MASS")
rf = rpart(medv ~ ., data = Boston)
mod = Predictor$new(rf, data = Boston)
# Compute the accumulated local effects for all features
eff = FeatureEffects$new(mod, grid.size = 99)
purrr::map_int(eff$results, nrow)
#>    crim      zn   indus    chas     nox      rm     age     dis     rad     tax
#>     100      20      50       2      62     100      92     100       9      45
#> ptratio   black   lstat
#>      37      77     100
Created on 2020-01-15 by the reprex package (v0.3.0.9001)
Edit: The following code sets the grid
https://github.com/christophM/iml/blob/54b2ce26d8d13f9a6fcd635ee00c8d4835b2cad3/R/FeatureEffect-ale.R#L17-L17
and in more detail this one
https://github.com/christophM/iml/blob/54b2ce26d8d13f9a6fcd635ee00c8d4835b2cad3/R/utils.R#L191-L192
So essentially `quantile(type = 1)` is called with `probs` being a `seq()` with `length.out` set by `grid.size`.
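The grid construction described here can be sketched in base R, outside of iml. This is only an illustration under assumptions: the simulated feature and the names `feature`, `grid_size`, `probs` and `grid` are hypothetical, and the exact `length.out` that iml passes internally is not reproduced.

```r
# Illustrative sketch only: mimics the quantile-based grid described above,
# it is not iml's internal code.
set.seed(1)
feature   <- rnorm(500)   # hypothetical continuous feature
grid_size <- 20           # stands in for the grid.size argument

# Equally spaced probabilities between 0 and 1
probs <- seq(0, 1, length.out = grid_size)

# type = 1 is the inverse of the empirical CDF, so it returns observed values
grid <- quantile(feature, probs = probs, type = 1, names = FALSE)

length(grid)             # one grid point per probability
all(grid %in% feature)   # every grid point is an actual observation
range(grid)              # the grid spans min(feature) to max(feature)
```

For a continuous feature without ties, every probability maps to a distinct observation, so the grid keeps its full length.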
I wonder if this could make it into the argument description in the help page?
Maybe one could also include the motivation for `type = 1`.
The differing outcomes shown above are then caused by
https://github.com/christophM/iml/blob/54b2ce26d8d13f9a6fcd635ee00c8d4835b2cad3/R/FeatureEffect-ale.R#L16-L17
which removes duplicated values from the `quantile()` output.
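A small base R sketch of this de-duplication effect, using simulated features rather than the Boston data. The feature names, distributions and the `grid_size` value are assumptions chosen for illustration; only `quantile(type = 1)` followed by `unique()` is taken from the discussion.

```r
# Illustrative sketch only: shows why de-duplicating the quantile grid
# yields a different number of grid points per feature.
set.seed(1)
n <- 500
features <- list(
  continuous = rnorm(n),                         # no ties expected
  binary     = rbinom(n, size = 1, prob = 0.1),  # only two distinct values
  few_levels = sample(1:9, n, replace = TRUE)    # at most nine distinct values
)

grid_size <- 100
probs <- seq(0, 1, length.out = grid_size)

n_grid <- vapply(features, function(x) {
  q <- quantile(x, probs = probs, type = 1, names = FALSE)
  length(unique(q))   # duplicated quantile values collapse to one grid point
}, integer(1))

n_grid
```

The continuous feature keeps all requested grid points, while the discrete features are capped by their number of distinct values, which matches the pattern in the reprex output (for example 2 rows for `chas` and 9 for `rad`).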
Regarding interpretation: Does the differing number of unique grid values across features introduce a bias when interpreting the ALE plots for specific predictors? Or is it more like "20 is fine, anything more is better, but there is no bias when comparing the ALE plots of these features"?
It's implicitly the smaller of the requested number of grid points and the number of unique quantile values, as you described.
I think this behavior should be fine, since when many values are clustered at a certain point, you just need fewer intervals. But I guess it would make sense to add this to the docs.
As for `type = 1`, I am not entirely sure why I set it like this.
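For context on what `type = 1` changes in base R: it is the inverse of the empirical distribution function and therefore only returns values that occur in the data, whereas the default `type = 7` interpolates linearly between order statistics. Whether this was the motivation in iml is not stated in this thread; the following is only a sketch of the difference, with a hypothetical toy vector `x`.

```r
# Sketch of the difference between quantile types on a feature with ties.
x <- c(0, 0, 0, 1)   # hypothetical, mostly-zero feature

q1 <- quantile(x, probs = 0.8, type = 1, names = FALSE)  # observed value
q7 <- quantile(x, probs = 0.8, type = 7, names = FALSE)  # interpolated value

q1 %in% x   # type 1 returns a data point
q7 %in% x   # type 7 may return a value that never occurs in the data
```

With `type = 1`, grid points always coincide with observed feature values, so tied values produce exact duplicates that `unique()` can remove.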