pdp
Add a sample feature for ICE/c-ICE/d-ICE curves
For example, to plot a random (sub)sample of curves:
partial(fit, pred.var = "x3", ice = TRUE, frac = 0.5, plot = TRUE)
This would be easiest to accomplish before converting to long format, for example:
if (frac < 1) {
pd.df <- pd.df[sample(nrow(pd.df), size = floor(frac*nrow(pd.df)), replace = FALSE), ]
}
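A minimal base-R sketch of why sampling is easiest before the long-format conversion. It assumes a hypothetical wide ICE data frame `ice_wide` with one row per observation (one curve) and one column per grid value; the column names, `yhat.id`, and the `to_long()` helper are illustrative, not pdp's internals. In wide form, sampling rows samples whole curves. After reshaping, you would have to sample curve IDs instead, because sampling rows of the long data would keep random fragments of curves.

```r
set.seed(101)

# Hypothetical wide ICE output: one row per observation (i.e., one curve)
# and one column of predictions per grid value of the predictor x3.
grid <- c(1, 2, 3, 4)
n <- 10
ice_wide <- as.data.frame(matrix(rnorm(n * length(grid)), nrow = n,
                                 dimnames = list(NULL, paste0("x3_", grid))))
ice_wide$yhat.id <- seq_len(n)

# Reshape wide -> long: one row per (curve, grid value) pair.
to_long <- function(wide, grid) {
  data.frame(
    yhat.id = rep(wide$yhat.id, times = length(grid)),
    x3      = rep(grid, each = nrow(wide)),
    yhat    = unlist(wide[paste0("x3_", grid)], use.names = FALSE)
  )
}

# Option 1: sample rows while still wide, so each sampled row is a whole curve.
frac <- 0.5
keep <- sample(nrow(ice_wide), size = floor(frac * nrow(ice_wide)), replace = FALSE)
ice_long <- to_long(ice_wide[keep, ], grid)

# Option 2: reshape first, then sample curve IDs (not rows) to keep curves intact.
ice_long_all <- to_long(ice_wide, grid)
ids <- unique(ice_long_all$yhat.id)[keep]
ice_long_alt <- ice_long_all[ice_long_all$yhat.id %in% ids, ]
```

Both options select the same five complete curves; the wide-format version just needs a single `sample()` over rows.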
This is exactly what I came here to ask about! I assume this feature isn't yet implemented? Until the feature is implemented, what is the "right" way to go about hacking this together?
I don't want to restrict only the sample of curves that are plotted. Is there a way to restrict the sample of curves that are computed (not just plotted) to reduce computation time? My dataset has 2.5 million observations, so even with parallel = TRUE it takes hours to compute and plot a single feature.
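Some rough arithmetic on why the computation itself is slow: an ICE computation needs one model prediction per (observation, grid value) pair, so its cost grows linearly with the number of rows used. The grid size of 51 and the 10,000-row sample below are assumptions chosen for illustration, and `ice_predictions()` is a hypothetical helper.

```r
# Number of model predictions an ICE computation needs:
# one per observation for every grid value of the predictor.
ice_predictions <- function(n_obs, grid_size) n_obs * grid_size

grid_size <- 51  # assumed number of grid points for the predictor

full  <- ice_predictions(2.5e6, grid_size)        # all 2.5 million rows
half  <- ice_predictions(2.5e6 * 0.5, grid_size)  # a 50% sample
small <- ice_predictions(1e4, grid_size)          # a fixed 10,000-row sample

c(full = full, half = half, small = small)
```

Under these assumptions the full data needs 127.5 million predictions, a 50% sample still needs 63.75 million, and a 10,000-row sample needs 510,000. With data this large, a fixed, much smaller sample size is likely to matter more than the fraction.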
I was thinking of just feeding my random forest model a random subset of the data and passing that into the partial command, but I'm worried this isn't correct.
Hey @DeFilippis, the easiest way to accomplish this right now is to provide a sampled version of the original training data via the "train" argument in partial. Fit your model on the full training set, though! I can provide a simple example later on if you need one!
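A minimal base-R sketch of that idea, for readers who want to see what it amounts to: the model is fit on the full data, and ICE curves are computed only for the rows of a random sample, which is the role the sampled `train` data plays in partial. This is not pdp's implementation. The `lm()` fit on `mtcars` stands in for the random forest, and `ice_curves()` is a hypothetical helper.

```r
# Stand-in for the random forest: a linear model with an interaction,
# so the individual curves are not all parallel.
fit <- lm(mpg ~ wt * hp, data = mtcars)  # fit on the FULL training data

# Compute ICE curves for `pred_var` using only the rows of `train`.
ice_curves <- function(object, pred_var, train, grid) {
  do.call(rbind, lapply(grid, function(value) {
    newdata <- train
    newdata[[pred_var]] <- value  # set the predictor to one grid value for every row
    data.frame(yhat.id = seq_len(nrow(train)), x = value,
               yhat = unname(predict(object, newdata = newdata)))
  }))
}

set.seed(2024)
train_sub <- mtcars[sample(nrow(mtcars), size = floor(0.5 * nrow(mtcars))), ]
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

ice <- ice_curves(fit, "wt", train_sub, grid)
```

Only the 16 sampled rows generate curves (16 × 10 predictions), while the model itself still reflects all 32 rows.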
Now that vip has been updated on CRAN, I’ve started to work on pdp, so hopefully these features will be available in the next release!
Perfect -- that's really easy. Here's what I'm using, in case it helps anybody:
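library(pdp); library(dplyr)  # `model` was fit on the full data; only the ICE curves use a 50% sample
# parallel = TRUE needs a registered foreach backend, e.g. doParallel::registerDoParallel()
partial(model, pred.var = "predictor", ice = TRUE, center = TRUE, plot = TRUE, plot.engine = "ggplot2",
        parallel = TRUE, paropts = list(.packages = "ranger"), train = sample_frac(data, 0.5))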
sample_frac() comes from dplyr, which is part of the tidyverse.
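If you would rather not load dplyr, a base-R helper in the same spirit works too. The sketch below assumes a hypothetical `sample_rows_frac()` helper that uses `floor()` for the sample size (dplyr's exact rounding may differ) and fixes the random seed so the same curves are drawn on every run.

```r
# Base-R stand-in for dplyr::sample_frac(data, frac): draw a random fraction of
# rows without replacement. `sample_rows_frac` is a hypothetical helper name.
sample_rows_frac <- function(data, frac) {
  stopifnot(frac > 0, frac <= 1)
  data[sample(nrow(data), size = floor(frac * nrow(data))), , drop = FALSE]
}

set.seed(42)  # fix the seed so the same curves are drawn each run
sub1 <- sample_rows_frac(mtcars, 0.5)
set.seed(42)
sub2 <- sample_rows_frac(mtcars, 0.5)
identical(rownames(sub1), rownames(sub2))  # TRUE: same rows both times
```

The result can be passed as `train = sample_rows_frac(data, 0.5)` in the same way as the `sample_frac()` call.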
That should do it! I’ll be sure to include this feature in the next release, so hopefully soon! The same goes for the squash function!