skranger
Call `predict_quantile` by default if `quantiles` is set to `True`, `list` or `np.ndarray`
When using `RangerForestRegressor` with `quantiles=True` in a hyperparameter optimization tool (e.g. tune-sklearn) to optimize probabilistic metrics such as the Continuous Ranked Probability Score (CRPS), the model is required to output the 2D array produced by the `predict_quantiles` method. However, when CRPS is turned into a scorer through the sklearn API with the `make_scorer` function, the scorer always calls the estimator's `predict` method in its final step, so quantiles are never predicted.
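To make the 2D requirement concrete, here is a minimal pure-Python sketch of how CRPS can be approximated from quantile forecasts. It uses the identity CRPS(F, y) = 2 × ∫₀¹ pinball loss of the τ-quantile dτ and assumes the quantile levels are roughly evenly spaced over (0, 1), so a plain average stands in for the integral. It is not solarforecastarbiter's `continuous_ranked_probability_score`; `pinball_loss` and `crps_from_quantiles` are hypothetical helper names. Each forecast row must hold one value per quantile level, so a 1D output from `predict` is rejected.

```python
def pinball_loss(y_true, q_pred, tau):
    """Quantile (pinball) loss of a single tau-quantile forecast."""
    diff = y_true - q_pred
    return max(tau * diff, (tau - 1) * diff)


def crps_from_quantiles(y_true, quantile_preds, taus):
    """Approximate mean CRPS over samples (hypothetical helper).

    y_true:         sequence of n observations
    quantile_preds: n rows, each holding one forecast per level in `taus`
                    (the 2D shape a quantile-aware predict would return)
    taus:           quantile levels strictly between 0 and 1
    """
    total = 0.0
    for y, row in zip(y_true, quantile_preds):
        try:
            n_cols = len(row)
        except TypeError:  # a scalar row means the forecasts were 1D
            raise ValueError("forecasts must be 2D arrays") from None
        if n_cols != len(taus):
            raise ValueError("need one forecast column per quantile level")
        losses = [pinball_loss(y, q, t) for q, t in zip(row, taus)]
        # 2 * (average pinball loss) approximates 2 * integral over tau
        total += 2.0 * sum(losses) / len(losses)
    return total / len(y_true)
```

For example, with `taus = [0.25, 0.5, 0.75]`, the observation `1.0` and the forecast row `[0.0, 1.0, 2.0]` give 2 × (0.25 + 0 + 0.25) / 3 = 1/3, while passing a flat list of point forecasts raises the 2D error.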
Here is a brief example of what I am trying to explain:
```python
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer
from skranger.ensemble import RangerForestRegressor
from tune_sklearn import TuneSearchCV
from solarforecastarbiter.metrics.probabilistic import continuous_ranked_probability_score as crps

# placeholder data for illustration
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

param_dists = {
    'max_depth': (0, 50),
    'min_node_size': (10, 100),
    'n_estimators': (100, 1000),
    'split_rule': ['variance', 'extratrees', 'maxstat'],
}

m = RangerForestRegressor(quantiles=True)
gs = TuneSearchCV(
    m,
    param_distributions=param_dists,
    # CRPS is a loss, so lower is better
    scoring=make_scorer(crps, greater_is_better=False),
)
gs.fit(X, y)  # raises an error: forecasts must be 2D arrays
```
I think the sklearn API is behaving correctly. To work around this problem, I made some changes in skranger:
- First, `RangerForestRegressor` now accepts `quantiles: Union[bool, list, np.ndarray]`. If `quantiles` is `True`, a list, or an `np.ndarray`, the model is in quantile mode.
- Second, in quantile mode, `predict` calls `predict_quantile` by default.
NOTE: Additional logic should be implemented if a non-quantile prediction is required and quantile mode is enabled.
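The two changes described above can be sketched roughly as follows. This is a toy, self-contained illustration, not the code in this PR: `ToyQuantileModeRegressor` is a hypothetical stand-in for `RangerForestRegressor` whose "model" just memorises the training targets and ignores `X`, and the default levels used for `quantiles=True` are an assumption.

```python
import numpy as np


class ToyQuantileModeRegressor:
    """Hypothetical stand-in illustrating the proposed quantile mode."""

    # assumption: levels used when quantiles=True
    DEFAULT_QUANTILES = (0.1, 0.5, 0.9)

    def __init__(self, quantiles=False):
        # stored unchanged, following sklearn's get_params/set_params convention
        self.quantiles = quantiles

    def _quantile_levels(self):
        """Return the quantile levels, or None when not in quantile mode."""
        q = self.quantiles
        if isinstance(q, bool):
            return np.asarray(self.DEFAULT_QUANTILES) if q else None
        if isinstance(q, (list, np.ndarray)):
            return np.asarray(q, dtype=float)
        raise TypeError("quantiles must be a bool, list or np.ndarray")

    def fit(self, X, y):
        self.y_train_ = np.asarray(y, dtype=float)
        return self

    def predict_quantile(self, X, quantiles=None):
        levels = self._quantile_levels() if quantiles is None else np.asarray(quantiles)
        if levels is None:
            raise ValueError("model is not in quantile mode")
        # empirical quantiles of the training targets, repeated for every row
        row = np.quantile(self.y_train_, levels)
        return np.tile(row, (len(X), 1))  # shape (n_samples, n_quantiles)

    def predict(self, X):
        if self._quantile_levels() is not None:
            return self.predict_quantile(X)  # 2D output in quantile mode
        return np.full(len(X), self.y_train_.mean())  # 1D point forecast
```

With this dispatch, a scorer built with `make_scorer` receives the 2D quantile array whenever `quantiles` is `True`, a list, or an array. Getting a point forecast while quantile mode is on, as the NOTE says, would still need an extra code path.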
I see what you're doing and it makes sense. I'm wondering if we should just break out the quantile regression into a separate estimator. Does that make sense to do here?
FWIW R's grf does this and I followed this pattern when writing skgrf.
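A separate estimator, as suggested here, might look roughly like the following sketch. Both classes are hypothetical toys (they ignore `X` and use only the training targets), not skranger or skgrf classes, and the default quantile levels are an assumption.

```python
import numpy as np


class ToyForestRegressor:
    """Hypothetical point-forecast estimator: predict() is always 1D."""

    def fit(self, X, y):
        self.y_train_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return np.full(len(X), self.y_train_.mean())


class ToyQuantileForestRegressor(ToyForestRegressor):
    """Hypothetical quantile estimator: predict() is always 2D."""

    def __init__(self, quantiles=(0.1, 0.5, 0.9)):  # assumed defaults
        self.quantiles = quantiles

    def predict(self, X):
        levels = np.asarray(self.quantiles, dtype=float)
        row = np.quantile(self.y_train_, levels)
        return np.tile(row, (len(X), 1))  # shape (n_samples, n_quantiles)
```

Because each class has a fixed output shape, no mode flag is needed, and a scorer built with `make_scorer` gets 2D forecasts from the quantile estimator's plain `predict`.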
Looks like builds are broken due to this bug in setuptools; a fix seems to be in progress. https://github.com/pypa/setuptools/issues/3002
> I'm wondering if we should just break out the quantile regression into a separate estimator. Does that make sense to do here?

Well, I wouldn't know which would be better; you know the overall structure of the project better than I do.