
Addition to work with NGBoost

Open: RichardScottOZ opened this issue 4 years ago (9 comments)

Hi Steven,

FYI, I did this last year so I could use your work with NGBoost, and I finally got around to updating it.

RichardScottOZ, Feb 10 '21

Thanks for the contribution, Richard. A prediction method that can output distributions will be useful; I'll merge this in the next few days.

stevenpawley, Feb 11 '21

Yes, I haven't added any relevant comments or docs to it yet, as it was literally a quick change so I could get the output.

RichardScottOZ, Feb 12 '21

I saw there's a new prediction module? Yesterday I was working on a hack for using hdbscan. Are the functions in raster.py going to migrate there, or will there be versions for other uses?

RichardScottOZ, Mar 08 '21

Hi Richard,

I'm just working on a couple of problems relating to the in-memory files feature that I added to Pyspatialml, but I'd like to return to this. NGBoost looks like it uses a pred_dist method. Do you know if this works within scikit-learn's structures, e.g. whether it can function inside a Pipeline?
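For context on what a distribution output could look like as rasters: a minimal sketch, assuming NGBoost's default Normal distribution, where the object returned by pred_dist exposes a params dict with "loc" and "scale" arrays (one value per flattened pixel). The helper name normal_interval_rasters is hypothetical; it converts those parameters into lower/upper prediction-interval rasters with scipy.

```python
import numpy as np
from scipy.stats import norm


def normal_interval_rasters(loc, scale, shape, confidence=0.9):
    """Hypothetical helper: per-pixel Normal parameters -> interval rasters.

    `loc` and `scale` are 1-D arrays over flattened pixels, e.g. what
    dist.params["loc"] and dist.params["scale"] are assumed to hold for an
    NGBoost Normal distribution returned by pred_dist.
    """
    lower, upper = norm.interval(confidence, loc=loc, scale=scale)
    # Reshape the flat predictions back onto the raster grid (rows, cols).
    return np.reshape(lower, shape), np.reshape(upper, shape)


# Stand-in parameters for a 2 x 3 raster (no NGBoost model needed here).
loc = np.arange(6.0)
scale = np.full(6, 2.0)
lower, upper = normal_interval_rasters(loc, scale, (2, 3), confidence=0.9)
```

With a fitted NGBRegressor the inputs would come from something like dist = model.pred_dist(flat_pixels), but that call is an assumption based on the method name mentioned above.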

Scikit-learn doesn't appear to support prediction intervals very uniformly or extensively. GradientBoostingRegressor enables prediction intervals via quantile predictions, but it does this without a new method: you set loss='quantile' and the 'alpha' parameter of the estimator (changing alpha in place only takes effect after refitting, so each quantile needs its own fit), and then use the regular predict method for the specified quantile.
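A minimal sketch of that GradientBoostingRegressor pattern on synthetic data (the data and quantile levels are illustrative assumptions): alpha is changed with set_params, the model is refitted for each quantile, and the ordinary predict method returns that quantile.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic 1-D regression problem standing in for raster training data.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_new = np.linspace(0, 10, 50).reshape(-1, 1)

gbr = GradientBoostingRegressor(loss="quantile", random_state=0)
preds = {}
for q in (0.05, 0.5, 0.95):
    # alpha is the target quantile; it only affects predictions after refitting.
    gbr.set_params(alpha=q).fit(X, y)
    preds[q] = gbr.predict(X_new)

# preds[0.05] and preds[0.95] bound a nominal 90% prediction interval.
```

Because there is no separate quantile method, this works unchanged inside a Pipeline, at the cost of one fitted model per quantile.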

My favourite R random forest implementation, ranger, for which there is also a Python wrapper around the C++ library (skranger), also allows quantile prediction. In Python, however, it uses a predict_quantiles method to do this, which is a different approach again, so I don't think quantile predictions can easily be made if the estimator is encapsulated within another structure like a Pipeline.

stevenpawley, Mar 08 '21

I haven't tried it, but I would guess so? The only related thing I remember seeing mentioned there is a grid search.

RichardScottOZ, Mar 08 '21

I was wondering about that a little when I saw your apply function. For example, if I needed a StandardScaler-transformed raster stack based on the original for clustering, would that just be a function plus an argument dictionary applied to the array, or is anything else needed?
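A minimal sketch of the kind of array function that could be handed to an apply-style method for the StandardScaler case, assuming the stack arrives as a (bands, rows, cols) NumPy array with NaN marking nodata. The function name standardize_stack is hypothetical, and it does not assume anything about Pyspatialml's actual apply signature.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize_stack(arr, scaler=None):
    """Hypothetical helper: standardize each band of a (bands, rows, cols) array.

    If `scaler` is None, a StandardScaler is fitted on the valid pixels of
    `arr` itself; pass an already-fitted scaler to reuse the original's stats.
    """
    n_bands, n_rows, n_cols = arr.shape
    flat = arr.reshape(n_bands, -1).T            # (pixels, bands)
    valid = ~np.isnan(flat).any(axis=1)          # drop pixels with any nodata
    if scaler is None:
        scaler = StandardScaler().fit(flat[valid])
    out = np.full(flat.shape, np.nan)
    out[valid] = scaler.transform(flat[valid])
    return out.T.reshape(n_bands, n_rows, n_cols)


# Two synthetic bands on very different scales, with one nodata pixel.
rng = np.random.default_rng(1)
stack = rng.normal(
    loc=np.array([5.0, 100.0]).reshape(2, 1, 1),
    scale=np.array([2.0, 10.0]).reshape(2, 1, 1),
    size=(2, 4, 5),
)
stack[0, 0, 0] = np.nan
scaled = standardize_stack(stack)
```

Masking the whole pixel when any band is NaN keeps the bands aligned, which matters if the scaled stack is then clustered.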

RichardScottOZ, Mar 08 '21

Yes, I was wondering the same thing: whether the apply method could be used for making predictions with arbitrary or non-standard methods. I think it can, but I should work through it with an example, because I'd still like to use NGBoost or skranger for prediction intervals. When I tried skranger, it wouldn't work when wrapped inside pipelines or other meta-estimators, because they don't have a predict_quantiles method to pass the call through.

stevenpawley, Mar 08 '21

Yes, so that might need some sort of custom Pipeline subclass that overrides or forwards the method, which isn't ideal.
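A minimal sketch of that Pipeline workaround, assuming the final estimator exposes a predict_quantiles method. ForwardingPipeline and ToyQuantileRegressor are hypothetical names; the toy estimator only stands in for something like skranger's forest and is not its real implementation.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ForwardingPipeline(Pipeline):
    """Hypothetical Pipeline that forwards predict_quantiles to its last step."""

    def predict_quantiles(self, X, **kwargs):
        Xt = X
        # Apply every fitted transformer except the final estimator.
        for _, step in self.steps[:-1]:
            if step in (None, "passthrough"):
                continue
            Xt = step.transform(Xt)
        return self.steps[-1][1].predict_quantiles(Xt, **kwargs)


class ToyQuantileRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in: predicts empirical quantiles of the training y."""

    def fit(self, X, y):
        self.y_train_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return np.full(len(X), np.median(self.y_train_))

    def predict_quantiles(self, X, quantiles=(0.1, 0.5, 0.9)):
        q = np.quantile(self.y_train_, quantiles)
        return np.tile(q, (len(X), 1))  # shape (n_samples, n_quantiles)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = rng.normal(size=30)
pipe = ForwardingPipeline(
    [("scale", StandardScaler()), ("model", ToyQuantileRegressor())]
).fit(X, y)
q = pipe.predict_quantiles(X, quantiles=[0.1, 0.9])
```

The subclass has to know the method name in advance, which is exactly the "hackery" concern: each non-standard method (predict_quantiles, pred_dist, ...) needs its own forwarder.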

RichardScottOZ, Mar 08 '21

And hdbscan class-label estimation basically looks like this:

    result, result_strengths_t = hdbscan.approximate_predict(estimator, flat_pixels)

(so there are two outputs to handle). There is also this, which gives the probability of each pixel belonging to each cluster:

    # result = estimator.predict_proba(flat_pixels)
    result = hdbscan.prediction.membership_vector(estimator, flat_pixels)
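A minimal sketch of handling the two outputs on the raster side, assuming a (bands, rows, cols) stack with NaN as nodata. predict_two_rasters is a hypothetical helper that takes any callable returning (labels, strengths); the hdbscan call is only shown in a comment, since approximate_predict requires a clusterer fitted with prediction_data=True.

```python
import numpy as np


def predict_two_rasters(predict_fn, stack, nodata=-9999):
    """Hypothetical helper: apply a (labels, strengths) predictor to a stack.

    `predict_fn(flat_pixels)` must return two 1-D arrays, as
    hdbscan.approximate_predict(clusterer, X) does. Pixels with any NaN are
    skipped; their label is `nodata` (not -1, which hdbscan uses for noise)
    and their strength is NaN.
    """
    n_bands, n_rows, n_cols = stack.shape
    flat = stack.reshape(n_bands, -1).T              # (pixels, bands)
    valid = ~np.isnan(flat).any(axis=1)

    labels = np.full(flat.shape[0], nodata, dtype=int)
    strengths = np.full(flat.shape[0], np.nan)
    lab, stren = predict_fn(flat[valid])
    labels[valid] = lab
    strengths[valid] = stren
    return labels.reshape(n_rows, n_cols), strengths.reshape(n_rows, n_cols)


# With hdbscan installed, the predictor could be (not executed here):
# labels, strengths = predict_two_rasters(
#     lambda X: hdbscan.approximate_predict(clusterer, X), stack)

def toy_predict(X):
    """Stand-in predictor: label by sign of band 0, strength in [0, 1)."""
    return (X[:, 0] > 0).astype(int), np.abs(X[:, 0]) / (1 + np.abs(X[:, 0]))


stack = np.arange(12, dtype=float).reshape(2, 2, 3) - 3
stack[1, 0, 0] = np.nan
labels, strengths = predict_two_rasters(toy_predict, stack)
```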

RichardScottOZ, Mar 08 '21