Implement spatial block-validation in `ibis.iSDM`?

Open Martin-Jung opened this issue 1 year ago • 1 comment

(Spatial) block validation has so far not been added to the package, given the complexity of assigning blocks to the single or multiple datasets that might be specified in a model. In most projects we therefore implement the cross-validation externally, e.g. by providing subsets of the training data to individual ibis fits and then validating those fits outside the package. I still think outsourcing validation to the user makes the most sense.

However, given that an increasing number of our projects need to rely on this, we could brainstorm on how best to support this functionality within ibis. I judge this to be a relatively big overhaul if implemented well.

Possible implementation steps:

[ ] I would suggest an implementation using the new `spatialsample` package, which is relatively clean and aligns with the tidy philosophy.
[ ] The idea would be to have a new function called `cross_validate()` (or another name?) as opposed to just `validate()`. This function would need to store the method and blocks somehow in the `BiodiversityDistribution-class` object so that they can be queried from within the object.
[ ] During the setup and training stage of each engine, these attributes could be queried to create the sets, run the model per set and store the validation statistics. Note that some engines support internal cross-validation (XGBoost, for example).
[ ] The metric to be used for cross-validation would need to be saved in the `BiodiversityDistribution-class` object and also in the resulting object with the fits.
[ ] (Optional) Functionality to not only run a cross-validation but also build an ensemble of the various distributions fitted within the object.
[ ] The whole pipeline requires several unit tests and likely its own vignette article ("Cross-validation and ensemble modelling") to demonstrate the procedure.
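
To make the steps above more concrete, here is a minimal, language-neutral sketch of the underlying idea, written in plain Python rather than R. It is not `ibis.iSDM` or `spatialsample` code. All function names (`assign_blocks`, `spatial_block_folds`, `cross_validate`) and the `fit`/`score` callables are hypothetical and only illustrate one possible approach: points are grouped into square grid blocks, whole blocks are assigned to folds, and a model is fitted and scored once per fold.

```python
import random


def assign_blocks(points, block_size):
    """Map each (x, y) point to the square grid cell (block) containing it."""
    return [(int(x // block_size), int(y // block_size)) for x, y in points]


def spatial_block_folds(points, block_size, n_folds, seed=0):
    """Return one fold index per point; all points in a block share a fold."""
    blocks = assign_blocks(points, block_size)
    unique_blocks = sorted(set(blocks))
    random.Random(seed).shuffle(unique_blocks)
    # Deal the shuffled blocks out to the folds round-robin.
    block_to_fold = {b: i % n_folds for i, b in enumerate(unique_blocks)}
    return [block_to_fold[b] for b in blocks]


def cross_validate(points, labels, fit, score, block_size, n_folds, seed=0):
    """Fit on all folds but one, score on the held-out fold, repeat per fold.

    `fit(train_points, train_labels)` and `score(model, test_points,
    test_labels)` are hypothetical callables standing in for an engine
    and a validation metric.
    """
    folds = spatial_block_folds(points, block_size, n_folds, seed)
    scores = []
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        if not train or not test:
            continue  # skip folds that ended up without any blocks
        model = fit([points[i] for i in train], [labels[i] for i in train])
        scores.append(
            score(model, [points[i] for i in test], [labels[i] for i in test])
        )
    return scores
```

Assigning whole blocks rather than individual points to folds is what keeps spatially neighbouring observations from ending up in both the training and the test set. In the package itself, the fold assignment and the chosen metric would presumably be stored in the distribution object, as outlined in the steps above.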

Thoughts?

Jan 29 '24 13:01 Martin-Jung

This has been dormant for a while, and since we already implement this in a range of processing pipelines, I am still in favour of not adding it directly to the package. Instead, providing an extra vignette highlighting (spatial) cross-validation approaches might be an idea. The vignette could simply import and use the `spatialsample` package.

May 31 '24 13:05 Martin-Jung
