ibis.iSDM
Implement spatial block validation in `ibis.iSDM`?
(Spatial) block validation has so far not been added to the package, given the complexity of assigning blocks to one or several datasets that may be specified in a model. In most projects we therefore implement cross-validation externally, e.g. by passing subsets of the training data to individual ibis fits and then validating them outside the package. I still think leaving validation to the user makes the most sense.
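A minimal sketch of the external workflow described above, written in Python for illustration only: it is not the `ibis.iSDM` or `spatialsample` API, and the names `assign_spatial_folds()` and `external_cross_validate()`, the square-grid blocking and the round-robin assignment of blocks to folds are all assumptions. Whole blocks are held out together, so nearby (spatially autocorrelated) points never end up on both sides of a split.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of one observation
Model = Callable[[Point], float]  # a fitted model: coordinates -> prediction


def assign_spatial_folds(coords: Sequence[Point], block_size: float, k: int) -> List[int]:
    """Hypothetical helper: assign each point to one of k folds via square blocks."""
    # Map every point to the grid cell (block) that contains it
    block_ids = [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords]
    # Deal whole blocks to folds round-robin, so points in one block always share a fold
    block_to_fold = {b: i % k for i, b in enumerate(sorted(set(block_ids)))}
    return [block_to_fold[b] for b in block_ids]


def external_cross_validate(
    coords: Sequence[Point],
    labels: Sequence[float],
    folds: Sequence[int],
    k: int,
    fit: Callable[[List[Point], List[float]], Model],
    metric: Callable[[List[float], List[float]], float],
) -> List[float]:
    """Hypothetical helper: fit once per held-out fold, return one score per fold."""
    scores = []
    for fold in range(k):
        train = [i for i, f in enumerate(folds) if f != fold]
        test = [i for i, f in enumerate(folds) if f == fold]
        if not train or not test:
            continue  # too few blocks to populate this fold; skip it
        model = fit([coords[i] for i in train], [labels[i] for i in train])
        predictions = [model(coords[i]) for i in test]
        scores.append(metric([labels[i] for i in test], predictions))
    return scores


# Usage with a trivial stand-in "model" that predicts the training prevalence everywhere
def fit_prevalence(train_coords: List[Point], train_labels: List[float]) -> Model:
    prevalence = mean(train_labels)
    return lambda point: prevalence


def mean_absolute_error(observed: List[float], predicted: List[float]) -> float:
    return mean(abs(o - p) for o, p in zip(observed, predicted))


coords = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)]
labels = [1, 1, 0, 0, 1]  # presence (1) / absence (0)
folds = assign_spatial_folds(coords, block_size=1.0, k=2)
scores = external_cross_validate(coords, labels, folds, 2, fit_prevalence, mean_absolute_error)
print(folds, scores)
```

In a real project, `fit_prevalence()` would be replaced by whatever trains one model on the training subset, and the metric by the validation statistic the project actually uses.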
However, as a growing number of our projects rely on this, we could brainstorm how best to support this functionality within ibis. If implemented well, I would judge this to be a relatively large overhaul.
Possible implementation steps:
- [ ] I would suggest an implementation using the new `spatialsample` package, which is relatively clean and aligns with the tidy philosophy.
- [ ] The idea would be to have a new function called `cross_validate()` (or another name?) as opposed to just `validate()`. This function would need to store the method and the blocks in the `BiodiversityDistribution-class` object so that they can be queried from within the object.
- [ ] During the setup and training stage of each engine, these attributes could be queried to create the sets, run the model per set and store the validation statistics. Note that some engines (XGBoost, for example) support internal cross-validation.
- [ ] The metric used for cross-validation would need to be saved in the `BiodiversityDistribution-class` object and also in the resulting object with the fits.
- [ ] (Optional) Functionality to not only cross-validate but also build an ensemble of the various distributions fitted within the object.
- [ ] The whole pipeline requires several unit tests and likely its own vignette article ("Cross-validation and ensemble modelling") to demonstrate the procedure.
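The checklist could look roughly like this in code. This is a hypothetical Python sketch, not the package's R API: `CrossValidationSettings`, `DistributionSpec`, `FitResult` and `train()` are invented stand-ins for what `cross_validate()`, the `BiodiversityDistribution-class` object, the engine training step and the fitted-model object might store and do. The ensemble is assumed to be an unweighted mean of the per-fold predictions.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Model = Callable[[Point], float]


@dataclass
class CrossValidationSettings:
    """Hypothetical record of what a cross_validate() step might store."""
    method: str  # e.g. "spatial_block"
    folds: List[int]  # fold id per observation, fixed once at setup
    metric_name: str  # saved again in the result object
    metric: Callable[[List[float], List[float]], float]


@dataclass
class DistributionSpec:
    """Hypothetical stand-in for the model specification object."""
    coords: List[Point]
    labels: List[float]
    cv: Optional[CrossValidationSettings] = None


@dataclass
class FitResult:
    """Hypothetical stand-in for the object holding the fits."""
    models: List[Model]
    metric_name: Optional[str] = None
    fold_scores: Dict[int, float] = field(default_factory=dict)

    def ensemble_predict(self, point: Point) -> float:
        # Optional step: unweighted mean of the per-fold model predictions
        return mean(m(point) for m in self.models)


def train(spec: DistributionSpec, fit: Callable[[List[Point], List[float]], Model]) -> FitResult:
    """Engine-side training that queries the stored CV settings, if any."""
    if spec.cv is None:
        # No cross-validation requested: a single fit on all data
        return FitResult(models=[fit(spec.coords, spec.labels)])
    cv = spec.cv
    if len(set(cv.folds)) < 2:
        raise ValueError("cross-validation needs at least two folds")
    result = FitResult(models=[], metric_name=cv.metric_name)
    for fold in sorted(set(cv.folds)):
        train_idx = [i for i, f in enumerate(cv.folds) if f != fold]
        test_idx = [i for i, f in enumerate(cv.folds) if f == fold]
        model = fit([spec.coords[i] for i in train_idx], [spec.labels[i] for i in train_idx])
        predictions = [model(spec.coords[i]) for i in test_idx]
        result.fold_scores[fold] = cv.metric([spec.labels[i] for i in test_idx], predictions)
        result.models.append(model)
    return result


# Usage with a trivial prevalence "model" and mean absolute error as the metric
def fit_prevalence(train_coords: List[Point], train_labels: List[float]) -> Model:
    prevalence = mean(train_labels)
    return lambda point: prevalence


def mean_absolute_error(observed: List[float], predicted: List[float]) -> float:
    return mean(abs(o - p) for o, p in zip(observed, predicted))


spec = DistributionSpec(
    coords=[(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)],
    labels=[1, 1, 0, 0, 1],
)
spec.cv = CrossValidationSettings("spatial_block", [0, 0, 1, 0, 1], "MAE", mean_absolute_error)
result = train(spec, fit_prevalence)
print(result.metric_name, result.fold_scores, result.ensemble_predict((1.0, 1.0)))
```

Keeping the fold assignment in the specification object, rather than recomputing it inside each engine, would mean every engine sees the same blocks, so their validation statistics stay comparable.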
Thoughts?
This has been dormant for a while, and since we already implement this in a range of processing pipelines, I think I am still in favour of not adding it directly.
Instead, providing an extra vignette that highlights (spatial) cross-validation approaches might be an idea. The vignette could simply import and use the `spatialsample` package.