zoon
Add ENMeval crossvalidation functionality
The package Emiel mentioned... probably as a process module.
Agree it should be a process module. Crossvalid is a process module, etc.
So this package does some cross-validation stuff as well as modelling and evaluation. I assume here we are only talking about the cross-validation stuff.
@goldingn What happens with methods like leave-one-out (LOO)? In our current implementation a record can only be assigned to one fold, so we only test against the one record left out rather than jack-knifing across them all. I had a look through our modules and couldn't see one that handles this. Crossvalidate just assigns each obs to a single group...
For LOO, wouldn't we just assign each data point to a separate fold? We train on all the data points not in the fold. We would need an output module to average the results.
My muddled early-morning brain must be missing something though.
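A minimal sketch of that LOO scheme, written in Python for illustration only (it is not zoon module code): every record gets its own fold id, each fold is scored by a model trained on all the other records, and the per-fold errors are then averaged, which is the job the output module would do. The names loo_folds and cross_validate, the absolute-error metric, and the mean-only stand-in model are all hypothetical.

```python
from statistics import mean

def loo_folds(n):
    """Leave-one-out: each of the n records gets its own fold id (1..n)."""
    return list(range(1, n + 1))

def cross_validate(y, folds, fit, predict):
    """Hold out each fold in turn, train on the remaining records,
    and return the mean absolute error averaged across folds."""
    fold_errors = []
    for f in sorted(set(folds)):
        train = [yi for yi, fi in zip(y, folds) if fi != f]
        test = [yi for yi, fi in zip(y, folds) if fi == f]
        model = fit(train)
        fold_errors.append(mean(abs(predict(model) - t) for t in test))
    return mean(fold_errors)  # averaging across folds: the "output module" step

# Hypothetical stand-in model: always predicts the mean of the training data.
def fit_mean(train):
    return mean(train)

def predict_mean(model):
    return model

y = [1.0, 2.0, 3.0, 4.0]
score = cross_validate(y, loo_folds(len(y)), fit_mean, predict_mean)
```

With these four records the held-out errors are 2, 2/3, 2/3 and 2, so score comes out at 4/3 (about 1.33).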
Yeah, I don't see why you would jack-knife with LOO. I guess it would be good to have a LOO option rather than having to count your own data and set k to that.
But jack-knifing in general is a good point that doesn't fit at all at the moment. 5-fold CV repeated 10 times is quite common.
You could achieve that with a list of cross-validation modules, I guess, though they wouldn't get summarized together in an output module that way. I suspect this is one to leave until someone requests the functionality.
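Repeating 5-fold CV 10 times amounts to ten independent fold assignments over the same records. A rough Python sketch, with hypothetical helper names, of what that looks like if each repeat is stored as its own fold column: within a column every record has exactly one fold id, but its ids generally differ between columns, which is why a single fold column cannot hold the whole scheme.

```python
import random

def kfold_assignment(n, k, rng):
    """One k-fold split: balanced fold ids 1..k shuffled across n records."""
    ids = [i % k + 1 for i in range(n)]
    rng.shuffle(ids)
    return ids

def repeated_kfold(n, k, repeats, seed=0):
    """Repeated k-fold CV: returns one fold column (list of ids) per repeat."""
    rng = random.Random(seed)
    return [kfold_assignment(n, k, rng) for _ in range(repeats)]

# 5-fold CV done 10 times on 20 records -> 10 fold columns of length 20.
columns = repeated_kfold(n=20, k=5, repeats=10)
```

An output module summarizing this would score each column separately and then average across the 10 repeats, which is the step a plain list of cross-validation modules would not provide.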
Re. the LOO bit, I agree that a LOO process module would be handy.
Agree about leaving it until someone asks for it. I don't actually think I've seen an SDM paper do it; I just know it's an option in caret.
I think where possible it'd be better to have LOO as an argument to the other cross-validation modules, just to keep things tidy.
For example, Crossvalid(k = 'loo') or Crossvalid(k = 'n').
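A small Python sketch of the argument-based approach, assuming a hypothetical fold-assignment helper sitting behind Crossvalid (the name assign_folds and its validation rules are illustrative, not zoon's API): the special values 'loo' and 'n' simply set k to the number of records, so LOO falls out of the ordinary k-fold path with no separate module.

```python
import random

def assign_folds(n, k, seed=None):
    """Hypothetical fold assigner mirroring Crossvalid(k = ...).
    k may be an integer, or 'loo' / 'n' to request leave-one-out (k = n)."""
    if k in ("loo", "n"):
        k = n
    if not isinstance(k, int) or not 1 < k <= n:
        raise ValueError("k must be an integer in 2..n, or 'loo' / 'n'")
    # Balanced ids 1..k, shuffled so folds are random but near-equal in size.
    ids = [i % k + 1 for i in range(n)]
    random.Random(seed).shuffle(ids)
    return ids
```

For example, assign_folds(6, 'loo') gives every record its own fold, while assign_folds(6, 3) gives three folds of two records each.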
Okay, LOO was a bad example.
Consider partition.disc. I want to run it 100 times, each time using a random disc to cookie-cut my data into training and test sets. This allows leave-one-disc-out cross-validation (sperrorest::partition.disc). However, my points now belong to multiple folds, since a single point may fall inside more than one of the discs left out. Does this make sense?
I'll take your advice on whether this is a common problem. Perhaps I'm not following your comments.
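A simplified Python sketch of why disc-based resampling breaks the one-fold-per-record assumption. It is a stand-in, not sperrorest::partition.disc's actual algorithm: each repetition here centres a disc of an assumed radius on a randomly chosen point and holds out every point inside it. Counting how often each point is held out shows that points routinely land in several test sets, so the natural representation is one membership column per repetition rather than a single fold id. The function names are hypothetical.

```python
import math
import random

def random_disc_splits(coords, radius, repeats, seed=0):
    """Simplified leave-one-disc-out resampling (not sperrorest's code):
    each repetition centres a disc on a randomly chosen point and holds out
    every point within `radius` of it. Returns one set of test indices per repeat."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        cx, cy = rng.choice(coords)
        test = {i for i, (x, y) in enumerate(coords)
                if math.hypot(x - cx, y - cy) <= radius}
        splits.append(test)
    return splits

def held_out_counts(n, splits):
    """How many repetitions hold out each point; a count above 1 means
    a single fold column cannot describe that point."""
    counts = [0] * n
    for test in splits:
        for i in test:
            counts[i] += 1
    return counts

# 10 x 10 grid of points, 100 random discs of radius 2.
coords = [(x, y) for x in range(10) for y in range(10)]
splits = random_disc_splits(coords, radius=2.0, repeats=100)
counts = held_out_counts(len(coords), splits)
```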
Yes, it does make sense, and it's probably more relevant in the partition.disc example than in bootstrapped CV.
I guess the solution is multiple fold columns or something, but that's going to be a lot of work (#53 etc.).
For now, can we force partition.disc not to do that? I haven't used it, so I don't know much about it.
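One possible way to force that, sketched in Python: pick disc centres more than twice the radius apart. By the triangle inequality no point can then lie within the radius of two centres, so each point carries at most one fold id and a single fold column is enough again. This is a hypothetical constrained variant (disjoint_disc_folds is not a sperrorest or zoon function), and it has a cost: fewer discs fit, and points outside every disc are never held out (fold 0 here).

```python
import math
import random

def disjoint_disc_folds(coords, radius, max_discs, seed=0):
    """Hypothetical constrained variant: choose disc centres more than
    2 * radius apart so no point can fall in two discs. Returns one fold id
    per point (1..m for disc members, 0 for points never held out)."""
    rng = random.Random(seed)
    order = list(range(len(coords)))
    rng.shuffle(order)
    centres = []
    for i in order:
        if len(centres) == max_discs:
            break
        cx, cy = coords[i]
        # Accept a candidate centre only if it keeps every pair of discs disjoint.
        if all(math.hypot(cx - ox, cy - oy) > 2 * radius for ox, oy in centres):
            centres.append((cx, cy))
    folds = [0] * len(coords)
    for fold_id, (cx, cy) in enumerate(centres, start=1):
        for i, (x, y) in enumerate(coords):
            if math.hypot(x - cx, y - cy) <= radius:
                folds[i] = fold_id  # never overwrites: discs are disjoint
    return folds

coords = [(x, y) for x in range(10) for y in range(10)]
folds = disjoint_disc_folds(coords, radius=1.5, max_discs=20)
```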