Add ENMeval crossvalidation functionality

Open goldingn opened this issue 8 years ago • 8 comments

The package Emiel mentioned... as a process module probably

goldingn, Sep 11 '15 15:09

Agreed, it should be a process module. Crossvalid is a process module, etc.

timcdlucas, Sep 12 '15 08:09

So this package does some cross validation stuff as well as modelling and evaluation. I assume here we are only talking about the cross validation stuff.

@goldingn What happens with methods like leave-one-out? In our current implementation a record can only be assigned to one fold, so we only test against the one record left out rather than jackknifing across them all. I had a look through our modules and couldn't see one that handles this. Crossvalidate just assigns each obs to a single group...

AugustT, Feb 18 '16 17:02

For LOO, wouldn't we just assign each datapoint to a separate fold? We train on all the data points not in the fold. Would need an output module to average the results.
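A minimal sketch of that idea in base R, not using any zoon API: the helper `loo_folds`, the toy data and the `lm` model are all assumptions made for illustration. Each record gets its own fold label, the model is trained on everything outside the fold, and the final line stands in for the averaging an output module would need to do.

```r
# Hypothetical helper, not part of zoon: leave-one-out expressed as fold labels.
loo_folds <- function(n) seq_len(n)

# Toy data standing in for occurrence records (assumed, not from the thread).
set.seed(1)
dat <- data.frame(x = rnorm(8))
dat$y <- 2 * dat$x + rnorm(8, sd = 0.1)
dat$fold <- loo_folds(nrow(dat))

# Train on every record outside the fold, predict the single held-out record.
loo_pred <- vapply(unique(dat$fold), function(f) {
  fit <- lm(y ~ x, data = dat[dat$fold != f, ])
  unname(predict(fit, newdata = dat[dat$fold == f, ]))
}, numeric(1))

# The summary step an output module would have to perform.
loo_rmse <- sqrt(mean((dat$y - loo_pred)^2))
```

With one fold per record, the existing "one fold label per record" layout is already enough for LOO.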

My muddled early-morning brain must be missing something though.

goldingn, Feb 18 '16 18:02

Yeah, I don't know why you would jackknife with LOO. I guess it would be good to have a LOO option rather than having to count your own data and set k to that.

But jackknifing in general is a good point, and it doesn't fit at all at the moment. 5 fold CV but done 10 times is quite common.

timcdlucas, Feb 18 '16 19:02

> 5 fold CV but done 10 times is quite common.

You could achieve that with a list of cross-validation modules I guess, though they wouldn't get summarized together in an output module that way. I suspect this is one to leave and wait to see if anyone requests the functionality.
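As a rough illustration of what repeated k-fold involves, here is a base R sketch that draws ten independent 5-fold assignments. The helper `kfold`, the record count and the matrix layout are assumptions for illustration only; in the "list of modules" approach each column would correspond to one cross-validation module in the list.

```r
# Hypothetical helper: one random k-fold assignment for n records.
kfold <- function(n, k) sample(rep_len(seq_len(k), n))

set.seed(42)
n <- 23     # assumed number of records
reps <- 10  # number of repeats
k <- 5      # folds per repeat

# One column of fold labels per repeat; each column is a complete 5-fold split.
fold_matrix <- replicate(reps, kfold(n, k))
```

Every record appears in exactly one fold within a column, but its fold label generally differs between columns, which is why the repeats can't be squeezed into a single fold column.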

Re. the LOO bit, I agree that a LOO process module would be handy.

goldingn, Feb 19 '16 00:02

Agree about leaving it until someone asks for it. I don't actually think I've seen an SDM paper do it, I just know it's an option in caret.

I think where possible it'd be better to have LOO as an argument to other crossvalidation modules. Just to keep things tidy.

Crossvalid(k = 'loo') for example. Or Crossvalid(k = 'n').
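One way such an argument could be handled is sketched below. `resolve_k` is a hypothetical helper, not an existing zoon function, and the accepted strings simply mirror the two spellings suggested above.

```r
# Hypothetical helper: turn the user's k into a number of folds for n records.
resolve_k <- function(k, n) {
  stopifnot(length(k) == 1)
  if (is.character(k)) {
    # 'loo' and 'n' both mean one fold per record.
    if (k %in% c("loo", "n")) return(as.integer(n))
    stop("unknown value for k: ", k)
  }
  stopifnot(is.numeric(k), k >= 2, k <= n)
  as.integer(k)
}

resolve_k("loo", 120)  # one fold per record
resolve_k(5, 120)      # ordinary 5-fold
```

This keeps LOO inside the existing module rather than adding a separate one, and saves the user from counting records by hand.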

timcdlucas, Feb 19 '16 08:02

Okay LOO was a bad example.

Consider partition.disc. I want to run this 100 times; each time a random disc is used to cookie-cut my data into training and testing. This allows leave-one-disc-out cross-validation (sperrorest::partition.disc). However, my points now belong to multiple folds, since a single point may appear in more than one of the discs to be left out. Does this make sense?

I will take your advice as to whether you think this is a common problem. Perhaps not, judging by your comments.
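To make the overlap concrete, here is a base R sketch of the idea. It does not call sperrorest and is not its implementation; the toy coordinates, the disc radius and the choice to centre each disc on a randomly chosen record are all assumptions.

```r
set.seed(7)
n <- 50
pts <- data.frame(x = runif(n), y = runif(n))  # toy coordinates (assumed)
radius <- 0.2                                  # assumed disc radius
n_discs <- 100                                 # number of runs

# Centre each disc on a randomly chosen record.
centres <- pts[sample(n, n_discs, replace = TRUE), ]

# in_test[i, j] is TRUE when record i falls inside disc j,
# i.e. record i is held out for testing in run j.
in_test <- sapply(seq_len(n_discs), function(j) {
  sqrt((pts$x - centres$x[j])^2 + (pts$y - centres$y[j])^2) <= radius
})

# Number of test sets each record belongs to.
times_held_out <- rowSums(in_test)
```

Fold membership here is a record-by-run logical matrix rather than a single label per record, so it can't be stored in one fold column.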

AugustT, Feb 19 '16 09:02

Yes it does make sense and it's probably more relevant in the partition.disc example than bootstrapped CV.

I guess the solution is multiple fold columns or something but that's going to be a lot of work (#53 etc.)

For now, can we force partition disc to not do that? I haven't used it so don't know much about it.

timcdlucas, Feb 19 '16 09:02