skope-rules

The oob score

Open wjj5881005 opened this issue 3 years ago • 0 comments

I think the oob score computed in the fit function is wrong.

The authors get the oob sample indices with "mask = ~samples" and then apply X[mask, :] to select the oob samples. However, when I tested this I found that samples and X[mask, :] share many elements, and that the training samples and the masked samples have the same length. For example, with 100 samples in total, if 80 are used to train the model, there should be 100 - 80 = 20 oob samples (ignoring sampling with replacement).
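One possible explanation for the behaviour described above, sketched under an assumption the issue does not state: that `samples` is an integer index array rather than a boolean mask. On integers, NumPy's `~` is a bitwise NOT, so `~samples` equals `-(samples + 1)` and indexes rows from the end of `X` instead of selecting the complement. The toy data and variable names below are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 100
X = np.arange(n_samples).reshape(-1, 1)  # row i holds the value i

# Assumption: the training draw is an integer index array (80 rows, no replacement).
samples = rng.choice(n_samples, size=80, replace=False)

mask = ~samples          # bitwise NOT on integers: -(samples + 1)
wrong_oob = X[mask, :]   # negative indices count from the end of X

# Rows that appear both in the training draw and in the supposed oob set.
overlap = np.intersect1d(samples, wrong_oob.ravel())

# Building a boolean in-bag mask first gives the real complement.
in_bag = np.zeros(n_samples, dtype=bool)
in_bag[samples] = True
true_oob = X[~in_bag, :]

print(len(wrong_oob), len(true_oob), overlap.size)
```

With a boolean mask, `~` is a logical NOT and the complement has the expected 100 - 80 = 20 rows; with integer indices the result has the same length as the training draw.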

I also looked at how the random forest implementation samples the oob indices, and found the following code:

    import numpy as np
    from sklearn.utils import check_random_state

    # n_samples: total number of rows; max_samples: number of rows drawn
    random_instance = check_random_state(random_state)
    # get the indices of training samples
    sample_indices = random_instance.randint(0, n_samples, max_samples)
    sample_counts = np.bincount(sample_indices, minlength=n_samples)
    unsampled_mask = sample_counts == 0
    indices_range = np.arange(n_samples)
    # get the indices of oob samples
    unsampled_indices = indices_range[unsampled_mask]

unsampled_indices then holds the true oob sample indices.
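A minimal, self-contained sketch of the same bincount approach, using only NumPy so it can be run directly. The helper name `oob_indices` and its parameters are hypothetical and are not part of skope-rules or scikit-learn.

```python
import numpy as np

def oob_indices(n_samples, n_draws, seed=0):
    """Hypothetical helper: draw a bootstrap sample and return (in-bag, oob) indices."""
    rng = np.random.RandomState(seed)
    # Draw with replacement, so some rows repeat and others are never picked.
    sample_indices = rng.randint(0, n_samples, n_draws)
    # Count how often each row was drawn; a count of zero means out-of-bag.
    sample_counts = np.bincount(sample_indices, minlength=n_samples)
    unsampled_indices = np.flatnonzero(sample_counts == 0)
    return sample_indices, unsampled_indices

in_bag, oob = oob_indices(n_samples=100, n_draws=80)

# The two index sets are disjoint and together cover every row.
print(np.intersect1d(in_bag, oob).size, len(np.unique(in_bag)) + len(oob))
```

Because the draw is with replacement, the oob set here contains at least 100 - 80 = 20 rows, and usually more.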

wjj5881005, Jun 08 '21 06:06