
Classification vs regression

Open · benman1 opened this issue 5 years ago · 2 comments

Hi, I think this package looks fantastic. I am wondering, however, whether there are any plans to implement SkopeRules for regression.

I've made a start on adding regression, and it required a lot of changes; I worked this out as I went through the code. I had to come up with measures comparable to precision and recall: the precision-like measure is based on the expected reduction in standard deviation, and the recall-like measure is based on the z-score of the prediction versus the population of y. Finally, the scores are combined via softmax-weighted rules. At the moment I still get a lot of NaNs in the predictions because there are not enough rules, and the overall MSE is still much worse than a linear-regression baseline.
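Roughly, the measures look like this (a simplified sketch, not the actual WIP code — the function names and the temperature parameter are illustrative):

```python
import numpy as np

def rule_precision_like(y, covered_mask):
    """Precision-like score: relative reduction in standard deviation
    when restricting y to the samples covered by the rule."""
    std_all = np.std(y)
    std_covered = np.std(y[covered_mask])
    return (std_all - std_covered) / std_all  # 1.0 = perfectly homogeneous subset

def rule_recall_like(y, covered_mask):
    """Recall-like score: |z-score| of the rule's mean prediction
    relative to the population of y."""
    mu, sigma = np.mean(y), np.std(y)
    pred = np.mean(y[covered_mask])
    return abs(pred - mu) / sigma

def softmax_weights(scores, temperature=1.0):
    """Combine per-rule scores into softmax weights for aggregating predictions."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()
```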

I've also added comments and a test for regression. This is WIP, but I am happy for anyone to jump in.

Thanks!

benman1 commented Feb 04 '20 11:02

After more testing, it seems that on the diabetes dataset I am using for benchmarking, the linear model actually outperforms the random forest regressor and the decision tree regressor (the latter by a lot); so I may have been too strict in judging the performance I was getting. I am now getting performance very similar to both the random forest and linear models, although without rule filtering and without deduplication.
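For reference, this is the kind of baseline comparison I mean (a sketch using sklearn's diabetes dataset; exact numbers depend on the split and seed):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each baseline and record its test-set MSE
mse_scores = {}
for name, model in [
    ("linear", LinearRegression()),
    ("random forest", RandomForestRegressor(random_state=0)),
    ("decision tree", DecisionTreeRegressor(random_state=0)),
]:
    model.fit(X_train, y_train)
    mse_scores[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE = {mse_scores[name]:.1f}")
```

On this dataset the single decision tree typically lags well behind the other two, which matches what I described above.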

benman1 commented Feb 05 '20 17:02

I think the OOB score computed in the fit function is wrong.

The authors get the OOB samples with `mask = ~samples` and then apply `X[mask, :]`. But `samples` holds bootstrap indices, not a boolean mask, so `~samples` is a bitwise NOT of integers. I tested this case and found that many of the same elements appear in both `samples` and `X[mask, :]`.

I also looked at the OOB implementation in RandomForest, and found the following code:

```python
random_instance = check_random_state(random_state)
sample_indices = random_instance.randint(0, n_samples, n_samples)
sample_counts = np.bincount(sample_indices, minlength=n_samples)
unsampled_mask = sample_counts == 0
indices_range = np.arange(n_samples)
unsampled_indices = indices_range[unsampled_mask]
```

Then `unsampled_indices` gives the truly out-of-bag sample indices.
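A minimal demonstration of the problem and the fix (a sketch, assuming `samples` holds the bootstrap indices drawn for one tree):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples = 10
# Bootstrap draw, as a random forest does per tree: indices with repeats
samples = rng.randint(0, n_samples, n_samples)

# Buggy: treating the index array as a mask. `~samples` is a bitwise NOT
# of integers (i -> -i - 1), so X[~samples, :] just re-indexes rows from
# the end of X and can still contain in-bag rows.
buggy_oob = ~samples

# Correct: count how often each row was drawn; rows drawn zero times
# are the true out-of-bag samples.
sample_counts = np.bincount(samples, minlength=n_samples)
oob_indices = np.arange(n_samples)[sample_counts == 0]

print("in-bag:", np.unique(samples))
print("OOB:   ", oob_indices)
```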

wjj5881005 commented Jun 08 '21 02:06