Paul Koch comments

Results: 262 comments of Paul Koch

Binning of data

Hi @piyushnegi97 -- Here are some answers to your questions: 1. As you mentioned, hyperparameter tuning is not really required. It probably helps a bit. I'm not aware of any...

Binning of data

Hi @piyushnegi97 -- PDP doesn't quite work because there are almost always correlations between features which would lead to counting some of the effect twice if you were to use...

Yes, yes, and I'm not sure (I didn't write this part), but the InterpretML implementation seems to be based on the Friedman paper: https://interpret.ml/docs/pdp.html#friedman2001greedy-pdp @Harsha-Nori or @nopdive would know more...

Binning of data

1) We bag the dataset and generate models on each of those bags. This is the outer bagging. The error bounds on the graphs are the standard deviations of the...
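
As a rough illustration of the outer bagging idea in the comment above, here is a minimal sketch, not the InterpretML implementation. It assumes bootstrap resampling (the comment does not say how the bags are drawn) and stands in a per-bin mean of the target for the model fitted on each bag. The names `bagged_bin_scores`, `bin_idx`, and `n_bags` are hypothetical.

```python
import numpy as np

def bagged_bin_scores(bin_idx, y, n_bins, n_bags=8, seed=0):
    """Fit a trivial per-bin model on each bag; return mean and std across bags."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = np.zeros((n_bags, n_bins))
    for b in range(n_bags):
        # Assumption: each bag is a bootstrap resample of the full dataset.
        sample = rng.integers(0, n, size=n)
        sums = np.bincount(bin_idx[sample], weights=y[sample], minlength=n_bins)
        counts = np.bincount(bin_idx[sample], minlength=n_bins)
        # Stand-in "model": the mean target per bin (0 for bins empty in this bag).
        scores[b] = sums / np.maximum(counts, 1)
    # The standard deviation across bags plays the role of the error bound.
    return scores.mean(axis=0), scores.std(axis=0)
```

Bins where the bags disagree get a wide band, and bins where every bag produces the same score get a band of zero width.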

Binning of data

The trees have an ephemeral lifespan and exist entirely in C++. Here's the priority queue loop where the trees are generated (this section is just for the mains): https://github.com/interpretml/interpret/blob/262d698d6346e20227971ebba8126b1bf26211d4/shared/libebm/PartitionOneDimensionalBoosting.cpp#L620-L690 Immediately...

EBM Classifier Global Feature Importance x Random Forest Classifier with Morris Sensitivity Analysis

Hi @gatihe -- The models tend to "think" differently and if the performances are similar it would be difficult to choose which model is a better representation of the underlying...

Integrate EBM into the pytorch framework

Hi @JWKKWJ123 -- This kind of federated learning approach isn't something that we support out of the box. You can kind of hack it as you've discovered using merge_ebms, but...

Support for Explainable Boosting Machine

Our docs have a simplified replica of the predict function that might be useful. https://interpret.ml/docs/ebm-internals-regression.html
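
For readers who want the general shape before opening that page, the following is a minimal sketch of prediction for an additive model with binned continuous features. It is not the code from the docs: it assumes main-effect terms only, ignores missing and unseen values, and the names `additive_predict`, `cuts`, and `term_scores` are hypothetical.

```python
import numpy as np

def additive_predict(X, intercept, cuts, term_scores):
    """Sum the intercept and one looked-up score per feature for each row."""
    X = np.asarray(X, dtype=float)
    pred = np.full(X.shape[0], intercept, dtype=float)
    for j, (feature_cuts, scores) in enumerate(zip(cuts, term_scores)):
        # np.digitize maps each value to the index of the bin it falls in;
        # a feature with k cut points therefore needs k + 1 scores.
        bins = np.digitize(X[:, j], feature_cuts)
        pred += np.asarray(scores, dtype=float)[bins]
    return pred

# Two features, one cut point each.
X = [[0.2, 10.0], [0.7, 30.0]]
preds = additive_predict(
    X,
    intercept=2.0,
    cuts=[[0.5], [20.0]],
    term_scores=[[-1.0, 1.0], [0.5, -0.5]],
)
```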

Question: Parallel boosting?

I agree with your assessment @DerWeh that we should be able to parallelize boosting on multiple features at a time. I think there is a reasonable limit though where if...

Question: Parallel boosting?

And we already have the concept of data sub-sets in C++ to prepare for this eventuality: https://github.com/interpretml/interpret/blob/9cbb353e8cec27368d9d9a091c16cf7be16a469e/shared/libebm/DataSetBoosting.hpp#L26 And we can then merge the histograms from the subsets after constructing them...
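
A minimal sketch of why merging works, written in Python rather than the C++ of libebm and not taken from that code: if each subset builds a per-bin histogram of gradient sums and sample counts, adding the histograms bin by bin gives the same result as one pass over the full data. The names `build_histogram` and `merge_histograms` are hypothetical.

```python
import numpy as np

def build_histogram(bin_idx, gradients, n_bins):
    """Per-bin gradient sums and sample counts for one data subset."""
    grad_sum = np.bincount(bin_idx, weights=gradients, minlength=n_bins)
    count = np.bincount(bin_idx, minlength=n_bins)
    return grad_sum, count

def merge_histograms(histograms):
    """Combine subset histograms by element-wise addition."""
    grad_sums, counts = zip(*histograms)
    return np.sum(grad_sums, axis=0), np.sum(counts, axis=0)
```

Because the merge is a plain sum, the subsets can be processed independently (for example on separate threads) and combined afterwards in any order.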