Is it possible to extract the FAST results?

Open shneezers opened this issue 5 years ago • 3 comments

Hi,

This is more of a question rather than an issue, but I was wondering if it is possible to see the FAST results for all the interactions that are tested. I figure that with the ability to see the FAST results, I can determine the optimal number of interactions to include in the model itself.

I was thinking a temporary workaround would be to loop over a range of values for the number of interaction terms, creating and training a model for each, such as below:

from interpret.glassbox import ExplainableBoostingClassifier

# Train one model per interaction count and record which pairs it selected.
top_inter = {}
for i in range(1, 11):
    ebm = ExplainableBoostingClassifier(interactions=i, random_state=12345)
    ebm.fit(X_train, y_train)
    top_inter[i] = ebm.inter_indices_

This method might become a bit cumbersome and time-consuming, though.

Very impressed by the work that you guys are doing in this project!

Thanks!

shneezers avatar Feb 13 '20 16:02 shneezers

Hi @shneezers,

Thanks for bringing this up! With the way the code is written now, it's not easy to extract the computed FAST scores. FAST currently calculates scores for each possible interaction (N² - N terms), and does so once per outer bag. Because so many terms are calculated, we end up throwing away all but the top "k" terms, where k is the number of interactions specified by the user (see here: https://github.com/interpretml/interpret/blob/544644faef91f92355c03295f3f220a2bbc50515/python/interpret-core/interpret/glassbox/ebm/internal.py#L1063).

Each final EBM is an ensemble of bagged "inner EBMs" (16 by default), each of which estimates its own FAST scores across all interactions, and these scores differ per bag. To merge these into the final pairwise interactions used in the model, we run a stagewise selection procedure (see here: https://github.com/interpretml/interpret/blob/544644faef91f92355c03295f3f220a2bbc50515/python/interpret-core/interpret/glassbox/ebm/ebm.py#L983).

All this is to say that ultimately, for a default EBM run, we end up with a total of 16 × (N² - N) FAST scores. Because this is quite a bit to store directly in the model, we calculate these on the fly and discard the scores/pairs we don't keep along the way.
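For intuition, here is a toy, self-contained sketch of the idea (an illustration only, not our actual implementation, which works on binned data with optimized cut points inside the boosting loop): rank every candidate pair by how much fitting per-cell residual means on the pair's 2D grid reduces the residual error left by the main effects.

import itertools
import numpy as np

def toy_fast_scores(X_binned, residuals, n_bins):
    """X_binned: (n_samples, n_features) integer bin index per feature.
    residuals: (n_samples,) residuals from a mains-only model.
    Returns [(pair, error_reduction), ...] sorted strongest first."""
    n_samples, n_features = X_binned.shape
    base_rss = float(np.sum(residuals ** 2))
    scores = []
    for i, j in itertools.combinations(range(n_features), 2):
        # Flatten the pair's two bin indices into one joint cell index.
        cell = X_binned[:, i] * n_bins + X_binned[:, j]
        # The best constant predictor per cell is that cell's residual mean.
        sums = np.bincount(cell, weights=residuals, minlength=n_bins * n_bins)
        counts = np.bincount(cell, minlength=n_bins * n_bins)
        means = np.divide(sums, counts,
                          out=np.zeros_like(sums), where=counts > 0)
        rss = float(np.sum((residuals - means[cell]) ** 2))
        scores.append(((i, j), base_rss - rss))
    return sorted(scores, key=lambda item: -item[1])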

We'd still like to directly expose FAST scores to users, so we've come up with a few ideas:

  • Expose FAST directly as a function, so users can do this calculation on their own (either before or after fitting an EBM with main effects). A rough sketch of what this might look like follows this list.

  • Add parameters to EBM which would let people optionally keep FAST scores. These would default to OFF, but interested users could opt in. We might expose one parameter to store each of the inner EBMs, and another to have each of those EBMs keep its FAST scores. We could do this for all interaction terms, or just the user-specified top K.

  • Store the results of the backward/forward stagewise selection procedure, which would be a longer ranked list of significant interaction terms.
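As a rough sketch of the first idea's surface (a hypothetical name and signature, nothing committed yet):

def measure_fast(X, y, init_model=None):
    """Hypothetical standalone FAST entry point, sketched for discussion.
    init_model: an optional fitted mains-only model whose predictions are
    subtracted out before pairs are scored; None scores pairs against the
    raw target. Returns [(pair, score), ...] sorted strongest first."""
    ...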

Any thoughts on which approach you'd prefer, or perhaps multiple? We're really happy to see interest in FAST and definitely want to make this easier to use.

-InterpretML Team

interpret-ml avatar Feb 20 '20 17:02 interpret-ml

Hi @interpret-ml

So if the feature_step_n_inner_bags parameter is set to some number > 1, does each of those inner bags have its own FAST scores?

Regarding the approaches to exposing the FAST scores to the end user: in the first idea (exposing FAST directly as a function), by "main effects" are you referring to the individual features? If so, instead of a fitted EBM, could any other type of GAM, such as a linear/logistic regression, be used to calculate the FAST scores?

Also, with the backward/forward stagewise selection approach, this is how the final interactions are selected across all the outer bags, right? To explain further: each of the 16 outer bags (aka "inner EBMs") can end up with a different top K, since the FAST calculation differs per bag, and the final EBM's top K interactions are determined by the largest average difference in MSE or LogLoss (depending on the model type) calculated by the forward and backward selection step. If my understanding is correct, then I'd be interested in this option as well. Roughly, I picture the merge as something like:
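(My own pseudo-sketch of my understanding, not interpret's actual code; the names are made up.)

import numpy as np

def merge_top_k(per_bag_gains, top_k):
    """per_bag_gains: one dict per bag mapping pair -> loss improvement
    (drop in MSE or LogLoss). Returns the top_k pairs by mean gain."""
    all_pairs = set().union(*per_bag_gains)
    mean_gain = {pair: np.mean([bag.get(pair, 0.0) for bag in per_bag_gains])
                 for pair in all_pairs}
    return sorted(all_pairs, key=mean_gain.get, reverse=True)[:top_k]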

From my understanding of how the original GA2M algorithm worked, as described in the Accurate Intelligible Models with Pairwise Interactions paper, the optimal number of interactions was determined within the model rather than being set prior to training. If this is correct, is there a possibility of adding this in the future? Alternatively, if the 3rd approach is taken, could it return double the user-specified top K interactions? That way, as the end user, I can select how many interactions I think should be in the model, but also check whether I should have added more.
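If I had such a longer ranked list, I could apply a simple cutoff heuristic myself. For example (my own heuristic, nothing the library prescribes):

def choose_k(sorted_scores, rel_threshold=0.05):
    """sorted_scores: FAST scores in descending order. Keep pairs until
    the score falls below a fraction of the strongest pair's score."""
    if not sorted_scores:
        return 0
    top = sorted_scores[0]
    for k, score in enumerate(sorted_scores):
        if score < rel_threshold * top:
            return k
    return len(sorted_scores)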

For the 2nd approach, I don't see how it would be more advantageous than the 1st. With that result, I would end up doing something similar to the backward/forward selection step that is already done in the model, so it would be redundant.

If any part of this is confusing, please let me know. Looking forward to your thoughts as well!

Thanks!

ghost avatar Feb 20 '20 21:02 ghost

Hello!

Have the features mentioned here been implemented in some fashion? I am looking to transfer FAST to NAMs, and the first and third solutions listed would be the most helpful.

Thank you!

BullAJ avatar Mar 29 '22 20:03 BullAJ

Hi @shneezers & @BullAJ --

We have exposed FAST as a separate utility function that can be called independently of the EBM construction, and yes, you can use it in conjunction with a linear model or another kind of GAM. More details in issue https://github.com/interpretml/interpret/issues/332
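For example (assuming a recent version of interpret; check the docs for the exact signature and return format of measure_interactions):

from interpret.glassbox import ExplainableBoostingClassifier
from interpret.utils import measure_interactions

# Fit main effects only, then rank candidate pairs on what is left over.
ebm = ExplainableBoostingClassifier(interactions=0)
ebm.fit(X_train, y_train)

# init_score accepts a fitted model, so another GAM or a plain logistic
# regression should work here as well.
ranked = measure_interactions(X_train, y_train, init_score=ebm)
print(ranked[:10])  # strongest candidate pairs with their FAST scores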

paulbkoch avatar Oct 19 '22 23:10 paulbkoch