Ordinal predictors

Open MassimilianoGrassiDataScience opened this issue 4 years ago • 1 comments

I am not sure I understood how to specify ordinal variables correctly.

In the feature_types argument, if I define the variable type as "ordinal" (e.g., feature_types = ["categorical", "ordinal", "continuous"]) I get the error:

...
~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in fit(self, X, y)
    755             binning=self.binning,
    756         )
--> 757         self.preprocessor_.fit(X)
    758         X_orig = X
    759         X = self.preprocessor_.transform(X_orig)

~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in fit(self, X)
    240                 self.hist_counts_[col_idx] = hist_counts
    241             elif col_info["type"] == "ordinal":
--> 242                 mapping = {val: indx + 1 for indx, val in enumerate(col_info["order"])}
    243                 self.col_mapping_[col_idx] = mapping
    244                 self.col_bin_counts_.append(None)  # TODO count the values in each bin

KeyError: 'order'

If I define the variable type as a nested list with the ordered values (e.g. feature_types = ["categorical", [-1.0, 1.0, 2.0, 3.0, 4.0, 5.0], "continuous"]), I get the error:

...
~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in train_model(estimator, X, y, X_pair, n_classes)
    859
    860         def train_model(estimator, X, y, X_pair, n_classes):
--> 861             return estimator.fit_parallel(X, y, X_pair, n_classes)
    862
    863         train_model_args_iter = (

~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in fit_parallel(self, X, y, X_pair, n_classes)
    459
    460         # Train main effects
--> 461         self._fit_main(main_feature_indices, X_train, y_train, X_val, y_val)
    462
    463         # Build interaction terms, if required

~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/ebm.py in _fit_main(self, main_feature_groups, X_train, y_train, X_val, y_val)
    496             max_rounds=self.max_rounds,
    497             random_state=self.random_state,
--> 498             name="Main",
    499         )
    500

~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/internal.py in cyclic_gradient_boost(model_type, n_classes, features_categorical, features_bin_count, feature_groups, X_train, y_train, scores_train, X_val, y_val, scores_val, n_inner_bags, generate_update_options, learning_rate, min_samples_leaf, max_leaves, early_stopping_rounds, early_stopping_tolerance, max_rounds, random_state, name, optional_temp_params)
   1266             n_inner_bags,
   1267             random_state,
-> 1268             optional_temp_params,
   1269         )
   1270     ) as native_ebm_booster:

~/anaconda3/envs/exp/lib/python3.7/site-packages/interpret/glassbox/ebm/internal.py in __init__(self, model_type, n_classes, features_categorical, features_bin_count, feature_groups, X_train, y_train, scores_train, X_val, y_val, scores_val, n_inner_bags, random_state, optional_temp_params)
    690         if X_train.shape[0] != len(features_bin_count):  # pragma: no cover
    691             raise ValueError(
--> 692                 "X_train does not have the same number of items as the features_bin_count array"
    693             )
    694

ValueError: X_train does not have the same number of items as the features_bin_count array

If I instead define the second variable as categorical or continuous, it works without errors.

Hi @MassimilianoGrassiDataScience --

Thanks for bringing this up. Ordinal features are currently not supported, but we had some older code left over from testing that you may have run into. We'll strip this out in the next release to keep things less confusing. In the future we plan to re-release ordinal feature support, but for now we recommend sticking with either categorical or continuous.

-InterpretML team

interpret-ml, Jan 25 '21 22:01

We now have full support for ordinals. They can be defined in one of two ways. The first is to give the column a pandas CategoricalDtype with ordered=True inside a pandas DataFrame. The second is to pass a list of the categories for that column via feature_types.

Example: feature_types=['continuous', ['high', 'medium', 'low'], 'nominal']. What used to be 'categorical' has been renamed to 'nominal' to distinguish it from ordinals.
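For readers who want to try this, here is a minimal sketch of both options. It assumes an interpret release that includes the ordinal support described above. The column names (age, risk, city) and the helper names make_ordered_frame and fit_ebm are hypothetical, chosen only for illustration. pandas and interpret are imported inside the helpers so the constants can be inspected without either library installed.

```python
# Ordered levels for the ordinal column, written in the same order as the
# feature_types example above. The level names are illustrative.
RISK_LEVELS = ["high", "medium", "low"]

# Option 2: one entry per column; the ordinal column is described by the
# list of its categories, the others by 'continuous' / 'nominal'.
FEATURE_TYPES = ["continuous", RISK_LEVELS, "nominal"]


def make_ordered_frame(rows):
    """Option 1: mark the column as an ordered pandas categorical.

    `rows` is a list of (age, risk, city) tuples; these column names are
    hypothetical.
    """
    import pandas as pd

    df = pd.DataFrame(rows, columns=["age", "risk", "city"])
    risk_dtype = pd.CategoricalDtype(categories=RISK_LEVELS, ordered=True)
    df["risk"] = df["risk"].astype(risk_dtype)
    return df


def fit_ebm(X, y, feature_types=None):
    """Fit an EBM classifier.

    Assumes an interpret version with ordinal support. Leave feature_types
    as None when the ordering is already carried by the DataFrame dtype.
    """
    from interpret.glassbox import ExplainableBoostingClassifier

    ebm = ExplainableBoostingClassifier(feature_types=feature_types)
    return ebm.fit(X, y)
```

With the first option, something like fit_ebm(make_ordered_frame(rows), y) should be enough, since the ordering lives in the dtype. With the second, pass the raw data together with FEATURE_TYPES, as in fit_ebm(X, y, FEATURE_TYPES).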

paulbkoch, Jan 26 '23 22:01