feature_index in plot_decision_regions defaults to (0,1) even if those are filler columns
If no feature_index is given, it defaults to (0, 1), even when columns 0 and 1 are designated as filler features via filler_feature_values. Alternatively, maybe feature_index should default to the non-filler features in this case? (Or vice versa: default the filler features to whatever feature_index leaves out?)
I know the error thrown in this case is admirably helpful and specific, so maybe this isn't worth bringing up...
Reproducible code example of the issue (if it's considered a real 'issue'):
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import xgboost as xgb
# Get example data (iris.data has 4 columns)
iris = load_iris()
X = iris.data
y = iris.target
# Fit classifier
clf = xgb.XGBClassifier()
clf.fit(X, y)
# Plot decision boundary region, first two col indices should be ignored
# w/ 'filler' values
arb = 5
filler_feature_values = {0: arb, 1: arb}
fig, ax = plt.subplots()
plot_decision_regions(X, y, clf=clf, filler_feature_values=filler_feature_values, ax=ax)
Result:
ValueError: Column(s) [2 3] need to be accounted for in either feature_index or filler_feature_values
Good point. I don't have a strong preference here, but I think that auto-assigning the remaining columns when filler_feature_values is set would add convenience, as would the vice versa scenario: auto-assigning filler_feature_values when feature_index is specified.
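The auto-assignment described above could be sketched as follows. This is a hypothetical helper, not existing mlxtend API: given the number of columns and the user's filler_feature_values, treat every unlisted column as a plotting feature.

```python
def infer_feature_index(n_features, filler_feature_values):
    """Hypothetical helper: default feature_index to the columns
    NOT listed in filler_feature_values, instead of always (0, 1)."""
    filler_cols = set(filler_feature_values)
    remaining = tuple(i for i in range(n_features) if i not in filler_cols)
    if len(remaining) != 2:
        raise ValueError(
            "Expected exactly 2 non-filler columns for a 2D plot, "
            "got {}: {}".format(len(remaining), remaining)
        )
    return remaining
```

With the iris repro above (4 columns, fillers on 0 and 1), this would yield feature_index = (2, 3) instead of raising. The vice-versa direction (deriving filler columns from a given feature_index) would be the same set difference in reverse.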
I am really not sure what the underlying issue is, but plotting SVM text classification is a real problem with this library (and with the other libraries I have been exploring). By default, when there are more than two features, there should be a warning with the filler values assigned automatically, not a blocking error.
Thanks for the note. I think the problem is coming up with a good filler value; it shouldn't be hard-coded. For instance, if someone plots the decision regions on a standardized dataset, 0 as the filler value may make sense. However, if the dataset is an unscaled version of Iris, then 0 would be absolute nonsense.
So we maybe want to use something like the feature "mean" or "median" as the default filler value, I guess. The median may be the safer bet in case a feature is categorical.
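A per-feature median default could be computed along these lines (a sketch of the idea, not current mlxtend behavior — the function name is made up):

```python
import numpy as np

def default_filler_values(X, feature_index):
    """Sketch: use the per-column median of X as the filler value
    for every column that is not being plotted."""
    X = np.asarray(X)
    plotted = set(feature_index)
    return {
        col: float(np.median(X[:, col]))
        for col in range(X.shape[1])
        if col not in plotted
    }
```

For the iris repro, default_filler_values(X, (0, 1)) would fill columns 2 and 3 with their medians, which stays sensible whether or not the data is standardized.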