modAL
Fitting classifiers with bootstrapping on small datasets with few classes risks having only one class in dataset
As the title says. For demonstration purposes, say I have a committee with 50 learners and only 2 data points, one of class A and one of class B, and I want to fit the learners with bootstrapping (for some reason). Each bootstrap sample then has a 50% chance of containing only one class, so I will almost certainly get an exception from sklearn saying that a classifier has only one class in its data.
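To get a feel for how likely this is, here is a small sketch that estimates the chance of a single-class bootstrap sample. It assumes the two points are one of class A and one of class B, and it uses plain NumPy resampling as a stand-in for whatever the committee does internally. The helper name `single_class_rate` is made up for illustration.

```python
import numpy as np


def single_class_rate(y, n_trials=10_000, seed=0):
    """Estimate the fraction of bootstrap samples of y that contain only one class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = y.shape[0]
    hits = 0
    for _ in range(n_trials):
        # Draw n indices with replacement, i.e. one bootstrap sample.
        idx = rng.integers(0, n, size=n)
        if np.unique(y[idx]).size < 2:
            hits += 1
    return hits / n_trials


rate = single_class_rate(["A", "B"])
# In this toy case the exact single-class probability is 0.5, so the chance
# that all 50 learners see both classes is 0.5 ** 50.
print(rate, 0.5 ** 50)
```

With more data the per-sample risk drops, but it never reaches zero as long as any class is small.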
A possible fix would be to ensure that at least one sample from each class is present in the bootstrapped data.
import numpy as np

def get_bootstrap_idx(y_training):
    n_instances = y_training.shape[0]
    bootstrap_idx = np.array([], dtype=int)
    classes = np.unique(y_training)
    # Guarantee one randomly chosen sample from each class.
    for y in classes:
        idx = np.where(y_training == y)[0]
        bootstrap_idx = np.append(bootstrap_idx, np.random.choice(idx, 1))
    # Fill the remaining slots by sampling all instances with replacement.
    bootstrap_idx = np.append(bootstrap_idx, np.random.choice(range(n_instances), n_instances - len(classes), replace=True))
    return bootstrap_idx
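A quick sanity check of this idea, as a sketch: the function is repeated in compact form so the snippet runs on its own, and the toy labels are made up.

```python
import numpy as np


def get_bootstrap_idx(y_training):
    """Bootstrap indices that contain at least one sample of every class."""
    n_instances = y_training.shape[0]
    classes = np.unique(y_training)
    # One randomly chosen index per class.
    forced = [np.random.choice(np.where(y_training == c)[0], 1) for c in classes]
    # Remaining slots: ordinary sampling with replacement over all instances.
    rest = np.random.choice(n_instances, size=n_instances - len(classes), replace=True)
    return np.concatenate(forced + [rest]).astype(int)


y_two = np.array(["A", "B"])
y_five = np.array([0, 0, 0, 1, 1])
for _ in range(1000):
    for y in (y_two, y_five):
        idx = get_bootstrap_idx(y)
        assert idx.shape[0] == y.shape[0]            # same size as the data
        assert set(y[idx]) == set(np.unique(y))      # every class is present
```

Note that in the two-point case this always returns exactly one A and one B, so every learner in the committee would be trained on the same data.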
I am not sure what would be a proper solution here. Forcing bootstrapping to always contain at least two classes is kind of an artificial solution. Thinking about what to do, will return soon!
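One alternative that could be considered, purely as a sketch and not something this thread settles on: draw an ordinary bootstrap sample and redraw it until every class shows up. This avoids pinning specific samples into the result, at the cost of possible retries. The function name `bootstrap_idx_with_retry` and the `max_tries` and `rng` parameters are hypothetical and not part of modAL.

```python
import numpy as np


def bootstrap_idx_with_retry(y_training, max_tries=100, rng=None):
    """Redraw a plain bootstrap sample until it contains every class (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    y_training = np.asarray(y_training)
    n_instances = y_training.shape[0]
    n_classes = np.unique(y_training).size
    for _ in range(max_tries):
        # Ordinary bootstrap: n_instances indices drawn with replacement.
        idx = rng.integers(0, n_instances, size=n_instances)
        if np.unique(y_training[idx]).size == n_classes:
            return idx
    raise RuntimeError("no bootstrap sample with all classes found in max_tries draws")


y = np.array([0, 0, 0, 1, 1])
idx = bootstrap_idx_with_retry(y, rng=np.random.default_rng(0))
print(idx, y[idx])
```

In the extreme two-point example this still ends up returning one A and one B every time, so it does not remove the underlying problem of having too little data to bootstrap.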