[Chapter 7] Voting classifier achieves lower accuracy than SVC
Hello,
I am trying the first code sample of Chapter 7, which is supposed to demonstrate that a voting classifier achieves higher accuracy than the individual estimators. However, I am not able to reproduce the scores shown in the book. In fact, the standalone SVC outperforms the VotingClassifier:
LogisticRegression 0.8304
RandomForestClassifier 0.8324
SVC 0.862
VotingClassifier 0.858
It's not better with soft voting either:
LogisticRegression 0.8304
RandomForestClassifier 0.83
SVC 0.862
VotingClassifier 0.856
Is this to be expected? I played around a bit with the parameters when creating the moons dataset, but it didn't make any difference. What are the exact parameters of the dataset used in the book (n_samples, noise)?
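For reference, here is a minimal sketch of what I am running. The make_moons parameters (n_samples=500, noise=0.30) and the random_state values are just my guesses, which is exactly what I am asking about:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Guessed dataset parameters -- these may differ from the book's
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression(random_state=42)
rnd_clf = RandomForestClassifier(random_state=42)
svm_clf = SVC(random_state=42)  # probability=True would be needed for soft voting

voting_clf = VotingClassifier(
    estimators=[("lr", log_clf), ("rf", rnd_clf), ("svc", svm_clf)],
    voting="hard")

# Train each classifier and the ensemble, then compare test accuracies
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))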
Thank you in advance!
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, f1_score

# Load MNIST and split it into training, validation and test sets
mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=True)
X = mnist.data
y = mnist.target.astype(np.uint8)
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=10000)
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=10000)

# Scale the features, fitting the scaler on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)

# Train the individual classifiers and evaluate them on the validation set
rnd_clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
ext_clf = ExtraTreesClassifier(n_estimators=100, n_jobs=-1)
linsvc_clf = LinearSVC(max_iter=100, tol=20)

for clf in [rnd_clf, ext_clf, linsvc_clf]:
    clf.fit(X_train, y_train)
    y_val_pred = clf.predict(X_val)
    print(clf.__class__.__name__, accuracy_score(y_val, y_val_pred),
          f1_score(y_val, y_val_pred, average=None))

# Combine them into a hard voting classifier
named_estimators = [
    ("Random_Forest", rnd_clf),
    ("Extra_Trees", ext_clf),
    ("Linear_SVC", linsvc_clf),
]
voting_clf = VotingClassifier(named_estimators, n_jobs=-1)
voting_clf.fit(X_train, y_train)
y_val_pred = voting_clf.predict(X_val)
print("Voting Classifier", accuracy_score(y_val, y_val_pred),
      f1_score(y_val, y_val_pred, average=None))

# Drop the LinearSVC, both from the estimators to be fitted and from the
# already fitted estimators_ list (newer scikit-learn versions may require
# the string 'drop' instead of None in set_params)
voting_clf.estimators           # displays the (name, estimator) pairs in a notebook cell
voting_clf.set_params(Linear_SVC=None)
voting_clf.estimators_          # displays the fitted estimators in a notebook cell
del voting_clf.estimators_[2]
y_val_pred = voting_clf.predict(X_val)
print("Voting Classifier", accuracy_score(y_val, y_val_pred),
      f1_score(y_val, y_val_pred, average=None))

# Switch to soft voting without retraining
voting_clf.voting = 'soft'
y_val_pred = voting_clf.predict(X_val)
print("Voting Classifier", accuracy_score(y_val, y_val_pred),
      f1_score(y_val, y_val_pred, average=None))

# Compare the remaining estimators' scores and the ensemble's score on the test set
[estimator.score(X_test, y_test) for estimator in voting_clf.estimators_], voting_clf.score(X_test, y_test)

# Plot the feature importances of the two forest classifiers as 28x28 images
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
ax1.imshow(rnd_clf.feature_importances_.reshape(28, 28), cmap='binary')
ax2.imshow(ext_clf.feature_importances_.reshape(28, 28), cmap='binary')
Try running this in your Jupyter notebook and comment back if you still run into any problems. Split the code into separate cells as appropriate.
Hi @hahampis,
Thanks for your feedback. I just ran the Colab notebook, and I got these results, which are different from the ones in the book:
LogisticRegression 0.864
RandomForestClassifier 0.872
SVC 0.888
VotingClassifier 0.896
Indeed, it's proven extremely difficult (well, impossible really) to ensure that the notebooks keep producing the exact same output over time:
- The main reason is that algorithms get tweaked between Scikit-Learn versions. Bugs get fixed. Performance is improved, but that comes with small changes here and there which slightly affect the output (e.g., due to floating point errors, the simple fact of computing sums in a different order can change the result: 1 + 1 + 1/3 is not exactly equal to 1/3 + 1 + 1; see the short demonstration after this list). This means that outputs often change when you upgrade Scikit-Learn (or other libraries). And sometimes some function arguments are changed to use a different default value. Etc.
- And the datasets can also change: for example, the first edition of my book used fetch_mldata() to download the MNIST dataset, but this relied on mldata.org, which was closed, so I had to use fetch_openml() instead. It's the same dataset, but in a different order, so it changes the results.
- And other things may change the outputs. For example, different Python versions or different platforms may use different random number generators. Underlying C libraries may change as well, e.g., to use a multithreaded implementation to gain performance, and this can change the order of execution of the operations, leading to different outputs.
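To make the floating-point example concrete, here is a tiny snippet (assuming standard IEEE-754 double precision, which is what Python uses on typical platforms):

# Floating-point addition is not associative: summing the same numbers
# in a different order can produce a slightly different result.
a = 1 + 1 + 1/3   # evaluated as (1 + 1) + 1/3
b = 1/3 + 1 + 1   # evaluated as (1/3 + 1) + 1
print(a == b)     # False: the two sums differ in the last bits
print(a - b)      # a tiny, but nonzero, difference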
In short, unless you're using the exact same environment as I did when writing the book, you will probably get a slightly different result. In general, it's not a big deal if you get (say) 89.6% accuracy instead of 88.8%, but in this case it does show that the voting classifier will not always be better than the individual predictors.
Instead of trying to have the exact same environment as I used, if you really want to see an example where the voting classifier wins, you can try tweaking the hyperparameters, or simply changing the random_state arguments. For example, when I set random_state=43 in the Colab notebook, I get this:
LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.896
VotingClassifier 0.896
The voting classifier is not better than the SVC or the RandomForestClassifier.
But when I use random_state=44, I get this:
LogisticRegression 0.864
RandomForestClassifier 0.888
SVC 0.896
VotingClassifier 0.912
Of course, many of these algorithms are stochastic, so getting slightly different results at each execution is to be expected when changing the random seeds.
I hope this helps!
@ageron Your answer is much appreciated, thank you! I obviously didn't expect to get the exact same output due to all the reasons you mentioned. My concern was that I couldn't get the Voting Classifier to win, which is the main idea. I guess it can happen... I'll try to change some more things around (like the seed) to see if I can get it to outperform the rest of the classifiers!
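Here is the kind of seed sweep I have in mind, in case it's useful to anyone else (a minimal sketch; the make_moons parameters are still my guesses rather than the book's exact values):

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Try a few seeds and check whether the ensemble beats every individual estimator
for seed in range(42, 47):
    X, y = make_moons(n_samples=500, noise=0.30, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    log_clf = LogisticRegression(random_state=seed)
    rnd_clf = RandomForestClassifier(random_state=seed)
    svm_clf = SVC(random_state=seed)
    voting_clf = VotingClassifier(
        [("lr", log_clf), ("rf", rnd_clf), ("svc", svm_clf)], voting="hard")
    scores = {clf.__class__.__name__: clf.fit(X_train, y_train).score(X_test, y_test)
              for clf in (log_clf, rnd_clf, svm_clf, voting_clf)}
    print(seed, scores, scores["VotingClassifier"] >= max(scores.values()))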