handson-ml2
[Clarification Chapter 3] FutureWarning: elementwise comparison failed; returning scalar instead
This is a warning I got while running the Binary Classifier (5 Detector) code from Chapter 3, specifically when creating the boolean target vectors that mark the 5s in the training and test sets.
y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)
This warning prevents me from running the SGDClassifier in the next code block of the book/Jupyter notebook, since y is not a 1D array.
I also realized that the same warning is still open as an issue in the NumPy and pandas repositories.
I'm using the versions mentioned in the readme of this repository.
Any help regarding this is appreciated. If a similar issue exists, please leave a comment and I'll close this.
Thanks!
Hi Vedanthv,
I noticed this too. The problem seems to occur because "y_train" is created with dtype "object" and holds strings. The condition "y_train == 5" then compares those strings with an integer, and they are never equal, so every element comes back False. Here, we can see that the first element ought to be True.
>>> X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
>>> y_train
array(['5', '0', '4', ..., '5', '6', '8'], dtype=object)
>>> y_train == 5
array([False, False, False, ..., False, False, False])
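The same behaviour can be reproduced without downloading MNIST. The snippet below is a minimal sketch that uses a made-up four-element array (`labels` is a stand-in, not the real dataset) with the same dtype=object as the output above. It only illustrates the diagnosis: inspect the dtype and the element type, then compare against an integer and against a string.

```python
import numpy as np

# Toy stand-in for the MNIST labels: strings stored in an object array,
# matching the dtype=object output shown above.
labels = np.array(['5', '0', '4', '5'], dtype=object)

print(labels.dtype)      # object
print(type(labels[0]))   # <class 'str'>

# A string never equals an integer, so every element compares as False.
mask_int = (labels == 5)

# Comparing against the string '5' matches the expected elements.
mask_str = (labels == '5')

print(mask_int)
print(mask_str)
```

Checking `dtype` and `type(labels[0])` first is a quick way to tell whether the labels are strings before building the boolean target.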
My solution was to cast y_train to integers, and then to reshape some_digit when calling predict (predict expects a 2D array of shape (n_samples, n_features)).
>>> X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
>>> y_train = y_train.astype(int)
>>> y_train
array([5, 0, 4, ..., 5, 6, 8])
>>> y_train == 5
array([ True, False, False, ..., True, False, False])
>>> y_train_5 = (y_train == 5)
>>> from sklearn.linear_model import SGDClassifier
>>> sgd_clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=42)
>>> sgd_clf.fit(X_train, y_train_5)
>>> sgd_clf.predict(some_digit.reshape(1,-1))
array([ True])
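To make the reshape step concrete, here is a small sketch of what `reshape(1, -1)` does to a single sample. It assumes a flattened 28x28 image with 784 features (the array of zeros is only a placeholder for `some_digit`, not real data). scikit-learn estimators expect a 2D input of shape (n_samples, n_features), so one sample has to become a batch of one.

```python
import numpy as np

# Placeholder for one flattened 28x28 image (assumed 784 features).
some_digit = np.zeros(784)
print(some_digit.shape)    # (784,)

# reshape(1, -1) keeps all values and adds a leading "samples" axis:
# one row, with the number of columns inferred from the data.
batch_of_one = some_digit.reshape(1, -1)
print(batch_of_one.shape)  # (1, 784)
```

Writing `[some_digit]` instead should give the same single-row input, since a list containing one 1D array is also read as a 2D array-like.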
Hi Ian, thanks for the clarification! This fixed the problem.
Thanks for your question @vedanthv, and thanks for the solution @ian-coccimiglio!
It's indeed important to cast the labels to integers. The book includes this line at the bottom of page 86: y = y.astype(np.uint8).
Also, since the book was published, fetch_openml() changed: it used to return NumPy arrays, but now it returns Pandas DataFrames. This breaks some of the code in the notebooks. Luckily there's an easy fix: just set as_frame=False when calling fetch_openml() and everything should work fine.
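Putting the two fixes together might look like the sketch below. The helper names `load_mnist_arrays` and `make_binary_target` are hypothetical (they are not in the book or the notebooks), and the `"mnist_784", version=1` arguments are assumed to match the dataset used in Chapter 3. The download is wrapped in a function so nothing is fetched until it is called.

```python
import numpy as np

def load_mnist_arrays():
    """Hypothetical helper: fetch MNIST as NumPy arrays (downloads the data)."""
    # Imported lazily so the rest of the sketch only needs NumPy.
    from sklearn.datasets import fetch_openml
    # as_frame=False asks for NumPy arrays instead of Pandas objects.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X, y = mnist["data"], mnist["target"]
    # The labels arrive as strings, so cast them to integers.
    return X, y.astype(np.uint8)

def make_binary_target(y, digit=5):
    """Hypothetical helper: 1D boolean target, True where the label equals digit."""
    y = np.asarray(y).astype(np.uint8)
    return y == digit

# Usage (not run here because it downloads the dataset):
# X, y = load_mnist_arrays()
# y_train_5 = make_binary_target(y[:60000])
```

The cast inside `make_binary_target` is redundant when the labels are already integers, but it keeps the helper safe to call on string labels too.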
Btw, the third edition of the book will come out in October 2022, and the updated notebooks are available at https://github.com/ageron/handson-ml3
Hope this helps!