introduction_to_ml_with_python
Show how features derived from k-means separate the two half-moons
In notebook 03-unsupervised-learning, the book gives the following example:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans

X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

# use many more clusters (10) than classes (2) to get an expressive representation
kmeans = KMeans(n_clusters=10, random_state=0)
kmeans.fit(X)
y_pred = kmeans.predict(X)

plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=60, cmap='Paired')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=60,
            marker='^', c=range(kmeans.n_clusters), linewidth=2, cmap='Paired')
plt.xlabel("Feature 0")
plt.ylabel("Feature 1")
print("Cluster memberships:\n{}".format(y_pred))
The book only shows the transformed features and claims that we can now separate the two half-moons with a linear model:
# each column is the distance from every sample to one of the 10 cluster centers
distance_features = kmeans.transform(X)
print("Distance feature shape: {}".format(distance_features.shape))
print("Distance features:\n{}".format(distance_features))
Maybe it would be better to actually demonstrate how the features derived from k-means separate the two half-moons, e.g.:
from sklearn.linear_model import LogisticRegression

# fit a linear model on the cluster-distance features
clf = LogisticRegression().fit(distance_features, y)

# evaluate the decision function on a grid in the original 2-D space
xx = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 100)
yy = np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 100)
XX, YY = np.meshgrid(xx, yy)
X_grid = np.c_[XX.ravel(), YY.ravel()]
X_grid_kmeans = kmeans.transform(X_grid)  # map the grid into distance space
decision_values = clf.decision_function(X_grid_kmeans)

plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='Paired')
plt.contour(XX, YY, decision_values.reshape(XX.shape), levels=[0])
plt.show()
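
If this goes into the next print, the same idea could also be packaged as a scikit-learn Pipeline, since KMeans exposes transform() and therefore works as a preprocessing step. Just a sketch of an alternative presentation, not code from the book:

from sklearn.pipeline import make_pipeline
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# KMeans is the transformer here: transform() yields cluster distances,
# on which the logistic regression fits a linear decision boundary
pipe = make_pipeline(KMeans(n_clusters=10, random_state=0), LogisticRegression())
pipe.fit(X, y)
print("Pipeline training accuracy: {:.2f}".format(pipe.score(X, y)))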

Thanks, that might indeed be a useful addition for the next print.