yellowbrick

WIP: Allows Silhouette Visualizer to accept DensityEstimator

Open • lwgray opened this pull request 2 years ago • 2 comments

This PR fixes #1303, in which a user reported that they could not use a Gaussian mixture model (GMM) as the clustering model for the Silhouette Visualizer. Doing so raised this error: `yellowbrick.exceptions.YellowbrickTypeError: The supplied model is not a clustering estimator; try a classifier or regression score visualizer instead!`

Once I resolved that issue, I hit a second problem: a GMM estimator has no `n_clusters` attribute, because it specifies the number of mixture components with `n_components` instead.

I have made the following changes:

  1. I added a new `is_density` function to the `utils/types.py` file.
  2. I used `is_density` in the `ClusteringScoreVisualizer` class so that it also accepts density estimators.
  ~3. I fixed the attribute error by using a try/except clause to set `self.n_clusters_` equal to `self.estimator.n_components` in `silhouette.py`.~
  3. I check whether `self.estimator` has the `n_components` attribute that density estimators possess and, if so, set `self.n_clusters_` to `self.estimator.n_components`.
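The three changes above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in this PR. It assumes that `is_density` mirrors the `_estimator_type` check that yellowbrick's existing `is_clusterer` is assumed to use, and that scikit-learn's `DensityMixin` (which `GaussianMixture` inherits) sets `_estimator_type = "DensityEstimator"`. The helpers `check_clustering_model` and `infer_n_clusters` and the `Fake*` stand-in classes are illustrative names only. The sketch raises a plain `TypeError` where yellowbrick raises `YellowbrickTypeError`.

```python
# Hypothetical sketch of the PR's three changes; not yellowbrick's actual code.


def is_clusterer(estimator) -> bool:
    """Assumed existing check: scikit-learn clusterers set _estimator_type = "clusterer"."""
    return getattr(estimator, "_estimator_type", None) == "clusterer"


def is_density(estimator) -> bool:
    """Step 1: density estimators (assumed) set _estimator_type = "DensityEstimator"."""
    return getattr(estimator, "_estimator_type", None) == "DensityEstimator"


def check_clustering_model(estimator) -> None:
    """Step 2: accept clusterers or density estimators, reject anything else."""
    if not (is_clusterer(estimator) or is_density(estimator)):
        # yellowbrick raises YellowbrickTypeError here; TypeError keeps this standalone
        raise TypeError("The supplied model is not a clustering estimator")


def infer_n_clusters(estimator) -> int:
    """Step 3: density estimators expose n_components instead of n_clusters."""
    if is_density(estimator):
        return estimator.n_components
    return estimator.n_clusters


# Stand-ins that mimic only the relevant scikit-learn attributes (hypothetical).
class FakeKMeans:
    _estimator_type = "clusterer"

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters


class FakeGMM:
    _estimator_type = "DensityEstimator"

    def __init__(self, n_components):
        self.n_components = n_components


class FakeRegressor:
    _estimator_type = "regressor"


check_clustering_model(FakeGMM(5))      # accepted thanks to is_density
print(infer_n_clusters(FakeGMM(5)))     # 5
print(infer_n_clusters(FakeKMeans(3)))  # 3
```

In this sketch the `n_components` fallback is guarded by `is_density` rather than a bare `hasattr` check. Some clusterers, such as scikit-learn's `SpectralClustering`, also take an `n_components` parameter with a different meaning (the number of eigenvectors), and the guard keeps those estimators on `n_clusters`.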

Sample Code

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture as GMM

from yellowbrick.cluster import SilhouetteVisualizer

# Generate a synthetic dataset with 5 well-separated blobs
X, y = make_blobs(
    n_samples=1000, n_features=12, centers=5, shuffle=False, random_state=0
)

# Instantiate the clustering model and visualizer
model = GMM(n_components=5, random_state=0)
visualizer = SilhouetteVisualizer(model, colors="yellowbrick")

visualizer.fit(X)  # Fit the data to the visualizer
visualizer.show()  # Finalize and render the figure
```

PLOT

[Image: silhouette plot]

Questions for the @DistrictDataLabs/team-oz-maintainers:

  • [ ] Is a try/except clause a viable way to handle missing attributes? I expect this to recur, because I ran into a different attribute error with another clustering estimator, and adding one special case per estimator could get unwieldy.
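One possible alternative to accumulating try/except clauses, offered here only as a sketch, is a single lookup over an ordered tuple of candidate attribute names. The helper `first_attr`, the `N_CLUSTERS_ATTRS` tuple and the `Fake*` classes are hypothetical and are not yellowbrick API. The approach assumes that every listed attribute actually means "number of clusters" for the estimators that define it, and that assumption would need checking per estimator.

```python
from typing import Any, Sequence

# Hypothetical: attribute names assumed to hold the cluster count, in priority
# order. n_clusters comes first so estimators defining both keep its meaning.
N_CLUSTERS_ATTRS = ("n_clusters", "n_components")

_MISSING = object()  # sentinel so attributes set to None are still found


def first_attr(obj: Any, names: Sequence[str]) -> Any:
    """Return the value of the first attribute in `names` that `obj` has."""
    for name in names:
        value = getattr(obj, name, _MISSING)
        if value is not _MISSING:
            return value
    # One clear error instead of a different AttributeError per estimator
    raise AttributeError(
        f"{type(obj).__name__} has none of the attributes {list(names)}"
    )


# Stand-ins exposing only the attributes of interest (hypothetical).
class FakeGMM:
    def __init__(self):
        self.n_components = 5


class FakeSpectral:
    def __init__(self):
        self.n_clusters = 3
        self.n_components = 10  # different meaning; must not be used


class FakeNoCount:
    pass


print(first_attr(FakeGMM(), N_CLUSTERS_ATTRS))       # 5
print(first_attr(FakeSpectral(), N_CLUSTERS_ATTRS))  # 3
```

Another option is to derive the count from the predicted labels, for example `len(set(labels))`. That option does not depend on estimator attributes, but it counts noise labels such as DBSCAN's `-1` as a cluster.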


lwgray • Jun 24 '23 22:06

Codecov Report

Merging #1304 (78c8c6a) into develop (f7a8e95) will increase coverage by 0.00%. The diff coverage is 100.00%.

```diff
@@           Coverage Diff            @@
##           develop    #1304   +/-   ##
========================================
  Coverage    90.70%   90.71%           
========================================
  Files           93       93           
  Lines         5327     5332    +5     
========================================
+ Hits          4832     4837    +5     
  Misses         495      495           
```
| Files Changed | Coverage Δ |
|---|---|
| yellowbrick/cluster/base.py | `100.00% <100.00%> (ø)` |
| yellowbrick/cluster/silhouette.py | `85.55% <100.00%> (+0.32%)` :arrow_up: |
| yellowbrick/utils/types.py | `92.15% <100.00%> (+0.49%)` :arrow_up: |


codecov[bot] • Jun 24 '23 22:06

@bbengfort Please hold off on approving this PR: the problem it addresses is already fixed, more logically, in #1294.

lwgray • Jun 25 '23 23:06