Unable to use Silhouette Visualizer with Gaussian Mixture Model

Open Thecave3 opened this issue 1 year ago • 7 comments

Describe the bug The silhouette score and its visualization can be computed for Gaussian Mixture Model outputs, but this library does not currently support this.

To Reproduce I used the example code from here and changed the model from KMeans to GMM.

# Steps to reproduce the behavior (code snippet):
# Should include imports, dataset loading, and execution
from sklearn.mixture import GaussianMixture as GMM

from yellowbrick.cluster import SilhouetteVisualizer
from yellowbrick.datasets import load_nfl

# Load a clustering dataset
X, y = load_nfl()

# Specify the features to use for clustering
features = ['Rec', 'Yds', 'TD', 'Fmb', 'Ctch_Rate']
X = X.query('Tgt >= 20')[features]

# Instantiate the clustering model and visualizer
model = GMM(5, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')

visualizer.fit(X)        # Fit the data to the visualizer
visualizer.show()        # Finalize and render the figure

Dataset The dataset chosen does not affect the outcome.

Expected behavior I expect the visualizer to fit the data and render the silhouette scores in the figure.

Traceback

Traceback (most recent call last):
  File "sil_testet.py", line 15, in <module>
    visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
  File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/silhouette.py", line 118, in __init__
    super(SilhouetteVisualizer, self).__init__(estimator, ax=ax, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/base.py", line 45, in __init__
    raise YellowbrickTypeError(
yellowbrick.exceptions.YellowbrickTypeError: The supplied model is not a clustering estimator; try a classifier or regression score visualizer instead!

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • Python Version 3.8
  • Yellowbrick Version I do not know how to retrieve it; I installed it with pip.

Additional context

I believe SilhouetteVisualizer should support GMM because it can be used as a clustering method (e.g., Gaussian Mixture Models Clustering Algorithm Explained).
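
For illustration, here is a minimal sketch of computing silhouette values for GMM output with scikit-learn alone, without Yellowbrick. It assumes synthetic data from make_blobs as a stand-in for the NFL dataset, and it treats the hard assignments returned by fit_predict as cluster labels. It is only a workaround sketch, not what SilhouetteVisualizer does internally.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data (an assumption; the issue itself uses load_nfl()).
X, _ = make_blobs(n_samples=300, centers=5, n_features=5, random_state=42)

# Fit the mixture and take the most likely component of each sample as its label.
gmm = GaussianMixture(n_components=5, random_state=42)
labels = gmm.fit_predict(X)

# Mean silhouette coefficient over all samples, and the per-sample values
# that a silhouette plot would draw.
score = silhouette_score(X, labels)
sample_values = silhouette_samples(X, labels)

print(f"mean silhouette: {score:.3f}")
print(f"per-sample values: {sample_values.shape[0]}")
```

The per-sample values can be grouped by label and plotted manually if a silhouette figure is needed before GMM support lands in the library.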

Thecave3 avatar Apr 16 '23 18:04 Thecave3

I see that this may also be solved by merging PR #1294.

Thecave3 avatar Apr 16 '23 18:04 Thecave3

@Thecave3 I was hoping that your issue would be solved by #1294. @lwgray any status on that PR?

bbengfort avatar Jun 12 '23 15:06 bbengfort

@bbengfort I believe that it can solve the issue; however, I am not sure about the automatic tests that are preventing the PR from being merged.

Thecave3 avatar Jun 15 '23 08:06 Thecave3

> @bbengfort I believe that it can solve the issue; however, I am not sure about the automatic tests that are preventing the PR from being merged.

@lwgray any thoughts?

bbengfort avatar Jun 16 '23 12:06 bbengfort

I will create the test this weekend.

Cheers Larry

lwgray avatar Jun 16 '23 13:06 lwgray

  1. I found that the GMM estimator type isn't a clusterer but a DensityEstimator (which I have never seen before). I know how to fix this.
  2. GMM doesn't have an n_clusters_ attribute, which Yellowbrick expects for clustering estimators. I have to dig deeper into this.
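
The two observations above can be checked with a short sketch like the following. It is a hedged illustration, not Yellowbrick's own check: it assumes the estimator type is exposed through the private `_estimator_type` attribute (which may be absent or differ across scikit-learn versions, hence the `getattr` default), and it checks both the `n_clusters` and `n_clusters_` spellings.

```python
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

kmeans = KMeans(n_clusters=5)
gmm = GaussianMixture(n_components=5)

for est in (kmeans, gmm):
    name = type(est).__name__
    # Private attribute; may be missing depending on the scikit-learn version.
    est_type = getattr(est, "_estimator_type", None)
    print(
        f"{name}: type={est_type!r}, "
        f"n_clusters={hasattr(est, 'n_clusters')}, "
        f"n_clusters_={hasattr(est, 'n_clusters_')}, "
        f"n_components={hasattr(est, 'n_components')}"
    )
```

GaussianMixture stores its component count as n_components, so any code that reads a cluster count from the estimator needs a fallback for that name.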

lwgray avatar Jun 18 '23 00:06 lwgray

@bbengfort Another update: #1294 will solve part of this issue, and #1304 fixes the "not a clustering estimator" error.

lwgray avatar Jun 25 '23 23:06 lwgray