yellowbrick
Unable to use Silhouette Visualizer with Gaussian Mixture Model
Describe the bug A silhouette score, and its visualization, can be computed for the cluster assignments produced by a Gaussian Mixture Model, but SilhouetteVisualizer currently rejects this estimator.
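For reference, the score itself can be computed with scikit-learn alone, without Yellowbrick. This is a minimal sketch, assuming synthetic data from make_blobs (not the NFL dataset used in the report) and treating each sample's most probable mixture component as its cluster label:

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data (an assumption for illustration only)
X, _ = make_blobs(n_samples=300, centers=5, random_state=42)

# fit_predict assigns each sample to its most probable component
gmm = GaussianMixture(n_components=5, random_state=42)
labels = gmm.fit_predict(X)

# Mean silhouette coefficient, plus the per-sample values a plot would use
score = silhouette_score(X, labels)
per_sample = silhouette_samples(X, labels)
print(f"mean silhouette: {score:.3f}")
```

The per-sample values are the quantities a silhouette plot draws, so the missing piece is only the visualizer accepting the estimator.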
To Reproduce I used the example code from here and changed the model from KMeans to GMM.
# Steps to reproduce the behavior (code snippet):
# Should include imports, dataset loading, and execution
from sklearn.mixture import GaussianMixture as GMM
from yellowbrick.cluster import SilhouetteVisualizer
from yellowbrick.datasets import load_nfl
# Load a clustering dataset
X, y = load_nfl()
# Specify the features to use for clustering
features = ['Rec', 'Yds', 'TD', 'Fmb', 'Ctch_Rate']
X = X.query('Tgt >= 20')[features]
# Instantiate the clustering model and visualizer
model = GMM(5, random_state=42)
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
visualizer.fit(X) # Fit the data to the visualizer
visualizer.show() # Finalize and render the figure
Dataset The dataset chosen does not affect the outcome.
Expected behavior I expect the visualizer to fit the data and plot the silhouette scores in the figure.
Traceback
Traceback (most recent call last):
File "sil_testet.py", line 15, in <module>
visualizer = SilhouetteVisualizer(model, colors='yellowbrick')
File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/silhouette.py", line 118, in __init__
super(SilhouetteVisualizer, self).__init__(estimator, ax=ax, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/yellowbrick/cluster/base.py", line 45, in __init__
raise YellowbrickTypeError(
yellowbrick.exceptions.YellowbrickTypeError: The supplied model is not a clustering estimator; try a classifier or regression score visualizer instead!
Desktop (please complete the following information):
- OS: Ubuntu 20.04
- Python Version 3.8
- Yellowbrick Version I have no clue on how to retrieve it, I installed it with pip.
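For anyone filling in this field: the version of a pip-installed package can be read from its metadata, either with `pip show yellowbrick` on the command line or from Python. A small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if Yellowbrick is not installed
print(installed_version("yellowbrick"))
```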
Additional context
I believe SilhouetteVisualizer should support GMM, since it can be used as a clustering method (e.g., Gaussian Mixture Models Clustering Algorithm Explained).
I see that this may also be solved by merging PR #1294.
@Thecave3 I was hoping that your issue would be solved by #1294. @lwgray any status on that PR?
@bbengfort I believe that it can solve the issue; however, I am not sure about the automated tests that are preventing the PR from being merged.
@lwgray any thoughts?
I will create the test this weekend.
Cheers Larry
- I found that the GMM estimator type isn't a clusterer but a DensityEstimator (which I have never seen before). I know how to fix this.
- GMM doesn't have an n_clusters_ attribute, which Yellowbrick expects of clustering estimators. I have to dig deeper into this.
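Both findings can be seen by inspecting the estimators directly. The sketch below is illustrative only: it uses scikit-learn's ClusterMixin and DensityMixin base classes as a proxy for the estimator type (Yellowbrick's own check may be implemented differently), and it looks for the `n_clusters` constructor parameter as a stand-in for the attribute mentioned above. The `describe` helper is hypothetical.

```python
from sklearn.base import ClusterMixin, DensityMixin
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def describe(estimator):
    """Summarize the properties relevant to the two findings above."""
    return {
        "is_cluster_mixin": isinstance(estimator, ClusterMixin),
        "is_density_mixin": isinstance(estimator, DensityMixin),
        "has_n_clusters": hasattr(estimator, "n_clusters"),
        "has_n_components": hasattr(estimator, "n_components"),
    }


kmeans_info = describe(KMeans(n_clusters=5))
gmm_info = describe(GaussianMixture(n_components=5))
print("KMeans:", kmeans_info)
print("GaussianMixture:", gmm_info)
```

GaussianMixture exposes the number of groups as `n_components` rather than `n_clusters`, so any fix would presumably need to account for that naming difference as well as the estimator type.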
@bbengfort Another update: #1294 will solve part of this issue, and #1304 fixes the "not a clustering estimator" error.