
ModelVisualizer + VisualPipeline

Open • falcaopetri opened this issue 4 years ago • 1 comment

Is your feature request related to a problem? Please describe.

I was trying to combine KElbowVisualizer with VisualPipeline but got stuck on the unexpected error below:

AttributeError: 'KMeans' object has no attribute 'axes'

Full pipeline snippet
```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_blobs

from yellowbrick.cluster import KElbowVisualizer
from yellowbrick.pipeline import VisualPipeline

X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)
pipe = VisualPipeline([
    ('scale', StandardScaler()),
    ('elbow', KElbowVisualizer(KMeans()))
])

# Raises: AttributeError: 'KMeans' object has no attribute 'axes'
pipe.fit_transform_show(X)
```

The important part is that I was trying to run pipe.fit_transform_show(X). Of course, pipe.fit(X); pipe.show() worked fine, and was similar to the usage in KElbowVisualizer's example.

The issue

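The issue is that VisualPipeline tries to call KElbowVisualizer.fit_transform, which does not exist. Because the visualizer is a Wrapper, which forwards attribute lookups it cannot resolve to the wrapped estimator, KMeans.fit_transform is executed instead, so KElbowVisualizer.fit is never called. When show() later looks up state that fit() would have created, that lookup also falls through to KMeans, which is why the error message names KMeans rather than the visualizer.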
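To make the delegation concrete, here is a minimal pure-Python sketch. Wrapper, FakeKMeans and FakeElbow are hypothetical stand-ins, not yellowbrick's real classes; the only assumption carried over is that the wrapper forwards failed attribute lookups to the wrapped estimator through __getattr__. The "axes" attribute is illustrative state that a fit step would normally create.

```python
class Wrapper:
    """Hypothetical stand-in: forwards failed attribute lookups to the wrapped object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only invoked when normal lookup on the wrapper itself fails.
        return getattr(self._wrapped, name)


class FakeKMeans:
    """Hypothetical estimator that, like KMeans, defines fit_transform."""

    def fit(self, X):
        return self

    def fit_transform(self, X):
        # Stands in for an estimator returning per-sample distances.
        return [[0.0] for _ in X]


class FakeElbow(Wrapper):
    """Hypothetical visualizer: defines fit and show, but not fit_transform."""

    def __init__(self, estimator):
        super().__init__(estimator)
        self.fit_called = False

    def fit(self, X):
        self.fit_called = True
        self.axes = ["ax"]  # illustrative state that show() depends on
        return self

    def show(self):
        # If fit() never ran, this lookup falls through to the estimator.
        return self.axes


viz = FakeElbow(FakeKMeans())
out = viz.fit_transform([1, 2, 3])  # resolves to FakeKMeans.fit_transform
print(viz.fit_called)  # False: the visualizer's own fit was skipped

try:
    viz.show()
except AttributeError as err:
    print(err)  # 'FakeKMeans' object has no attribute 'axes'
```

The error names the wrapped class, mirroring how the report's error names KMeans even though the failing call was made on the visualizer.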

Feature request

Part of the problem was my rush to do everything in one line with fit_transform_show(). Still, I think it's a legitimate usage that other people might try. Some ideas came to mind for improving it:

  1. Better documentation and more examples for VisualPipeline. It's a nice feature, but I was only able to find brief references to it, such as in Classification Visualizers.
  2. Have VisualPipeline implement a fit_show(X) method. This might add some confusion, but it has the nice property of not returning the unwanted transformed output (as fit_transform_show does).
  3. FeatureVisualizer currently implements sklearn's TransformerMixin, but ModelVisualizer does not. In contrast, KMeans, although an estimator, also implements TransformerMixin. If ModelVisualizer (or simply Visualizer) implemented TransformerMixin, the inherited fit_transform would call KElbowVisualizer().fit().transform(), so the visualizer's own fit would run.
     3.1. A complementary approach would be for ModelVisualizer to implement an empty transform method. This would allow us to apply KElbowVisualizer.fit_transform_show() without getting back an array of distances (from KMeans.transform).
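Ideas 3 and 3.1 can be sketched together as follows. TransformerMixinSketch is a simplified stand-in for sklearn.base.TransformerMixin, whose fit_transform essentially calls fit(X, y).transform(X); ModelVisualizerSketch, Wrapper and FakeKMeans are hypothetical and do not reflect yellowbrick's actual class hierarchy. The "empty" transform from 3.1 is read here as passing X through unchanged, which is one possible interpretation.

```python
class TransformerMixinSketch:
    """Simplified stand-in for sklearn's TransformerMixin.fit_transform."""

    def fit_transform(self, X, y=None, **fit_params):
        return self.fit(X, y, **fit_params).transform(X)


class Wrapper:
    """Hypothetical stand-in: forwards failed attribute lookups to the wrapped object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        return getattr(self._wrapped, name)


class FakeKMeans:
    """Hypothetical estimator whose transform returns distance-like output."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[0.0] for _ in X]


class ModelVisualizerSketch(TransformerMixinSketch, Wrapper):
    """Hypothetical visualizer with the mixin (idea 3) and a pass-through transform (idea 3.1)."""

    def __init__(self, estimator):
        Wrapper.__init__(self, estimator)
        self.fitted = False

    def fit(self, X, y=None, **kwargs):
        # The visualizer's own fit now runs; a real one would also draw here.
        self._wrapped.fit(X, y)
        self.fitted = True
        return self

    def transform(self, X):
        # Idea 3.1: return X as-is instead of delegating to the estimator.
        return X


viz = ModelVisualizerSketch(FakeKMeans())
Xt = viz.fit_transform([1, 2, 3])  # found on the mixin, not via __getattr__
print(viz.fitted)  # True
print(Xt)  # [1, 2, 3]
```

Because fit_transform is now found through normal class lookup, the wrapper's __getattr__ is never consulted for it. Without the pass-through transform from 3.1, the mixin's call to self.transform would still be forwarded to FakeKMeans.transform and return the distance-like output.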

I don't know the impact on other ModelVisualizers or on other VisualPipeline usages, but I'd be glad to help implement these or other ideas to improve VisualPipeline.

falcaopetri • Oct 17 '21 19:10

@falcaopetri, thank you for contributing to Yellowbrick and for reporting the issue you're having with the VisualPipeline. I know that the VisualPipeline needs a lot of love; it's more or less a prototype and hasn't quite risen to the level of core functionality in Yellowbrick yet. All of your suggestions are excellent, though! If you're interested in a second PR after we get through #1202, we'd be happy to review your work!

bbengfort • Nov 10 '21 14:11