yellowbrick
ModelVisualizer + VisualPipeline
Is your feature request related to a problem? Please describe.
I was trying to combine the KElbowVisualizer with VisualPipeline, but got stuck on the unexpected error below:
AttributeError: 'KMeans' object has no attribute 'axes'
Full pipeline snippet
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer
from yellowbrick.pipeline import VisualPipeline
X, _ = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)
pipe = VisualPipeline([
    ('scale', StandardScaler()),
    ('elbow', KElbowVisualizer(KMeans()))
])
pipe.fit_transform_show(X)
The important part is that I was trying to run pipe.fit_transform_show(X). Of course, pipe.fit(X); pipe.show() worked fine, and was similar to the usage in KElbowVisualizer's example.
The issue
The issue is that VisualPipeline will try to call KElbowVisualizer.fit_transform, which does not exist. Due to Wrapper, KMeans.fit_transform will be executed instead, which means that KElbowVisualizer.fit is never called.
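To make the fall-through concrete, here is a minimal pure-Python sketch of the mechanism as I understand it. The classes below (Wrapper, ToyEstimator, ToyVisualizer) are hypothetical stand-ins written for illustration, not Yellowbrick's or scikit-learn's actual code; I am assuming the real Wrapper delegates missing attributes in roughly this way, and that show() depends on state that only the visualizer's fit() sets.

```python
class Wrapper:
    """Hypothetical delegating wrapper (stand-in, not Yellowbrick's class)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == "estimator":
            # Avoid infinite recursion if 'estimator' itself is missing.
            raise AttributeError(name)
        # Forward everything else to the wrapped estimator.
        return getattr(self.estimator, name)


class ToyEstimator:
    """Stand-in for KMeans: provides fit, transform and fit_transform."""

    def fit(self, X):
        self.fitted_ = True
        return self

    def transform(self, X):
        return [[0.0] for _ in X]

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class ToyVisualizer(Wrapper):
    """Stand-in for KElbowVisualizer: defines fit but no fit_transform."""

    def fit(self, X):
        self.estimator.fit(X)
        self.axes = ["ax"]  # state that show() relies on
        return self

    def show(self):
        # If fit() never ran, 'axes' is missing here and the lookup
        # falls through to the wrapped estimator.
        return self.axes


X = [[1.0, 2.0], [3.0, 4.0]]
viz = ToyVisualizer(ToyEstimator())

# Resolves to ToyEstimator.fit_transform, so ToyVisualizer.fit is skipped.
viz.fit_transform(X)

try:
    viz.show()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

In this sketch the estimator ends up fitted while the visualizer never records its own state, which would explain why the AttributeError names the wrapped object rather than the visualizer.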
Feature request
The main problem was my rush to get it all in one line with fit_transform_show(). Still, I think it's an honest usage that other people might try. Some ideas came to mind to improve it:
1. Better documentation and more examples about VisualPipeline. It's a nice feature, but I was only able to find brief references to it, such as in Classification Visualizers.
2. Have VisualPipeline implement a fit_show(X) method. This would probably cause more confusion, but has the nice property of not returning the undesired transformed output (compared to fit_transform_show).
3. FeatureVisualizer currently implements sklearn's TransformerMixin, but ModelVisualizer does not. In contrast, KMeans, although an estimator, also implements the TransformerMixin. ModelVisualizer (or simply Visualizer) implementing the TransformerMixin would force the call of KElbowVisualizer().fit().transform().
3.1. A complementary approach would be ModelVisualizer implementing an empty transform method. This would allow us to apply KElbowVisualizer.fit_transform_show() without getting back an array of distances (from KMeans.transform).
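A rough sketch of how ideas 3 and 3.1 could behave, again using hypothetical stand-in classes rather than Yellowbrick's real ones. PassThroughTransformMixin is my own illustrative name; it only mimics the fit-then-transform contract of sklearn's TransformerMixin, and I am interpreting the "empty transform" as a pass-through that returns X unchanged.

```python
class Wrapper:
    """Hypothetical delegating wrapper (stand-in, not Yellowbrick's class)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def __getattr__(self, name):
        if name == "estimator":
            raise AttributeError(name)
        return getattr(self.estimator, name)


class ToyEstimator:
    """Stand-in for KMeans."""

    def fit(self, X, y=None):
        self.fitted_ = True
        return self

    def transform(self, X):
        return [[0.0] for _ in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class PassThroughTransformMixin:
    """Hypothetical mixin: fit_transform calls the visualizer's own fit."""

    def fit_transform(self, X, y=None):
        # Found on the class, so the lookup never reaches the estimator.
        return self.fit(X, y).transform(X)

    def transform(self, X):
        # Assumed "empty" transform: hand the data back unchanged.
        return X


class ToyVisualizer(PassThroughTransformMixin, Wrapper):
    """Stand-in for a ModelVisualizer such as KElbowVisualizer."""

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        self.axes = ["ax"]  # state that show() relies on
        return self

    def show(self):
        return self.axes


X = [[1.0, 2.0], [3.0, 4.0]]
viz = ToyVisualizer(ToyEstimator())

result = viz.fit_transform(X)  # now runs ToyVisualizer.fit first
axes = viz.show()              # works, because fit() set the state
```

With the mixin first in the base-class list, fit_transform is found on the visualizer itself, so the visualizer's fit runs and the caller gets X back instead of the estimator's distance array. Whether this is safe for the other ModelVisualizers is exactly the open question raised in this issue.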
I don't know the impact on the other ModelVisualizers or on other VisualPipeline usages, but I'd be glad to help implement these or other ideas to improve VisualPipeline.
@falcaopetri thank you for contributing to Yellowbrick and for reporting the issue that you're having with the VisualPipeline. I know that the VisualPipeline needs a lot of love; it's more or less a prototype and has not really risen to the level of core functionality in Yellowbrick quite yet. All of your suggestions are excellent though! If you're interested in a second PR after we get through #1202, we'd be happy to review your work!