yellowbrick
Finish JointPlot for Machine Learning Use Cases
This issue is a follow-on to #721 to wrap up the extension of JointPlot
for machine learning-specific use cases. The tasks are as follows:
- [ ] Finish the JointPlot docstring
- [ ] In the case where two columns are specified, color the plot with the target variable as in Manifold
- [ ] Add best-fit line(s) as an option (this might extend the kind parameter); note that lines may be drawn for each class in a discrete target.
- [ ] Update the JointPlot documentation to reflect the machine learning-specific use case of this visualizer.
- [ ] implement the quick method
- [ ] make the aspect ratio square
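For the best-fit line option, here is a minimal sketch in plain matplotlib and NumPy, not the JointPlot API. `draw_class_fit_lines` is a hypothetical helper: it scatters two variables colored by a discrete target and fits one ordinary least-squares line per class with `np.polyfit(..., deg=1)`, reusing each class's color for its line. The demo data is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def draw_class_fit_lines(ax, x, y, target):
    """Hypothetical helper: scatter x vs. y colored by a discrete target and
    draw one least-squares line per class. Returns {class: (slope, intercept)}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = np.asarray(target)
    fits = {}
    for label in np.unique(target):
        mask = target == label
        # Each class gets the next color in matplotlib's color cycle.
        points = ax.scatter(x[mask], y[mask], alpha=0.65, label=str(label))
        # A degree-1 polynomial fit is an ordinary least-squares line.
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        xs = np.array([x[mask].min(), x[mask].max()])
        # Reuse the class's scatter color so each line matches its points.
        ax.plot(xs, slope * xs + intercept, color=points.get_facecolor()[0])
        fits[label] = (slope, intercept)
    ax.legend(loc="best", frameon=True)
    return fits


# Synthetic demo: two classes with different linear trends.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
target = np.repeat(["a", "b"], 100)
y = np.where(target == "a", 2 * x + 1, -x + 5) + rng.normal(0, 0.1, 200)

fig, ax = plt.subplots()
fits = draw_class_fit_lines(ax, x, y, target)
```

For a continuous target, the same loop does not apply; binning the target or using a single overall fit line would be the simpler choice.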
Note that the square aspect ratio is being discussed here:
https://stackoverflow.com/questions/54545758/create-equal-aspect-square-plot-with-multiple-axes-when-data-limits-are-differ
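One possible approach for the square aspect ratio, sketched under the assumption that the joint axes' limits are final before it is called: pick the data aspect so that the drawn box is square even when the x and y ranges differ. `make_square` is a hypothetical name. On matplotlib 3.3 or newer, `ax.set_box_aspect(1)` expresses the same intent directly; the marginal histogram axes would still need to be sized consistently, which this sketch does not handle.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def make_square(ax):
    """Make ax's plotting box square even when the x and y data ranges differ.

    set_aspect(a) draws one y data unit `a` times as tall as one x data unit
    is wide, so box height / box width = a * (y range / x range). Choosing
    a = x range / y range therefore makes the box square.
    """
    x0, x1 = ax.get_xlim()
    y0, y1 = ax.get_ylim()
    aspect = abs(x1 - x0) / abs(y1 - y0)
    ax.set_aspect(aspect, adjustable="box")
    return aspect


fig, ax = plt.subplots(figsize=(8, 4))  # deliberately non-square figure
ax.scatter(np.linspace(0, 100, 50), np.linspace(0, 1, 50))
ax.set_xlim(0, 100)
ax.set_ylim(0, 1)
aspect = make_square(ax)
fig.canvas.draw()  # the aspect is applied when the figure is drawn
box = ax.get_window_extent()  # axes box in display pixels
```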
For the documentation ensure we add images that show a few different versions of JointPlot:
- [ ] feature-to-target
- [ ] feature-to-feature
- [ ] use of the hexbin plot
- [ ] feature-to-feature with discrete classes colored
- [ ] feature-to-feature with heatmap for regression
- [ ] use of different correlation measures
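For the "different correlation measures" image, this sketch shows how the candidate scores differ, using SciPy and NumPy directly. The method strings are labels chosen for this sketch and are not guaranteed to match JointPlot's own `correlation` options. In the demo, y = x³ is strictly increasing but not linear, so the rank-based measures (Spearman, Kendall's tau) reach 1 while Pearson stays below 1.

```python
import numpy as np
from scipy import stats


def correlation(x, y, method="pearson"):
    """Return one association score between two 1-D variables."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if method == "pearson":      # linear association
        return stats.pearsonr(x, y)[0]
    if method == "spearman":     # Pearson on ranks: monotonic association
        return stats.spearmanr(x, y)[0]
    if method == "kendalltau":   # concordant vs. discordant pairs
        return stats.kendalltau(x, y)[0]
    if method == "covariance":   # unnormalized, so it carries the data's units
        return np.cov(x, y)[0, 1]
    raise ValueError(f"unknown correlation method: {method!r}")


x = np.arange(1, 11)
y = x ** 3  # strictly increasing but not linear
methods = ("pearson", "spearman", "kendalltau", "covariance")
scores = {m: correlation(x, y, m) for m in methods}
```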
Add the following tests:
- [ ] test hist="density" image similarity
- [ ] test unknown plot kind raises exception after being set correctly in init
- [ ] test hexbin plot with and without histogram
- [ ] test exception when columns=['onecol'] is passed (line 246)
- [ ] test X and y being passed as Python lists and tuples
- [ ] test quick method with and without histogram
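As a sketch of the two input-handling tests, here is a hypothetical validation helper (`select_joint_data`, not Yellowbrick's actual check) plus pytest-style test functions that use plain `assert` and `try`/`except`, so they also run without pytest. The rule it encodes, a scalar column for feature-to-target and exactly two columns for feature-to-feature, is an assumption drawn from the task list above, and it only handles integer column indices.

```python
import numpy as np


def select_joint_data(X, y=None, *, columns):
    """Hypothetical helper (not Yellowbrick's code): return the (x, y) pair to
    plot. A scalar column means feature-to-target; a pair means
    feature-to-feature. Lists and tuples are accepted like arrays."""
    X = np.asarray(X, dtype=float)
    if np.isscalar(columns):
        if y is None:
            raise ValueError("a single column requires a target y")
        return X[:, columns], np.asarray(y, dtype=float)
    if len(columns) != 2:
        raise ValueError(f"columns must be a scalar or a pair, got {columns!r}")
    return X[:, columns[0]], X[:, columns[1]]


def test_single_item_columns_list_raises():
    try:
        select_joint_data([[1, 2], [3, 4]], [0, 1], columns=[0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for columns=[0]")


def test_lists_and_tuples_match_arrays():
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    y = [0.0, 1.0, 0.0]
    expected = select_joint_data(np.array(X), np.array(y), columns=0)
    # Nested lists and nested tuples should give the same result as arrays.
    for X_in, y_in in [(X, y), (tuple(map(tuple, X)), tuple(y))]:
        got = select_joint_data(X_in, y_in, columns=0)
        assert all(np.array_equal(g, e) for g, e in zip(got, expected))
```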
See coverage report for details:
https://coveralls.io/builds/21488827/source?filename=yellowbrick/features/jointplot.py
I'm getting a bit of weirdness when I try to use the kind='hexbin' parameter:
import pandas as pd
import yellowbrick as yb
from yellowbrick.datasets import load_bikeshare
from yellowbrick.features import JointPlotVisualizer
data = load_bikeshare(return_dataset=True)
X, y = data.to_pandas()
hex = JointPlotVisualizer(
columns=['temp', 'feelslike'], kind='hexbin'
)
hex.fit_transform(X, y)
hex.poof()
This results in a ValueError:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-9-5f43f79e19da> in <module>()
4
5 yes_hex.fit_transform(X, y)
----> 6 yes_hex.poof()
~/Desktop/eudicot/acorn/my_yb/yellowbrick/base.py in poof(self, outpath, clear_figure, **kwargs)
220
221 # Finalize the figure
--> 222 self.finalize()
223
224 if outpath is not None:
~/Desktop/eudicot/acorn/my_yb/yellowbrick/features/jointplot.py in finalize(self, **kwargs)
383 # Set the legend with full opacity patches using manual legend.
384 # Or Add the colorbar if this is a continuous plot.
--> 385 self.ax.legend(loc="best", frameon=True)
386
387 # Finalize the histograms
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/axes/_axes.py in legend(self, *args, **kwargs)
404 if len(extra_args):
405 raise TypeError('legend only accepts two non-keyword arguments')
--> 406 self.legend_ = mlegend.Legend(self, handles, labels, **kwargs)
407 self.legend_._remove_method = self._remove_legend
408 return self.legend_
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend.py in __init__(self, parent, handles, labels, loc, numpoints, markerscale, markerfirst, scatterpoints, scatteryoffsets, prop, fontsize, borderpad, labelspacing, handlelength, handleheight, handletextpad, borderaxespad, columnspacing, ncol, mode, fancybox, shadow, title, title_fontsize, framealpha, edgecolor, facecolor, bbox_to_anchor, bbox_transform, frameon, handler_map)
573
574 # init with null renderer
--> 575 self._init_legend_box(handles, labels, markerfirst)
576
577 # If shadow is activated use framealpha if not
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend.py in _init_legend_box(self, handles, labels, markerfirst)
831 # original artist/handle.
832 handle_list.append(handler.legend_artist(self, orig_handle,
--> 833 fontsize, handlebox))
834 handles_and_labels.append((handlebox, textbox))
835
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in legend_artist(self, legend, orig_handle, fontsize, handlebox)
113 artists = self.create_artists(legend, orig_handle,
114 xdescent, ydescent, width, height,
--> 115 fontsize, handlebox.get_transform())
116
117 # create_artists will return a list of artists.
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in create_artists(self, legend, orig_handle, xdescent, ydescent, width, height, fontsize, trans)
744 p = Rectangle(xy=(-xdescent, -ydescent),
745 width=width, height=height)
--> 746 self.update_prop(p, orig_handle, legend)
747 p.set_transform(trans)
748 return [p]
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in update_prop(self, legend_handle, orig_handle, legend)
70 def update_prop(self, legend_handle, orig_handle, legend):
71
---> 72 self._update_prop(legend_handle, orig_handle)
73
74 legend._set_artist_props(legend_handle)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in _update_prop(self, legend_handle, orig_handle)
728 edgecolor = getattr(orig_handle, '_original_edgecolor',
729 orig_handle.get_edgecolor())
--> 730 legend_handle.set_edgecolor(first_color(edgecolor))
731 facecolor = getattr(orig_handle, '_original_facecolor',
732 orig_handle.get_facecolor())
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in first_color(colors)
715 if colors is None:
716 return None
--> 717 colors = mcolors.to_rgba_array(colors)
718 if len(colors):
719 return colors[0]
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/colors.py in to_rgba_array(c, alpha)
284 result = np.empty((len(c), 4), float)
285 for i, cc in enumerate(c):
--> 286 result[i] = to_rgba(cc, alpha)
287 return result
288
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/colors.py in to_rgba(c, alpha)
175 rgba = None
176 if rgba is None: # Suppress exception chaining of cache lookup failure.
--> 177 rgba = _to_rgba_no_colorcycle(c, alpha)
178 try:
179 _colors_full_map.cache[c, alpha] = rgba
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/colors.py in _to_rgba_no_colorcycle(c, alpha)
229 except ValueError:
230 pass
--> 231 raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
232 # tuple color.
233 c = np.array(c)
ValueError: Invalid RGBA argument: 'f'
I also get a weird plot where the colors don't match (among other TODOs already noted in this issue, like squaring up the axes).
Environment: Python 3.6, macOS, Matplotlib==3.1.0, Yellowbrick develop branch, inside a Jupyter notebook
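My reading of the traceback, not a confirmed diagnosis: finalize() calls self.ax.legend() unconditionally (jointplot.py line 385). If the hexbin collection carries a label, the legend handler reads its edge color, which is the string 'face' by default for hexbin, and to_rgba_array iterates over that string character by character, which would explain `Invalid RGBA argument: 'f'`. A minimal sketch of one way around it, using a hypothetical `finalize_joint_axes` function rather than Yellowbrick's code: only build a legend for scatter plots, and give hexbin a colorbar instead. The temp/feelslike data below is synthetic, not the bikeshare dataset.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def finalize_joint_axes(ax, kind, mappable=None):
    """Hypothetical finalize step: add a legend only for scatter plots and a
    colorbar for hexbin plots, instead of always calling ax.legend()."""
    if kind == "scatter":
        handles, labels = ax.get_legend_handles_labels()
        if handles:  # skip the legend when nothing is labeled
            ax.legend(loc="best", frameon=True)
    elif kind == "hexbin" and mappable is not None:
        # Hexbin encodes point density as color, so a colorbar explains it.
        ax.figure.colorbar(mappable, ax=ax)


rng = np.random.default_rng(42)
temp = rng.normal(20, 5, 500)
feelslike = temp + rng.normal(0, 2, 500)

fig, ax = plt.subplots()
# No label= here, so the hexbin collection never becomes a legend entry.
hb = ax.hexbin(temp, feelslike, gridsize=25, cmap="Blues")
finalize_joint_axes(ax, "hexbin", mappable=hb)
fig.canvas.draw()
```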
We got a bump from @ZhiliangWu in #1125 for target colorization in the JointPlot. @ZhiliangWu, would you mind commenting on your use case for target-based dimensionality in the joint plot? Also, if you're interested in taking on just that component, we'd be glad to have you open a PR!
Hi @bbengfort, thanks for your quick reply and for bringing this up. I thought it was a bug, since the documentation says coloring is supported but nothing showed up when I actually used it.
My use case is to visually check whether the hidden representations learned by a neural network show any linear trend with the target, where I deliberately set the size of the hidden representation to 2. Coloring adds another dimension for looking at the relationship between these variables. Such coloring is also important for understanding whether the representation learned by the model is meaningful. Also, I noticed there is another class, ScatterVisualizer, where coloring (currently) only works when the target y is discrete.
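For the use case above (a 2-D learned representation colored by a continuous regression target), a minimal matplotlib sketch, independent of both JointPlot and ScatterVisualizer. `scatter_colored_by_target` is a hypothetical name and the "hidden representation" is synthetic random data. The key difference from discrete coloring is that a continuous target maps through a colormap and is explained by a colorbar rather than a legend.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def scatter_colored_by_target(ax, x, y, target, cmap="viridis"):
    """Scatter two features and encode a continuous target as color."""
    points = ax.scatter(x, y, c=target, cmap=cmap, alpha=0.65)
    # The colorbar maps colors back to target values.
    cbar = ax.figure.colorbar(points, ax=ax)
    cbar.set_label("target")
    return points


rng = np.random.default_rng(1)
h1, h2 = rng.normal(size=(2, 300))  # stand-in for a 2-D hidden representation
target = 3 * h1 - h2 + rng.normal(0, 0.5, 300)

fig, ax = plt.subplots()
points = scatter_colored_by_target(ax, h1, h2, target)
```

A linear trend with the target then shows up as a smooth color gradient across the point cloud.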