Finish JointPlot for Machine Learning Use Cases

Open bbengfort opened this issue 6 years ago • 3 comments

This issue is a follow-on to #721 that wraps up the extension of JointPlot for machine learning-specific use cases. The tasks are as follows:

  • [ ] Finish the JointPlot docstring
  • [ ] In the case where two columns are specified, color the plot with the target variable as in Manifold
  • [ ] Add in best fit line(s) as an option (this might extend the kind parameter) - note that lines may be drawn for each class in a discrete target.
  • [ ] Update the JointPlot documentation to reflect the machine learning specific use case of this visualizer.
  • [ ] implement the quick method
  • [ ] make the aspect ratio square

Note that the square aspect ratio is being discussed here:

https://stackoverflow.com/questions/54545758/create-equal-aspect-square-plot-with-multiple-axes-when-data-limits-are-differ
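
As a rough illustration of three of the tasks above (target coloring, per-class best fit lines, and a square plot area), here is a minimal matplotlib sketch. It is not the JointPlot implementation: the synthetic data, the three class labels, and the use of np.polyfit and Axes.set_box_aspect (available in matplotlib 3.3 and later) are all assumptions made for the example.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic two-feature data with a discrete target (assumed for illustration).
rng = np.random.default_rng(42)
n = 150
x = rng.uniform(0, 10, size=n)
labels = rng.integers(0, 3, size=n)  # three hypothetical classes: 0, 1, 2
y = (labels + 1) * 0.5 * x + rng.normal(scale=1.0, size=n)

fig, ax = plt.subplots(figsize=(5, 5))
xs = np.array([x.min(), x.max()])  # endpoints used to draw each fitted line

for cls in np.unique(labels):
    mask = labels == cls
    # Color the points of each class separately.
    points = ax.scatter(x[mask], y[mask], s=12, alpha=0.6, label=f"class {cls}")
    # Least-squares line for this class only.
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    ax.plot(xs, slope * xs + intercept, color=points.get_facecolor()[0])

# Make the axes box square even though the x and y data limits differ.
ax.set_box_aspect(1)
ax.legend(loc="best", frameon=True)
fig.canvas.draw()  # render once; use fig.savefig(...) to write an image
```

set_box_aspect squares the axes box rather than the data units, which matters here because the x and y limits differ.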

For the documentation ensure we add images that show a few different versions of JointPlot:

  • [ ] feature-to-target
  • [ ] feature-to-feature
  • [ ] use of the hexbin plot
  • [ ] feature-to-feature with discrete classes colored
  • [ ] feature-to-feature with heatmap for regression
  • [ ] use of different correlation measures
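
To make the "different correlation measures" image concrete, the sketch below contrasts two measures on a monotonic but non-linear relationship. Which measures the documentation should show is not stated in this issue; Pearson and Spearman are assumed here, and the helper functions are illustrative, not Yellowbrick code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1 to n; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# A cubic relationship is perfectly monotonic but not linear.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [v ** 3 for v in feature]

r_pearson = pearson(feature, target)
r_spearman = spearman(feature, target)
print(f"pearson={r_pearson:.3f} spearman={r_spearman:.3f}")
```

On data like this the rank-based measure reports a perfect association while Pearson does not, which is the kind of contrast a side-by-side documentation image could show.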

Add the following tests:

  • [ ] test hist="density" image similarity
  • [ ] test unknown plot kind raises exception after being set correctly in init
  • [ ] test hexbin plot with and without histogram
  • [ ] test exception when columns=['onecol'] is passed (line 246)
  • [ ] test X and y being passed as python lists and tuples
  • [ ] test quick method with and without histogram

See coverage report for details:

https://coveralls.io/builds/21488827/source?filename=yellowbrick/features/jointplot.py

bbengfort, Feb 06 '19 16:02

I'm getting a bit of weirdness when I try to use the kind='hexbin' parameter:

import pandas as pd
import yellowbrick as yb
from yellowbrick.datasets import load_bikeshare
from yellowbrick.features import JointPlotVisualizer

data = load_bikeshare(return_dataset=True)
X, y = data.to_pandas()

# Named yes_hex to match the traceback below (and to avoid shadowing the built-in hex)
yes_hex = JointPlotVisualizer(
    columns=['temp', 'feelslike'], kind='hexbin'
)

yes_hex.fit_transform(X, y)
yes_hex.poof()

This results in a value error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-5f43f79e19da> in <module>()
      4 
      5 yes_hex.fit_transform(X, y)
----> 6 yes_hex.poof()

~/Desktop/eudicot/acorn/my_yb/yellowbrick/base.py in poof(self, outpath, clear_figure, **kwargs)
    220 
    221         # Finalize the figure
--> 222         self.finalize()
    223 
    224         if outpath is not None:

~/Desktop/eudicot/acorn/my_yb/yellowbrick/features/jointplot.py in finalize(self, **kwargs)
    383         # Set the legend with full opacity patches using manual legend.
    384         # Or Add the colorbar if this is a continuous plot.
--> 385         self.ax.legend(loc="best", frameon=True)
    386 
    387         # Finalize the histograms

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/axes/_axes.py in legend(self, *args, **kwargs)
    404         if len(extra_args):
    405             raise TypeError('legend only accepts two non-keyword arguments')
--> 406         self.legend_ = mlegend.Legend(self, handles, labels, **kwargs)
    407         self.legend_._remove_method = self._remove_legend
    408         return self.legend_

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend.py in __init__(self, parent, handles, labels, loc, numpoints, markerscale, markerfirst, scatterpoints, scatteryoffsets, prop, fontsize, borderpad, labelspacing, handlelength, handleheight, handletextpad, borderaxespad, columnspacing, ncol, mode, fancybox, shadow, title, title_fontsize, framealpha, edgecolor, facecolor, bbox_to_anchor, bbox_transform, frameon, handler_map)
    573 
    574         # init with null renderer
--> 575         self._init_legend_box(handles, labels, markerfirst)
    576 
    577         # If shadow is activated use framealpha if not

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend.py in _init_legend_box(self, handles, labels, markerfirst)
    831                 # original artist/handle.
    832                 handle_list.append(handler.legend_artist(self, orig_handle,
--> 833                                                          fontsize, handlebox))
    834                 handles_and_labels.append((handlebox, textbox))
    835 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in legend_artist(self, legend, orig_handle, fontsize, handlebox)
    113         artists = self.create_artists(legend, orig_handle,
    114                                       xdescent, ydescent, width, height,
--> 115                                       fontsize, handlebox.get_transform())
    116 
    117         # create_artists will return a list of artists.

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in create_artists(self, legend, orig_handle, xdescent, ydescent, width, height, fontsize, trans)
    744         p = Rectangle(xy=(-xdescent, -ydescent),
    745                       width=width, height=height)
--> 746         self.update_prop(p, orig_handle, legend)
    747         p.set_transform(trans)
    748         return [p]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in update_prop(self, legend_handle, orig_handle, legend)
     70     def update_prop(self, legend_handle, orig_handle, legend):
     71 
---> 72         self._update_prop(legend_handle, orig_handle)
     73 
     74         legend._set_artist_props(legend_handle)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in _update_prop(self, legend_handle, orig_handle)
    728         edgecolor = getattr(orig_handle, '_original_edgecolor',
    729                             orig_handle.get_edgecolor())
--> 730         legend_handle.set_edgecolor(first_color(edgecolor))
    731         facecolor = getattr(orig_handle, '_original_facecolor',
    732                             orig_handle.get_facecolor())

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/legend_handler.py in first_color(colors)
    715             if colors is None:
    716                 return None
--> 717             colors = mcolors.to_rgba_array(colors)
    718             if len(colors):
    719                 return colors[0]

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/colors.py in to_rgba_array(c, alpha)
    284     result = np.empty((len(c), 4), float)
    285     for i, cc in enumerate(c):
--> 286         result[i] = to_rgba(cc, alpha)
    287     return result
    288 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/colors.py in to_rgba(c, alpha)
    175         rgba = None
    176     if rgba is None:  # Suppress exception chaining of cache lookup failure.
--> 177         rgba = _to_rgba_no_colorcycle(c, alpha)
    178         try:
    179             _colors_full_map.cache[c, alpha] = rgba

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/matplotlib/colors.py in _to_rgba_no_colorcycle(c, alpha)
    229         except ValueError:
    230             pass
--> 231         raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
    232     # tuple color.
    233     c = np.array(c)

ValueError: Invalid RGBA argument: 'f'

And a weird plot where the colors don't match (among other TODOs already noted in this issue, like squaring up the axis):

image

Environment: Python 3.6, macOS, Matplotlib==3.1.0, YB develop branch, inside a Jupyter notebook.
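
One possible way to sidestep the legend call for hexbin output, in line with the code comment in finalize about adding a colorbar for continuous plots, is sketched below. finalize_joint_axes is a hypothetical helper and not Yellowbrick code, the data is synthetic, and the root cause of the 'f' error has not been confirmed in this thread.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def finalize_joint_axes(ax, kind, mappable=None):
    """Hypothetical helper: colorbar for hexbin, legend for labeled artists."""
    if kind == "hexbin":
        # Bin counts are continuous, so describe them with a colorbar
        # instead of asking the legend to build a handle for the hexbin.
        return ax.figure.colorbar(mappable, ax=ax)
    handles, labels = ax.get_legend_handles_labels()
    if handles:
        return ax.legend(handles, labels, loc="best", frameon=True)
    return None  # nothing labeled, so there is no legend to draw


# Synthetic correlated data standing in for two feature columns.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20)
finalize_joint_axes(ax, "hexbin", mappable=hb)
fig.canvas.draw()
```

Branching on kind keeps the existing legend behavior for scatter plots and only changes what happens for hexbin.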

rebeccabilbro, Jul 07 '19 15:07

We got a bump from @ZhiliangWu in #1125 for target colorization in the JointPlot. @ZhiliangWu would you mind commenting on your use case for target-based dimensionality in the joint plot? Also, if you're interested in taking on just that component, we'd be glad to have you open a PR!

bbengfort, Oct 26 '20 15:10

Hi @bbengfort. Thanks for your quick reply and for bringing this up. I thought it was a bug, since the documentation says that coloring is supported but nothing showed up when I actually used it.

My use case is to visually check whether the hidden representations learned by a neural network show any linear trend with the target; I deliberately set the size of the hidden representation to 2. Coloring adds another dimension for looking at the relationship between these variables. Such coloring is also important for understanding whether the representation learned by the model is meaningful or not. Also, I noticed there is another class, ScatterVisualizer, where coloring (currently) only works when the target y takes discrete values.

ZhiliangWu, Oct 26 '20 16:10