
Improved Rank2D (implement other metrics)

Open • bbengfort opened this issue 9 years ago • 2 comments

The Rank2D visualizer is a feature analysis visualizer that ranks pairs of feature columns (much like the pairwise joint plots of a SPLOM, or scatterplot matrix) using a metric in the range [-1, 1] or [0, 1]. The rankings are visualized as a heatmap in which only the lower-left triangle is visible, with a diverging or sequential colormap that shows the relative ranks of each pair of features.
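A minimal NumPy sketch of the core computation, assuming the features are the columns of a 2D array `X`. The helper name `rank2d_pearson` is hypothetical and is not Yellowbrick's API; it scores every pair of columns with Pearson correlation and builds a boolean mask that hides the diagonal and the upper triangle, so only the lower-left triangle would be drawn.

```python
import numpy as np


def rank2d_pearson(X):
    """Score every pair of feature columns and mask all but the lower triangle.

    Hypothetical helper for illustration; not Yellowbrick's API.
    """
    X = np.asarray(X, dtype=float)
    # rowvar=False: each column is a feature, each row an observation.
    ranks = np.corrcoef(X, rowvar=False)
    # True marks cells to hide: the diagonal and everything above it.
    mask = np.triu(np.ones_like(ranks, dtype=bool))
    return ranks, mask


# Usage: two strongly (negatively) correlated features and one independent one.
rng = np.random.default_rng(42)
a = rng.normal(size=200)
X = np.column_stack([a, -a + 0.5 * rng.normal(size=200), rng.normal(size=200)])
ranks, mask = rank2d_pearson(X)
print(np.round(np.where(mask, np.nan, ranks), 2))
```

Whether the diagonal is hidden is a choice made in this sketch; each pair appears exactly once either way.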

By using different ranking metrics (Pearson, covariance, etc.), data scientists can detect issues between pairs of features that might impact machine learning, for example covariance, entropy, or non-uniformity.

See #6 for more details.

Note to contributors: items in the checklists below don't need to be completed in a single PR; if you see one that catches your eye, feel free to pick it off the list!

The following ranking metrics should be added:

  • [x] Pearson correlation
  • [x] Covariance
  • [x] Spearman correlation
  • [x] Kendall Tau correlation
  • [ ] mutual-info classification
  • [ ] mutual-info regression
  • [ ] Least Squares Error
  • [ ] Quadracity
  • [ ] Density based outlier detection
  • [ ] Uniformity (entropy of grids)
  • [ ] Number of items in most dense region
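For the two unchecked mutual-information items, scikit-learn already provides `mutual_info_classif` and `mutual_info_regression`, which estimate MI for continuous features with nearest-neighbor methods. The sketch below instead swaps in a simpler histogram (plug-in) estimator so that it depends only on NumPy. The helper names, the default of 10 equal-width bins, and the normalization by sqrt(H(X)·H(Y)) (which maps scores into [0, 1] like the other Rank2D metrics) are all assumptions for illustration.

```python
import numpy as np


def _entropy(counts):
    """Shannon entropy (bits) of a histogram given as raw counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def binned_mutual_info(x, y, bins=10, normalize=True):
    """Plug-in mutual information between two continuous features (hypothetical helper).

    Both features are discretized into `bins` equal-width bins and MI is
    computed from the joint histogram as H(X) + H(Y) - H(X, Y).
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    hx = _entropy(joint.sum(axis=1))
    hy = _entropy(joint.sum(axis=0))
    hxy = _entropy(joint.ravel())
    mi = max(hx + hy - hxy, 0.0)  # guard against tiny negative round-off
    if not normalize:
        return mi
    # MI <= min(H(X), H(Y)) <= sqrt(H(X) * H(Y)), so this lies in [0, 1].
    denom = np.sqrt(hx * hy)
    return mi / denom if denom > 0 else 0.0


def rank_mutual_info(X, bins=10):
    """Symmetric matrix of pairwise normalized MI; diagonal set to 1 by convention."""
    n = X.shape[1]
    ranks = np.ones((n, n))
    for i in range(n):
        for j in range(i):
            ranks[i, j] = ranks[j, i] = binned_mutual_info(X[:, i], X[:, j], bins)
    return ranks


# Usage: feature 1 is a noisy copy of feature 0; feature 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.1 * rng.normal(size=500), rng.normal(size=500)])
print(np.round(rank_mutual_info(X), 2))
```

Plug-in estimates are biased upward for small samples and many bins, so independent features will not score exactly 0.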

See: Seo, Jinwook, and Ben Shneiderman. "A rank-by-feature framework for interactive exploration of multidimensional data." Information Visualization 4.2 (2005): 96-113.
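Two of the remaining items, uniformity (entropy of grids) and the number of items in the most dense region, can both be read off a 2D grid laid over a pair of features. A minimal NumPy sketch, assuming a fixed `bins x bins` equal-width grid; the grid size and function names are hypothetical choices, not taken from the paper or from Yellowbrick. Uniformity is the Shannon entropy of the fraction of points in each cell, and the density score is the count in the fullest cell.

```python
import numpy as np


def grid_uniformity(x, y, bins=10, normalize=False):
    """Entropy (bits) of the points' distribution over a bins x bins grid.

    Hypothetical helper. With normalize=True the entropy is divided by
    log2(bins**2), the value for a perfectly even spread, giving [0, 1].
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p = counts[counts > 0] / counts.sum()
    h = float(-(p * np.log2(p)).sum())
    return h / np.log2(bins**2) if normalize else h


def densest_cell_count(x, y, bins=10):
    """Number of points in the most populated grid cell (hypothetical helper)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return int(counts.max())


# Usage: perfectly collinear points fill only the diagonal cells,
# while independent uniform noise spreads across the whole grid.
x = np.arange(100, dtype=float)
rng = np.random.default_rng(0)
u, v = rng.uniform(size=(2, 1000))
print(grid_uniformity(x, x), densest_cell_count(x, x), grid_uniformity(u, v))
```

Higher entropy means a more even spread; a low entropy or a large densest-cell count flags pairs whose points concentrate in a few regions.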

The following visual improvements need to be made:

  • [ ] Make the colorbar smaller and nicer
  • [ ] Add xlabels, ylabels and ticks that are nicely spaced
  • [ ] Allow for annotations in the cells
  • [ ] Make the "null" variables greyed out
  • [ ] Show joint plot on click

See: https://github.com/mwaskom/seaborn/blob/master/seaborn/matrix.py#L94

and

https://stanford.edu/~mwaskom/software/seaborn/examples/many_pairwise_correlations.html
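A rough Matplotlib sketch of several of the visual items above, assuming a precomputed square score matrix `ranks` and a list of feature names. It is not Yellowbrick's implementation, and `draw_rank2d` is a hypothetical name. Like the linked seaborn example, it masks the upper triangle with `np.triu`; it also shrinks the colorbar, places one tick per feature, annotates each visible cell, and greys out cells whose score is NaN (for instance, a constant feature).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def draw_rank2d(ranks, labels, annotate=True, ax=None):
    """Draw a lower-triangle heatmap of pairwise scores (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 5))
    n = ranks.shape[0]
    hidden = np.triu(np.ones((n, n), dtype=bool))  # diagonal + upper triangle
    null = np.isnan(ranks) & ~hidden  # visible cells with no score

    # Masked cells use the colormap's default "bad" color, which is transparent.
    scores = np.ma.masked_where(hidden | null, ranks)
    im = ax.imshow(scores, cmap="RdBu_r", vmin=-1, vmax=1)
    # Second layer: paint only the null cells light grey.
    greys = np.ma.masked_where(~null, np.ones((n, n)))
    ax.imshow(greys, cmap=ListedColormap(["lightgrey"]), vmin=0, vmax=1)

    ax.figure.colorbar(im, ax=ax, shrink=0.6, pad=0.02)  # smaller colorbar

    # One tick per feature; rotate x labels so long names don't overlap.
    ax.set_xticks(np.arange(n))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(np.arange(n))
    ax.set_yticklabels(labels)

    if annotate:
        # Write the score in every visible, non-null cell.
        for i, j in zip(*np.nonzero(~hidden & ~null)):
            ax.text(j, i, f"{ranks[i, j]:.2f}", ha="center", va="center", fontsize=8)
    return ax


# Usage: Pearson scores for random data; no file is written here.
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 4))
ax = draw_rank2d(np.corrcoef(data, rowvar=False), ["a", "b", "c", "d"])
```

Using two image layers keeps the hidden upper triangle blank while the null cells are visibly grey, instead of giving both the same "bad" color.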

NOTE: New correlation metrics should also be considered for the JointPlot visualizer. See #721 for more details.

bbengfort • Oct 07 '16 20:10

I created a pull request (https://github.com/DistrictDataLabs/yellowbrick/pull/429) to add Spearman correlation to the list of ranking metrics.

tabishsada • May 15 '18 19:05

#645 adds Kendall Tau correlation.

bbengfort • Dec 11 '18 21:12