Maximize whitespace with feature-reordering optimization in ParallelCoordinates and RadViz

Open bbengfort opened this issue 7 years ago • 3 comments

Both RadViz and ParallelCoordinates would benefit from the increased whitespace/transparency that can be achieved simply by reordering the columns: around the circle in RadViz and along the horizontal axis in ParallelCoordinates. An optimization technique could potentially let us discover the best feature ordering, or the best subset of features, to display.

Proposal/Issue

  • [ ] enhance RadViz/ParallelCoordinates to specify the ordering of the features
  • [ ] create a function/method to compute the amount of whitespace or alpha transparency in the figure
  • [ ] implement an optimization method (Hill Climbing, Simulated Annealing, etc.) to maximize the whitespace/alpha transparency using feature orders as individual search points.
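
To make the last two items concrete, here is a rough sketch for the ParallelCoordinates case. It is only an illustration under stated assumptions, not a proposed implementation: `whitespace_fraction` and `hill_climb_order` are hypothetical names, and the whitespace estimate is a crude grid-occupancy proxy computed with NumPy rather than a measurement taken from the rendered figure.

```python
import numpy as np


def whitespace_fraction(X, order, n_bins=50, n_steps=10):
    """Hypothetical proxy: fraction of empty grid cells between adjacent axes."""
    # Min-max scale every column so all axes share the [0, 1] vertical range.
    Xo = X[:, order].astype(float)
    span = np.ptp(Xo, axis=0)
    span[span == 0] = 1.0  # guard against constant columns
    Xs = (Xo - Xo.min(axis=0)) / span

    # Horizontal sample positions between one axis and the next.
    t = np.linspace(0.0, 1.0, n_steps, endpoint=False)
    empty, total = 0, 0
    for j in range(Xs.shape[1] - 1):
        # Height of every line at each sample position, shape (n_samples, n_steps).
        y = Xs[:, [j]] * (1 - t) + Xs[:, [j + 1]] * t
        bins = np.minimum((y * n_bins).astype(int), n_bins - 1)
        for s in range(n_steps):
            empty += n_bins - np.unique(bins[:, s]).size
            total += n_bins
    return empty / total


def hill_climb_order(X, n_iter=200, seed=0):
    """Hill climbing over feature orders: swap two features, keep improvements."""
    rng = np.random.default_rng(seed)
    order = list(range(X.shape[1]))
    best = whitespace_fraction(X, order)
    for _ in range(n_iter):
        i, j = rng.choice(len(order), size=2, replace=False)
        candidate = order.copy()
        candidate[i], candidate[j] = candidate[j], candidate[i]
        score = whitespace_fraction(X, candidate)
        if score > best:  # accept only strictly better orderings
            order, best = candidate, score
    return order, best
```

Each feature order is one search point, as described above. Simulated annealing would differ mainly in sometimes accepting a worse candidate, and a RadViz version would need a different scoring function.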

bbengfort avatar May 27 '18 16:05 bbengfort

@rebeccabilbro suggests in #448 that there is potentially a way to numerically represent braid density (a la silhouettes) that could be computed and added to the figure.

bbengfort avatar May 31 '18 15:05 bbengfort

Toy algorithm that I just happened to make:

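import numpy

# Assumes `visualizer` is fitted and `visualizer.ranks_` is a square
# feature-by-feature score matrix; `features` and `X` are already defined.
features_to_handle = list(range(len(features)))
features_ordered = []
last_feature = 0
while features_to_handle:
    # Weight each remaining feature by the inverse squared score against the last one
    invdists = 1. / (visualizer.ranks_[last_feature, features_to_handle]**2 + 1e-5)
    invdists /= invdists.sum()  # normalize to a probability distribution
    # Draw the position of the next feature within features_to_handle
    i = numpy.random.choice(len(features_to_handle), p=invdists)
    a = features_to_handle.pop(i)
    features_ordered.append(a)
    last_feature = a
# Apply the new ordering to the feature names and the data columns
features = [features[i] for i in features_ordered]
X = X[:, features_ordered]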

Can also do i = numpy.argmax(invdists) instead if you want it deterministic.
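
For anyone who wants to try the deterministic variant without a fitted visualizer, here is a self-contained sketch. The function name `greedy_feature_order` is hypothetical, and the `ranks` argument stands in for `visualizer.ranks_`, assumed here to be a square feature-by-feature score matrix such as pairwise correlations.

```python
import numpy as np


def greedy_feature_order(ranks, start=0, eps=1e-5):
    """Deterministic (argmax) variant of the toy ordering algorithm above."""
    remaining = list(range(ranks.shape[0]))
    ordered = []
    last = start
    while remaining:
        # Same weights as the toy algorithm: inverse squared score against
        # the previously chosen feature.
        invdists = 1.0 / (ranks[last, remaining] ** 2 + eps)
        # Take the largest weight instead of sampling from the distribution.
        pos = int(np.argmax(invdists))
        last = remaining.pop(pos)
        ordered.append(last)
    return ordered


# Stand-in for visualizer.ranks_: Pearson correlations of random data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
ranks = np.corrcoef(X, rowvar=False)
order = greedy_feature_order(ranks)
X_reordered = X[:, order]
```

Note that with these weights the next feature chosen is the one whose squared score against the previous feature is smallest, and that the starting feature stays in the candidate pool, exactly as in the snippet above.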

JohannesBuchner avatar Oct 31 '18 20:10 JohannesBuchner

@JohannesBuchner I'd be very interested in digging into this further; I assume, though, that we need to get through #650 first?

bbengfort avatar Nov 02 '18 14:11 bbengfort