yellowbrick
Maximize whitespace with feature-reordering optimization in ParallelCoordinates and RadViz
Both RadViz and ParallelCoordinates would benefit from increased whitespace or transparency, which can be achieved simply by reordering the columns around the circle in RadViz and along the horizontal axis in ParallelCoordinates. Potentially, some optimization technique would allow us to discover the best feature ordering or subset of features to display.
Proposal/Issue
- [ ] enhance RadViz/ParallelCoordinates to specify the ordering of the features
- [ ] create a function/method to compute the amount of whitespace or alpha transparency in the figure
- [ ] implement an optimization method (Hill Climbing, Simulated Annealing, etc.) to maximize the whitespace/alpha transparency using feature orders as individual search points.
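As a rough illustration of the third item, here is a minimal hill-climbing sketch over feature orderings. It is only a sketch under stated assumptions: `hill_climb_order`, `adjacency_score`, and the `weights` matrix are hypothetical names, not part of yellowbrick, and the score is a stand-in. Which objective actually tracks whitespace or alpha transparency is exactly what the second item would need to establish.

```python
import random

def adjacency_score(order, weights):
    """Stand-in objective: sum of pairwise weights between neighbouring features."""
    return sum(weights[a][b] for a, b in zip(order, order[1:]))

def hill_climb_order(weights, score_fn=adjacency_score, n_iter=2000, seed=0):
    """Search feature orderings by swapping two positions and keeping improvements."""
    rng = random.Random(seed)
    order = list(range(len(weights)))
    best = score_fn(order, weights)
    if len(order) < 2:
        return order, best
    for _ in range(n_iter):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        score = score_fn(order, weights)
        if score > best:
            best = score                             # keep the improvement
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return order, best

# Hypothetical 4-feature weight matrix (symmetric, made up for illustration).
weights = [
    [0, 1, 9, 1],
    [1, 0, 1, 9],
    [9, 1, 0, 1],
    [1, 9, 1, 0],
]
order, score = hill_climb_order(weights)
```

Passing a different `score_fn` (for example, one that renders the figure and measures whitespace) would reuse the same search loop. Simulated annealing would differ mainly in sometimes accepting a worse swap.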
@rebeccabilbro suggests in #448 that there is potentially a way to numerically represent braid density (a la silhouettes) that could be computed and added to the figure.
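One simple numeric measure in this spirit is the fraction of near-white pixels in the rendered figure. The sketch below is an assumption-laden illustration, not an existing yellowbrick function: `whitespace_fraction` and its `threshold` parameter are hypothetical, and it presumes a white background onto which the lines are drawn.

```python
import numpy as np

def whitespace_fraction(rgba, threshold=250):
    """Fraction of pixels whose R, G and B channels are all >= threshold."""
    rgb = np.asarray(rgba)[..., :3]          # drop the alpha channel if present
    return float((rgb >= threshold).all(axis=-1).mean())

# Tiny synthetic "image": 2x2 RGBA pixels, three white and one black.
image = np.array(
    [[[255, 255, 255, 255], [255, 255, 255, 255]],
     [[255, 255, 255, 255], [0, 0, 0, 255]]],
    dtype=np.uint8,
)
fraction = whitespace_fraction(image)
```

For a real figure, the pixel array could probably be obtained from matplotlib with `np.asarray(fig.canvas.buffer_rgba())` after `fig.canvas.draw()` on an Agg backend. Whether this fraction is a good proxy for braid density is untested here.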
Toy algorithm that I just happened to make:
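import numpy

# Indices of the features not yet placed in the ordering
features_to_handle = list(range(len(features)))
features_ordered = []
last_feature = 0
while features_to_handle:
    # Weight each remaining feature by the inverse square of its ranks_ value
    # relative to the most recently placed feature
    invdists = 1. / (visualizer.ranks_[last_feature, features_to_handle]**2 + 1e-5)
    invdists /= invdists.sum()
    # Sample the next feature in proportion to those weights
    i = numpy.random.choice(range(len(features_to_handle)), p=invdists)
    a = features_to_handle.pop(i)
    features_ordered.append(a)
    last_feature = a
# Apply the new ordering to the feature names and the data columns
features = [features[i] for i in features_ordered]
X = X[:, features_ordered]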
Can also do i = numpy.argmax(invdists) instead if you want it deterministic.
@JohannesBuchner I'd be very interested in digging into this further; I assume, though, that we need to get through #650 first?