
[Projector] Distance and neighborhood selection (Inspector panel)

Open francoisluus opened this issue 7 years ago • 8 comments

Add to the Inspector panel a dropdown-menu for that distance metric/space selection, and a second dropdown-menu for the neighborhood function selection. Automatically update distance metric/space choices upon change in the projection choice, and set to the newest projection choice. Allow for future distance metric/space and neighborhood function definitions to expand the available options.

The functionality proposed here would complement the interactive supervision by allowing for refined/specific neighborhood selection in a variety of projections.

~Demo here: http://tensorserve.com:6017~

git clone https://github.com/francoisluus/tensorboard-supervise.git
cd tensorboard-supervise
git checkout e212cbb57ad604d5cab0f2d95637e4e9c85378fc
bazel run tensorboard -- --logdir /home/$USER/emnist-2000 --host 0.0.0.0 --port 6017

Design and behavior

General

  1. The use of two dropdown-menus allows for the direct support of alternative metrics and neighborhood functions.
  2. Distance metrics are defined in terms of a chosen distance measure and a metric space in which it is to be used.
  3. The projector notifies the Inspector panel when the current projection changes, and the panel then refreshes the available metric/projection spaces and updates the current distance metric selection.
  4. The font size of the nn slider input is styled in anticipation of https://github.com/PolymerElements/paper-slider/pull/208, but will remain slightly larger until the Polymer update becomes available in TensorBoard.

Distance selection

  1. Previously, two distance metrics were available: cosine and euclidean distance, both in the original space.
  2. Add euclidean distance in the PCA space and euclidean distance in the t-SNE space as additional distance metrics.
  3. In the case of t-SNE, the distance option won't be available until there is already a t-SNE projection calculated.
  4. When switching from PCA to t-SNE, the distance option will switch to t-SNE automatically and vice versa.
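Since each distance metric pairs a measure with a space, the distance between two points can be computed by first collecting their coordinates in the chosen space. The TypeScript sketch below illustrates this under stated assumptions: `DistanceMeasure`, `DistanceSpace`, `DataPoint`, and `vectorInSpace` are hypothetical names, not the identifiers used in the branch, and each point is assumed to carry its original vector plus projected coordinates keyed like `pca-0` or `tsne-1`. A space with no coordinates yet (t-SNE before it has run) yields an empty vector, which is why that option is withheld until a projection exists.

```typescript
// Hypothetical types: a distance metric is a measure applied in a space.
type DistanceMeasure = 'cosine' | 'euclidean';
type DistanceSpace = 'original' | 'pca' | 'tsne' | 'custom';

interface DistanceMetric {
  measure: DistanceMeasure;
  space: DistanceSpace;
}

// Assumed point layout: the raw vector plus projected coordinates keyed
// '<space>-<index>', e.g. 'pca-0', 'pca-1', 'tsne-0'.
interface DataPoint {
  vector: number[];
  projections: { [key: string]: number };
}

// Collect a point's coordinates in the requested space (empty if absent).
function vectorInSpace(p: DataPoint, space: DistanceSpace): number[] {
  if (space === 'original') {
    return p.vector;
  }
  const coords: number[] = [];
  for (let i = 0; `${space}-${i}` in p.projections; i++) {
    coords.push(p.projections[`${space}-${i}`]);
  }
  return coords;
}

function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function metricDistance(metric: DistanceMetric, p: DataPoint, q: DataPoint): number {
  const a = vectorInSpace(p, metric.space);
  const b = vectorInSpace(q, metric.space);
  return metric.measure === 'cosine' ? cosineDistance(a, b) : euclideanDistance(a, b);
}
```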

Neighborhood selection

  1. Previously, only the direct k nearest neighbors (knn) were selected, based on the provided distance metric.
  2. Add geodesic neighborhood selection as an alternative in knn.ts.
  3. Neighborhood functions can be defined according to a type template in knn.ts, and new functions can be included in the available selection through minimal hard-coding.
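Item 3 mentions a type template in knn.ts. One way such a template and its dropdown registry could look is sketched below; `NeighborhoodFunc`, `NEIGHBORHOOD_FUNCS`, `selectNeighborhood`, and the precomputed distance-matrix argument are hypothetical and are not the actual signatures in knn.ts. The design point is that exposing a new option only requires adding one registry entry.

```typescript
// Hypothetical template: given pairwise distances and a seed index,
// return the indices of the selected neighborhood (seed excluded).
type NeighborhoodFunc = (dist: number[][], seed: number, k: number) => number[];

// Direct kNN: the k points closest to the seed.
const directKnn: NeighborhoodFunc = (dist, seed, k) =>
  dist[seed]
    .map((d, i) => ({ d, i }))
    .filter(({ i }) => i !== seed)
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map(({ i }) => i);

// Registry behind the dropdown: a new neighborhood function (such as a
// geodesic one) is exposed by adding another entry here.
const NEIGHBORHOOD_FUNCS: { [label: string]: NeighborhoodFunc } = {
  'nearest neighbors': directKnn,
};

function selectNeighborhood(label: string, dist: number[][], seed: number, k: number): number[] {
  const fn = NEIGHBORHOOD_FUNCS[label];
  if (!fn) {
    throw new Error(`Unknown neighborhood function: ${label}`);
  }
  return fn(dist, seed, k);
}
```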

Geodesic neighborhoods

  1. Geodesic neighborhoods are approximated by hopping from the initial sample into its immediate knn neighborhood and radiating outward, effectively growing the neighborhood over the manifold by edge count or geodesic distance.
  2. The main tuning parameters for geodesic selection are k=5 for the knn neighborhood expansion, and the maximum edge distance at which a candidate sample is incorporated, which is set to twice the average edge distance. Alternative instantiations can be provided, e.g. geodesic (tight), geodesic (medium), geodesic (loose).
  3. Geodesic selection is sensitive to the first-step distances: choosing a relatively isolated sample yields a larger geodesic selection than choosing a sample with closer neighbors. This discrepancy gives the user some versatility to craft the most agreeable selection size by choosing different samples.
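The geodesic growth described above can be approximated by a breadth-first walk over the kNN graph. This is a minimal sketch, not the knn.ts implementation: it assumes Euclidean distance and brute-force neighbor search, and it assumes the "average edge distance" means the mean of the seed's first-step kNN distances (consistent with the sensitivity to first-step distances in item 3). `geodesicNeighborhood`, `bruteForceKnn`, `factor`, and `maxSize` are hypothetical names.

```typescript
type Vec = number[];

function euclideanDist(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Brute-force k nearest neighbors of points[i] as [index, distance] pairs.
function bruteForceKnn(points: Vec[], i: number, k: number): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let j = 0; j < points.length; j++) {
    if (j !== i) {
      pairs.push([j, euclideanDist(points[i], points[j])]);
    }
  }
  pairs.sort((a, b) => a[1] - b[1]);
  return pairs.slice(0, k);
}

// Grow a neighborhood breadth-first along kNN edges from `seed`. A candidate
// joins only if the edge reaching it is at most `factor` times the mean
// distance from the seed to its own k nearest neighbors (the first step).
// `maxSize` optionally caps the selection size.
function geodesicNeighborhood(
  points: Vec[], seed: number, k = 5, factor = 2, maxSize = Infinity
): number[] {
  const firstStep = bruteForceKnn(points, seed, k);
  if (firstStep.length === 0) {
    return [seed];
  }
  const meanEdge = firstStep.reduce((sum, [, d]) => sum + d, 0) / firstStep.length;
  const maxEdge = factor * meanEdge;

  const selected = new Set<number>([seed]);
  const order: number[] = [seed];
  let frontier: number[] = [seed];
  while (frontier.length > 0 && order.length < maxSize) {
    const next: number[] = [];
    for (const u of frontier) {
      for (const [v, d] of bruteForceKnn(points, u, k)) {
        if (order.length >= maxSize) break;
        if (d <= maxEdge && !selected.has(v)) {
          selected.add(v);
          order.push(v);
          next.push(v);
        }
      }
    }
    frontier = next;
  }
  return order;
}

// Usage on two 1-D clusters separated by a gap of 7:
const toy: Vec[] = [[0], [1], [2], [3], [10], [11], [12], [13]];
console.log(geodesicNeighborhood(toy, 0, 2)); // [0, 1, 2, 3]
console.log(bruteForceKnn(toy, 0, 5).map(([j]) => j)); // [1, 2, 3, 4, 5]
```

With k=2, the seed's mean first-step distance is 1.5, so the edge threshold is 3 and the walk cannot cross the gap of 7, whereas a plain 5-nearest-neighbor query from the same seed already reaches into the second cluster. Seeding at a more isolated point raises the threshold and lets the walk spread further.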

Right-click selection mode

  1. A mouse right-click now acts as a selection editor in both normal mode and editMode: it deselects a currently selected point (unless it is the main/first point) and selects an unselected one. Selection mode types are introduced to support future expansion of different selection interactions.
  2. ~Add 'contextmenu' event capture in scatterplot, the standard HTML5 right-click event, and bind to selection 'edit' mode. This allows for quick selection edits without explicitly switching to editMode via the toolbar. This event handler checks that the event is not part of a drag sequence or area select.~
  3. Since the contextmenu event fires right after mousedown, we should instead capture the right-click during mouseup by checking for the right mouse button. During mouseup, if the event is not part of a drag sequence and the right mouse button was clicked, we toggle selection/deselection of the nearest point. This fixes the bug where right-click edited the selection after a mousemove even though the click was in fact part of a drag sequence: the drag check failed because contextmenu follows mousedown immediately, whereas click fires only after the button is released.
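The mouseup-based right-click handling in item 3 could be structured as below. This is a hedged sketch: `SelectionEditor`, `PointerInfo`, and the `nearestPoint` callback are hypothetical, and drag detection is simplified to "any mousemove while the button is held". Only the `MouseEvent.button` convention (2 for the secondary button) comes from the standard DOM API.

```typescript
// Minimal stand-in for the MouseEvent fields used here; a browser
// MouseEvent satisfies this shape.
interface PointerInfo {
  button: number; // MouseEvent.button: 0 = main (left), 2 = secondary (right)
  clientX: number;
  clientY: number;
}

const RIGHT_BUTTON = 2;

class SelectionEditor {
  // selection[0] is the main/first point and is never toggled off here.
  selection: number[] = [];
  private isDown = false;
  private isDrag = false;
  private nearestPoint: (x: number, y: number) => number | null;

  constructor(nearestPoint: (x: number, y: number) => number | null) {
    this.nearestPoint = nearestPoint;
  }

  onMouseDown(): void {
    this.isDown = true;
    this.isDrag = false;
  }

  // Any movement while the button is held marks a drag sequence.
  onMouseMove(): void {
    if (this.isDown) {
      this.isDrag = true;
    }
  }

  // Decide on mouseup, after the drag state is known, instead of on contextmenu.
  onMouseUp(e: PointerInfo): void {
    const wasDrag = this.isDrag;
    this.isDown = false;
    this.isDrag = false;
    if (wasDrag || e.button !== RIGHT_BUTTON) {
      return;
    }
    const idx = this.nearestPoint(e.clientX, e.clientY);
    if (idx !== null) {
      this.toggle(idx);
    }
  }

  private toggle(idx: number): void {
    const pos = this.selection.indexOf(idx);
    if (pos === -1) {
      this.selection.push(idx); // select
    } else if (pos > 0) {
      this.selection.splice(pos, 1); // deselect, but keep the main point
    }
  }
}
```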

Custom projection (coordinate naming consistency)

Custom projection coordinates were previously named 'linear-x' and 'linear-y'; they were renamed to 'custom-0' and 'custom-1' for naming consistency with pca, tsne and future projections.

This consistent format is then utilized to automate the extraction of available projections from the datapoint coordinates, and auto-set the distance metric/space choice correctly upon projection availability changes.
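With a consistent naming format, the available spaces can be read directly off the coordinate keys. The sketch below assumes that pca and tsne coordinates follow the same `<name>-<index>` pattern as 'custom-0'; `availableProjections`, `distanceOptions`, and the `'original'` label are hypothetical. A projection counts as available only if every point has its `-0` coordinate, so t-SNE appears only after it has been computed, and the selected distance space follows the active projection.

```typescript
// Assumed layout: projected coordinates keyed '<projection>-<index>',
// e.g. 'pca-0', 'tsne-1', 'custom-0'.
interface ProjectedPoint {
  projections: { [key: string]: number };
}

// Projection names that have a '-0' coordinate on every point.
function availableProjections(points: ProjectedPoint[]): string[] {
  if (points.length === 0) {
    return [];
  }
  const names = new Set<string>();
  for (const key of Object.keys(points[0].projections)) {
    const m = /^(.+)-(\d+)$/.exec(key);
    if (m) {
      names.add(m[1]);
    }
  }
  return Array.from(names).filter(name =>
    points.every(p => `${name}-0` in p.projections));
}

// Distance-space options: the original space plus each available projection.
// The selection follows the active projection, else falls back to 'original'.
function distanceOptions(
  points: ProjectedPoint[], activeProjection: string
): { options: string[]; selected: string } {
  const options = ['original'].concat(availableProjections(points));
  const selected = options.indexOf(activeProjection) !== -1 ? activeProjection : 'original';
  return { options, selected };
}
```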

Concerns and issues

  1. 'Distance' is automatically set to the current projection space, so neighborhoods are calculated in the current projection space by default, e.g. 'PCA' distance is selected when switching to PCA. The previous default was Cosine, which can use GPU acceleration and is therefore faster. If this is a real concern, we can hardcode the initial startup distance to 'Cosine'.
  2. The button heights in the top row of the Inspector panel were reduced, as I could not find a case where all three lines are required, but my search was not exhaustive. If the extra vertical space is indeed required, I can restore it.

Inspector panel (before & after)

Before

[Screenshot: Inspector panel before the change, 2018-01-02 4:22:11 PM]

After

[Screenshot: Inspector panel after the change, 2018-01-02 3:57:39 PM]

Distance selection

[Screenshot: distance selection dropdown, 2018-01-02 3:58:06 PM]

Neighborhood selection

[Screenshot: neighborhood selection dropdown, 2018-01-02 3:58:46 PM]

francoisluus • Jan 02 '18 14:01

@jart @dsmilkov Happy New Year! As promised in this PR, please find the remaining functionality for interactive supervision. Travis certificate verification fails, but the commit otherwise builds correctly.

I'd love to receive comments about this PR and make any necessary changes or adjustments. I think this PR adds very useful and important functionality to the projector.

francoisluus • Jan 02 '18 15:01

Thank you for your patience. This appears to be a high impact change that I alone am not fully qualified to review. I'm going to start poking around to find the best reviewer who has the cycles right now.

jart • Feb 05 '18 18:02

Apologies @francoisluus for the long wait! I'll review now.

dsmilkov • Feb 12 '18 15:02

This looks amazing! A high-level q:

  • How useful is it to have geodesic neighborhoods? They add complexity to the UI and the codebase, and they add hyper-params. If in practice they are highly correlated with nearest neighbors, I'm not sure they're worth the code complexity.

Curious to hear your thoughts!

dsmilkov • Feb 12 '18 15:02

@dsmilkov Thanks for checking out this contribution.

Geodesic neighborhoods are correlated with nearest neighbors in some respects, but geodesic neighborhoods can obtain higher-quality neighborhood boundaries by incorporating the statistics of neighbor step distances.

Granted, "most" datasets do not really display anything more than globular manifolds in most embeddings, so walking across the manifold could be approximated with nearest neighbors. However, even with globular manifolds there is often some cluster separation that geodesic neighborhoods can detect, whereas nearest neighbors would bleed the selection beyond the confines of the natural cluster.

So geodesic neighborhood selection can pick out natural clusters, whereas nearest-neighbor selection would need fine-grained control of the neighborhood size to achieve the same.

Having played around a lot with the different neighborhood selection options, I can say from my subjective experience that the geodesic neighborhood is definitely my favourite, especially for labeling samples, simply because it gives quality "cluster" selections without much fine-tuning. Nearest-neighbor selection often bleeds beyond the apparent manifold.

Of course, at face value, I totally agree with you: keep things as simple as possible. Perhaps placing the geodesic neighborhood option in a dropdown helps to reduce the clutter. In terms of cluttering the code itself, we primarily have one extra KNNFunc. I am inclined to give the community an opportunity to provide feedback on the usefulness of geodesic neighborhoods.

[Screenshot: projector-dist-2]

francoisluus • Feb 14 '18 15:02

When will this plugin be merged into the main TensorBoard branch? If it won't be, could someone please explain how I can use it locally?

RooieRakkert • Jul 12 '18 08:07

@RooieRakkert One way is to clone the proposed branch and then build the custom tensorboard you can use locally. You would need git and bazel in a shell env.

git clone https://github.com/francoisluus/tensorboard-supervise.git
cd tensorboard-supervise
git checkout e212cbb57ad604d5cab0f2d95637e4e9c85378fc
bazel run tensorboard -- --logdir /home/$USER/emnist-2000 --host 0.0.0.0 --port 6017

francoisluus • Jul 12 '18 09:07

I didn't realize this PR was still stalled. @dsmilkov do you have further thoughts?

nfelt • Jul 12 '18 18:07