
Hierarchical Clustering needs Data Subset input channel

[Open] BlazZupan opened this issue 2 years ago • 2 comments

It would be great if Hierarchical Clustering could show a subset of data. Say I find some data instances in a t-SNE plot, and I would like to know whether they are clustered together in HC as well.

One way to depict the subset is to print the labels of subset instances in bold. If there are no labels, mark the leaves that belong to the subset with a dot.
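A minimal, toolkit-free sketch of that rule, assuming the dendrogram gives the data row index of each leaf in display order and the subset is known as a set of row indices. The function name `decorate_leaves` and the `(text, bold)` pair are hypothetical stand-ins for what the widget would actually paint with Qt.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical marker used when there are no labels to print in bold.
DOT = "\u2022"


def decorate_leaves(
    leaf_order: Sequence[int],
    subset_rows: Set[int],
    labels: Optional[Sequence[str]] = None,
) -> List[Tuple[str, bool]]:
    """Return one (text, bold) pair per leaf, in dendrogram order.

    leaf_order: data row index of each leaf, top to bottom
    subset_rows: row indices that belong to the Data Subset input
    labels: per-row labels, or None if the widget shows no labels
    """
    decorated = []
    for row in leaf_order:
        in_subset = row in subset_rows
        if labels is not None:
            # With labels: keep the text and make subset members bold.
            decorated.append((labels[row], in_subset))
        else:
            # Without labels: a dot for subset members, nothing otherwise.
            decorated.append((DOT if in_subset else "", False))
    return decorated


labels = ["iris-a", "iris-b", "iris-c", "iris-d"]
print(decorate_leaves([2, 0, 3, 1], {0, 3}, labels))
print(decorate_leaves([2, 0, 3, 1], {0, 3}))
```

Keeping the decision separate from drawing makes it easy to swap the dot for another marker later without touching the layout code.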

BlazZupan, Sep 05 '22 10:09

I implemented this and threw it away. It was ugly.

This widget and a few others (Distance, SilhouettePlot, Heatmap) use Orange.widgets.utils.graphicstextlist.TextListWidget, which is a QGraphicsWidget that displays uniformly spaced strings stored in a list. I think the proper way to implement bold labels would be a TextListView that gets its data from a model and supports roles such as font and color. TextListWidget would then derive from TextListView and use PyListModel to store the list of strings.
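The proposed split can be sketched without Qt, assuming a model that answers `data(row, role)` queries and a view that only reads from it. Everything here is hypothetical: `Role`, `ListModel`, `layout` and `setItems` are illustrative names whose method naming loosely mirrors Qt's item models (`Qt.DisplayRole`, `Qt.FontRole`, `Qt.ForegroundRole`); this is not the Qt API and not Orange's PyListModel.

```python
from enum import Enum, auto
from typing import Any, Dict, List, Optional


class Role(Enum):
    # Stand-ins for Qt item roles (display text, font, foreground color).
    DISPLAY = auto()
    BOLD = auto()
    COLOR = auto()


class ListModel:
    """Hypothetical list model: one string per row plus optional role data."""

    def __init__(self, items: Optional[List[str]] = None):
        self._items: List[str] = list(items or [])
        self._roles: Dict[int, Dict[Role, Any]] = {}

    def rowCount(self) -> int:
        return len(self._items)

    def data(self, row: int, role: Role = Role.DISPLAY) -> Any:
        if role is Role.DISPLAY:
            return self._items[row]
        return self._roles.get(row, {}).get(role)

    def setData(self, row: int, value: Any, role: Role) -> None:
        if role is Role.DISPLAY:
            self._items[row] = value
        else:
            self._roles.setdefault(row, {})[role] = value


class TextListView:
    """Lays out uniformly spaced rows, taking text and style from a model."""

    def __init__(self, model: ListModel, row_height: float = 16.0):
        self.model = model
        self.row_height = row_height

    def layout(self) -> List[Dict[str, Any]]:
        # One entry per row: vertical position plus whatever the model
        # reports for text, boldness and color. A real view would paint these.
        return [
            {
                "y": i * self.row_height,
                "text": self.model.data(i),
                "bold": bool(self.model.data(i, Role.BOLD)),
                "color": self.model.data(i, Role.COLOR),
            }
            for i in range(self.model.rowCount())
        ]


class TextListWidget(TextListView):
    """Keeps a plain list-of-strings interface by owning its own model."""

    def __init__(self, items: Optional[List[str]] = None, row_height: float = 16.0):
        super().__init__(ListModel(items), row_height)

    def setItems(self, items: List[str]) -> None:
        self.model = ListModel(items)


w = TextListWidget(["a", "b", "c"])
w.model.setData(1, True, Role.BOLD)  # e.g. row 1 is in the data subset
for entry in w.layout():
    print(entry)
```

Existing callers that only pass strings keep working through the subclass, while the Hierarchical Clustering widget could set the bold or color role on subset rows.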

If there are no labels, I'd simply mark selected rows with "|" and perhaps color them blue (or whatever is suitable for dark mode).
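One way to pick "whatever's suitable for dark mode" is to look at the background luminance. This is a sketch under assumptions: the two RGB values and the 0.5 threshold are placeholders, not colors from Orange's palette; the luminance formula is the WCAG 2.0 relative luminance.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]

# Placeholder colors: the issue only says "blue" and "suitable for dark mode".
LIGHT_MODE_BLUE: RGB = (0, 0, 192)
DARK_MODE_BLUE: RGB = (110, 170, 255)
MARKER = "|"


def relative_luminance(rgb: RGB) -> float:
    """WCAG 2.0 relative luminance of an sRGB color given as 0-255 ints."""

    def channel(c: float) -> float:
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def subset_marker(in_subset: bool, background: RGB) -> Tuple[str, Optional[RGB]]:
    """Return (text, color) for an unlabeled row; color is None if unmarked."""
    if not in_subset:
        return "", None
    # Assumed threshold: treat backgrounds darker than mid-grey as dark mode.
    dark = relative_luminance(background) < 0.5
    return MARKER, DARK_MODE_BLUE if dark else LIGHT_MODE_BLUE
```

In the widget, the background would presumably come from the current palette rather than a hard-coded tuple.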

janezd, Sep 06 '22 18:09

Note: the widget currently accepts a distance matrix as input. This functionality would work only if the distance matrix has an associated data table. If not, passing a subset must show (at least) a warning.

The new signal for the subset must not connect automatically, imho; otherwise users will try to connect the data they want clustered and wonder why the widget does not work. Upon seeing the connection dialog, they will hopefully realize that they must pass a distance matrix and that data can only be used as a subset. Also, show a warning (or even an error!) if the user passes a subset but no distances.
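The input checks described in these two comments can be sketched as a pure function, assuming subset instances are matched to matrix rows by instance id (Orange tables carry an `ids` array, and a distance matrix's associated data is available as `row_items`, which may be None). Here plain sequences stand in for those; `check_subset`, `Level` and the message texts are hypothetical, and choosing an error (rather than a warning) for a subset without distances is one of the two options the comment allows. Keeping the signal from auto-connecting is a matter of how the input is declared (orange-widget-base's `Input` appears to have an `explicit` flag for this, worth confirming in its docs) and is not shown.

```python
from enum import Enum
from typing import Optional, Sequence, Set, Tuple


class Level(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def check_subset(
    has_distances: bool,
    row_ids: Optional[Sequence[int]],
    subset_ids: Optional[Sequence[int]],
) -> Tuple[Level, str, Set[int]]:
    """Decide what the widget should report and which matrix rows to mark.

    has_distances: True if a distance matrix is connected
    row_ids: instance ids of the matrix's associated data, or None
    subset_ids: instance ids from the Data Subset input, or None
    """
    if subset_ids is None:
        return Level.OK, "", set()
    if not has_distances:
        # The user most likely connected the data they wanted clustered.
        return Level.ERROR, "Data subset needs a distance matrix as input.", set()
    if row_ids is None:
        # Distances not computed from a data table: nothing to match against.
        return Level.WARNING, "Data subset ignored: distances have no associated data.", set()
    wanted = set(subset_ids)
    rows = {i for i, inst_id in enumerate(row_ids) if inst_id in wanted}
    return Level.OK, "", rows


print(check_subset(True, [101, 102, 103, 104], [104, 101]))
```

Returning matched row indices (not instances) lets the drawing code map them straight onto leaves.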

janezd, Sep 06 '22 18:09