orange3
Hierarchical Clustering needs Data Subset input channel
It would be great if Hierarchical Clustering could show a subset of data. Say I find some data instances in a t-SNE plot, and I would like to know whether they have clustered together in HC as well.
One way to depict the subset is to print the labels in bold. If there are no labels, mark the subsetted leaves with a dot.
I implemented this and threw it away. It was ugly.
This widget and a few others (Distance, SilhouettePlot, Heatmap) use Orange.widgets.utils.graphicstextlist.TextListWidget, which is a QGraphicsWidget that displays uniformly spaced strings stored in a list. I think that a proper way to implement bold labels would be to implement TextListView that would get data from a model and would support some roles like font and color. TextListWidget would then be derived from TextListView and use PyListModel to store the list of strings.
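To make the proposed split concrete, here is a rough, toolkit-free sketch of the idea. It is not the Orange or Qt API: `Role`, `ListModel`, `render` and `set_data` are hypothetical stand-ins (for Qt item data roles, for PyListModel, and for the painting code), and a plain "bold" flag replaces what would be a font role returning a QFont in a real implementation.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical stand-ins for Qt's display, font and foreground roles.
    DISPLAY = "display"
    BOLD = "bold"
    COLOR = "color"


class ListModel:
    """Hypothetical stand-in for PyListModel: strings plus per-row overrides."""

    def __init__(self, strings):
        self._strings = list(strings)
        self._overrides = {}  # (row, role) -> value

    def row_count(self):
        return len(self._strings)

    def data(self, row, role=Role.DISPLAY):
        if role is Role.DISPLAY:
            return self._strings[row]
        # Unset roles yield None, so the view can fall back to defaults.
        return self._overrides.get((row, role))

    def set_data(self, row, role, value):
        self._overrides[(row, role)] = value


class TextListView:
    """Pulls text and styling from a model instead of owning a list."""

    def __init__(self, model=None):
        self.model = model

    def render(self):
        # Returns (text, bold, color) per row; a real view would paint these.
        if self.model is None:
            return []
        rows = []
        for row in range(self.model.row_count()):
            text = self.model.data(row, Role.DISPLAY)
            bold = bool(self.model.data(row, Role.BOLD))
            color = self.model.data(row, Role.COLOR) or "default"
            rows.append((text, bold, color))
        return rows


class TextListWidget(TextListView):
    """Keeps the simple list-of-strings interface on top of the view."""

    def __init__(self, strings=()):
        super().__init__(ListModel(strings))


widget = TextListWidget(["a", "b", "c"])
widget.model.set_data(1, Role.BOLD, True)  # mark row 1 as part of the subset
```

With this split, existing users of TextListWidget keep passing plain strings, while Hierarchical Clustering could set the styling roles on the rows that belong to the subset.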
In case of no labels, I'd simply label selected rows with "|" and perhaps set the color to blue (and whatever's suitable for dark mode).
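The labelling rule could look roughly like the sketch below. The function name `subset_annotations`, its arguments and the returned dictionaries are assumptions made for illustration, and the color is a plain string rather than a palette-aware value, so dark mode is not handled here.

```python
def subset_annotations(labels, in_subset, marker="|", marker_color="blue"):
    """Return one {text, bold, color} dict per leaf.

    labels: list of strings, or None when the leaves are unlabeled.
    in_subset: list of booleans, True for leaves that are in the subset.
    """
    annotations = []
    for i, selected in enumerate(in_subset):
        if labels is not None:
            # Labeled leaves: keep the text, print subset members in bold.
            annotations.append(
                {"text": labels[i], "bold": selected, "color": None})
        elif selected:
            # Unlabeled leaves: mark subset members with a colored marker.
            annotations.append(
                {"text": marker, "bold": False, "color": marker_color})
        else:
            annotations.append({"text": "", "bold": False, "color": None})
    return annotations


with_labels = subset_annotations(["iris 1", "iris 2"], [True, False])
without_labels = subset_annotations(None, [True, False])
```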
Note: the widget currently accepts a distance matrix as input. This functionality would work only if the distance matrix has an associated data table. If it does not, passing a subset must show (at least) a warning.
The new signal for the subset must not connect automatically, imho; otherwise users will try to connect the data that they want clustered and wonder why the widget does not work. Upon seeing the dialog with connections, they will, hopefully, realize that they must pass a distance matrix and that data can only be used as a subset. Also, show a warning (or even an error!) if the user passes a subset but nothing is given as distances.
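The two checks described above might be organized as in this sketch. The function `check_subset_inputs`, its boolean arguments and the message texts are hypothetical, and treating the missing-distances case as an error rather than a warning is just one of the two options mentioned.

```python
def check_subset_inputs(has_distances, has_associated_table, has_subset):
    """Return (level, message) describing a problem, or None if all is fine."""
    if not has_subset:
        # Nothing to highlight, so nothing to complain about.
        return None
    if not has_distances:
        # Assumption: an error; the text above leaves warning vs. error open.
        return ("error",
                "Data subset is given, but there are no distances to cluster.")
    if not has_associated_table:
        return ("warning",
                "Distances have no associated data; the subset is ignored.")
    return None


problem = check_subset_inputs(
    has_distances=False, has_associated_table=False, has_subset=True)
```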