orange3
PCA: transpose Components output
What's your use case?
When looking at the Components output from PCA, one generally wishes to see the top features (and their values) for each PC. This is best viewed in a Data Table, but in the current version one cannot sort by PC, because PCs are in rows rather than columns.
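To make the orientation concrete, here is a minimal sketch in plain scikit-learn (for illustration only, not the Orange widget code): the components matrix has one row per PC and one column per original feature, which is the layout the Components output currently follows.

```python
# Minimal sketch (plain scikit-learn, for illustration only): the components
# matrix has one row per PC and one column per original feature.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 5))  # 100 samples, 5 features
pca = PCA(n_components=3).fit(X)

print(pca.components_.shape)  # (3, 5): rows = PCs, columns = original features
```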
What's your proposed solution?
Transpose Components output.
Are there any alternative solutions?
One can always do it with Transpose, but do we really prefer having PCs as rows by default?
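For reference, a rough sketch of what the Transpose workaround amounts to, written with pandas/scikit-learn rather than the Orange widgets themselves (the feature names f1–f4 are made up for the example): once the components are transposed, features become rows and sorting by a chosen PC is straightforward.

```python
# Sketch of the Transpose workaround (pandas/scikit-learn, not the Orange
# widgets themselves); feature names are made up for the example.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 4)), columns=["f1", "f2", "f3", "f4"])
pca = PCA(n_components=2).fit(X)

# Components as currently output: PCs in rows, original features in columns.
components = pd.DataFrame(pca.components_, index=["PC1", "PC2"], columns=X.columns)

# After transposing, features are rows, so the top features for PC1 are easy to list.
print(components.T["PC1"].abs().sort_values(ascending=False))
```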
For use in spectroscopy this would be a step back. Users want to see PC components compared to the data; right now both are in the same space and easily comparable with two visualizations placed on top of each other.
Also, components are defined in attribute space, so the current orientation makes sense. As it is now, the components can also keep exactly the same domain as the data.
I would suggest the opposite: we should transpose Rank's Scores output and the Coefficients outputs of linear models. The scores and coefficients there correspond to the columns of the data, and keeping the same orientation as the data would make that relationship clearer. I have never used Rank's Scores or a model's Coefficients output without Transpose at any workshop.
I do admit that sorting by value would be easier if the output were transposed, but everything else is harder.
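To make that correspondence concrete, here is a small sketch (plain scikit-learn, made-up feature names): a linear model produces exactly one coefficient per data column, so an output oriented like the data, with one column per feature, mirrors the model directly.

```python
# Sketch (plain scikit-learn, made-up feature names): a linear model yields one
# coefficient per data column, i.e. coefficients share the feature "index set".
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
feature_names = ["f1", "f2", "f3"]
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X, y)
for name, coef in zip(feature_names, model.coef_):  # one coefficient per feature/column
    print(f"{name}: {coef:+.3f}")
```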
Thinking a bit more about this, I'd say we have two aspects:
- Semantics. What do the results mean, how do they correspond to original data.
- Visualization. How do we want to look at results.
I would prefer operations to keep the semantics, (1) for clarity and (2) so that internal structures are not modified when not needed (transposing is a lossy operation).
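As a loose analogy for that lossiness (pandas here, not Orange's Transpose, which has its own rules): transposing a table with mixed column types forces a common type, and a round trip does not restore the original per-column types.

```python
# Loose analogy for "transpose is a lossy operation" (pandas, not Orange's
# Transpose): a transpose round trip does not restore per-column dtypes.
import pandas as pd

df = pd.DataFrame({"height": [1.75, 1.62], "n_children": [2, 0]})
print(df.dtypes)      # height: float64, n_children: int64
print(df.T.T.dtypes)  # both float64: the integer dtype is lost in the round trip
```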
The problem originally mentioned in this issue is visualization-related. Data Table is a visualization widget. Perhaps this particular visualization is not flexible enough? If we decide that the Transpose widget is too cumbersome here, I would suggest fixing this issue at the visualization level.
A survey of the current use of the two representation options, before we discuss this further.
Feature coefficients in the (original) feature space:
- PCA (Components)
- k-Means (Centroids)
- (slightly related) SVM (support vectors; here the model is defined by a selection of samples)
Feature coefficients as row elements:
- Feature Statistics (Statistics)
- Rank (Scores)
- Linear Regression, Logistic Regression, Stochastic Gradient Descent (Coefficients)
We don't have a clear consensus, so let us (for now?) keep it as it is.