ordr
extend subset parameter to all biplot layers
The experimental subset parameter of GeomIsolines$setup_data() should be extended to all plot layers.
To consistently distinguish between ggplot() and ggbiplot(), it would be good, if possible, to enable this parameter only for *_rows_*() and *_cols_*() layers. (Should it be a parameter of stat_rows(), stat_cols(), and the other new stat layers rather than of Geom*$setup_data()?)
An important decision is what inputs subset should be able to handle. Among the possibilities:
- a positive (negative) integer vector indicating the rows to include (exclude) – understood by [.data.frame() and dplyr::slice() but not by subset(); almost definitely worth including since it will come most naturally to users
- a logical vector of length the number of rows – understood by [.data.frame() and subset() but not by dplyr::slice(); should not cause confusion, but could just be required to be which()ed instead of handled separately
- a character vector of row names – understood by [.data.frame() but not by subset() or dplyr::slice(); could cause confusion because print.tbl_ord() uses tibble-like printing, which does not display row names as print.data.frame() does, but might be important for large data workflows
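For reference, the three input types in the list above can be compared on a plain data frame using base R only. This is a minimal sketch: the data frame `d` and its values are made up for illustration and have nothing to do with ordr objects, and dplyr::slice() is left out to keep the example free of dependencies.

```r
# Toy data frame with row names (hypothetical values, for illustration only)
d <- data.frame(
  x = c(1.5, -0.2, 0.7, 2.1),
  row.names = c("a", "b", "c", "d")
)

# Positive integers: keep rows 1 and 3
by_pos <- d[c(1, 3), , drop = FALSE]
# Negative integers: drop rows 2 and 4
by_neg <- d[-c(2, 4), , drop = FALSE]
# Logical vector with one element per row
by_lgl <- d[c(TRUE, FALSE, TRUE, FALSE), , drop = FALSE]
# Character vector, matched against the row names
by_name <- d[c("a", "c"), , drop = FALSE]

# subset() takes a logical condition evaluated within the data frame
by_subset <- subset(d, x > 0.5 & x < 2)
# A logical vector can always be converted to positions with which()
by_which <- d[which(d$x > 0.5 & d$x < 2), , drop = FALSE]
```

All six results select the same two rows ("a" and "c"), which is why requiring logical input to be which()ed first would lose no expressiveness.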
@jtr13, since you've used the package (and i'm so glad you've found it useful!), i'd be glad for your opinion. From the bulleted list above, what data would you expect to have to give ggbiplot() to have only a subset of variable axes, or just one variable axis, plotted?
Happy to give an opinion... I assume by row names you mean the column names of the original data frame. I (always) prefer character vectors of names but not critical if there are other considerations.
Correct, when the data are provided as a data frame or matrix. Thank you!
A draft solution is in the subset branch. The parameter is understood by the matrix stats (StatRows and StatCols) as well as by their corresponding extensions of StatScale, and can therefore be used with any geom layer that pairs with these stat layers (as the projection geom will).
It is not understood by the other stat layers because, as implemented, the subsetting would affect the results of their calculations. The logistic PCA examples illustrate its use with numerical input, though logical and character inputs are also accepted.
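To illustrate the distinction, here is a small base R sketch. It assumes, purely for illustration, that a "matrix stat" passes coordinates through unchanged while some other stat computes a summary across rows (a centroid is used here as a stand-in, not as a description of any particular ordr stat). The data frame `scores` and the vector `keep` are hypothetical.

```r
# Hypothetical ordination scores (made-up values)
scores <- data.frame(PC1 = c(-2, 0, 1, 5), PC2 = c(1, -1, 3, 1))
keep <- c(1, 2)  # rows to retain

# Pass-through case: subsetting only removes rows; the coordinates of
# the retained rows are unchanged
passed <- scores[keep, , drop = FALSE]

# Summary case: a statistic computed over rows depends on which rows it sees
center_all <- colMeans(scores)                        # over all rows
center_sub <- colMeans(scores[keep, , drop = FALSE])  # over the subset only
```

In the first case the subset can be applied at any point without changing what is drawn for the retained rows; in the second, subsetting before the computation yields a different centroid than subsetting after it.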
More examples, and unit tests, are needed, but this looks like a workable solution!
Awesome! I'll give it a try and see if I can come up with some examples.
Contributions are never obligatory but always welcome. Especially bug reports.
Good to know!
Works for me with integers but not characters...I'm not sure what the right syntax is. In the finches example geom_rows_vector(alpha = .5, color = "darkred", subset = 3) works.
With geom_rows_vector(alpha = .5, color = "darkred", subset = "Isabella") I get
Warning in f(...) :
Rows have no defined `.name`, so `subset` will be ignored.
And with geom_rows_vector(alpha = .5, color = "darkred", subset = Isabella) I get
Error in layer(data = data, mapping = mapping, stat = rows_stat(stat), :
object 'Isabella' not found
As an aside, imho there are an overwhelming number of geoms in the package. I'd prefer for example for geom_rows_axis() to automatically draw tick marks and tick mark labels, with parameters to turn them off if desired. Of course just a suggestion to take or leave!
@jtr13 try using augment_ord() before fortifying / plotting the data. In order for row or column names to work, the row or column data frame passed to ggplot() has to have a .name field, which will be retrieved by augment_ord() if it is available from the model object. It's clunky, but i could not come up with a better way that didn't require a package-wide overhaul.
Though i should make the warning message clearer.
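A rough sketch of the kind of lookup described above, written in base R. The function resolve_subset() is hypothetical and is not the code in the subset branch; it only assumes, as stated above, that character input can be matched only when the row or column data frame has a .name field, and it reuses the warning text quoted earlier in the thread. The toy data frame `rows_df` is made up.

```r
# Hypothetical helper (NOT part of ordr): turn `subset` into row positions
resolve_subset <- function(data, subset) {
  all_rows <- seq_len(nrow(data))
  if (is.null(subset)) return(all_rows)
  if (is.character(subset)) {
    # Names can only be matched if a `.name` column is present
    if (is.null(data[[".name"]])) {
      warning("Rows have no defined `.name`, so `subset` will be ignored.")
      return(all_rows)
    }
    return(which(data[[".name"]] %in% subset))
  }
  if (is.logical(subset)) return(which(subset))
  # Numeric input: positive values keep rows, negative values drop them
  all_rows[subset]
}

# Toy row data with a `.name` column (names other than "Isabella" are made up)
rows_df <- data.frame(.name = c("A", "Isabella", "C"), PC1 = c(1, 2, 3))

idx_name <- resolve_subset(rows_df, "Isabella")
idx_int  <- resolve_subset(rows_df, -1)
idx_lgl  <- resolve_subset(rows_df, c(TRUE, FALSE, TRUE))
```

Without the .name column, the character case falls back to all rows and emits the warning, which matches the behavior reported for subset = "Isabella" above.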
Never mind! I can reproduce the error after augment_ord(). I'll see if i can hack something.
Commit a505a861b7bc9578d0f1620fd33bc29a66b64115 in the subset branch automatically maps the (custom) .name_subset aesthetic to the .name field, if it exists, in the fortified 'tbl_ord' object. The previous code that looked for .name now looks for .name_subset, which will be there if augment_ord() has been run first (and if names are found in the model object).
Please let me know again whether it works!
Note: This is not an ideal solution. A better one would be to have a "pointer" (not in the low-level sense) to the original model object that has been cloaked in the 'tbl_ord' class. The ability to access this object from within 'tbl_ord' methods and within the ggplot2 build process would potentially solve many other problems. I'm leaving this issue open until a new one is created for this goal.
Sorry my fault for not providing a full reprex showing that I was using augment_ord(). It's working great now!
# site-species data frame of Sanderson Galapagos finches data
library(ordr)
#> Loading required package: ggplot2
library(magrittr)
data(finches, package = "cooccur")
finches %>% t() %>%
  logisticPCA_ord() %>%
  as_tbl_ord() -> finches_lpca
finches_lpca %>%
  augment_ord() %>%
  ggbiplot(aes(label = .name), sec.axes = "cols", scale.factor = 50) +
  geom_rows_text_radiate(subset = "Isabella") +
  geom_rows_axis(subset = "Isabella", color = "royalblue3", lwd = .75) +
  geom_rows_axis_text(size = 3, subset = "Isabella", color = "royalblue3",
                      label_dodge = 2) +
  geom_rows_axis_ticks(subset = "Isabella", color = "royalblue3") +
  geom_rows_vector(color = "darkred") +
  geom_cols_point(alpha = .5, color = "royalblue3") +
  ggtitle(
    "Logistic PCA of the Galapagos island finches",
    "Islands (finches) scaled to the primary (secondary) axes"
  ) +
  expand_limits(x = c(-30, 25))

Created on 2021-08-29 by the reprex package (v2.0.1)
Now to be super picky, it doesn't seem that there's a label_dodge parameter for geom_rows_text_radiate().
Unfortunately this solution (for character vectors) interferes with the internal calculation of group, so for urgency i dropped it in 0e41f530ffad93bcbee9eee9441571d85329522b (@jtr13 please take note, with an apology). A "pointer" would remedy it.
No apologies... thanks for the heads-up!