Report matrices and arrays
Current behavior
library(report)
report(WorldPhones)
#> x: n = 49, n = 49, n = 49, n = 49, n = 49, n = 49, n = 49, Mean = 16434.76, SD
#> = 24026.33, Median = 3000.00, MAD = 3097.15, range: [89, 79831], Skewness =
#> -0.99, Kurtosis = 1.13, 0 missing
Created on 2022-08-22 by the reprex package (v2.0.1)
Describe the solution you'd like
I'd like to see an argument to specify rows/columns (or maybe margin, as in apply()?). This could default to columns to yield column summaries as in summary(WorldPhones).
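To make the margin idea concrete, here is a minimal base R sketch. The helper name summarise_margin() and its margin argument are hypothetical and not part of the report package; it only borrows the MARGIN convention from apply(), and the choice of statistics is an assumption.

```r
# Hypothetical helper (not part of the report package): summarise a numeric
# matrix along one margin, following the MARGIN convention of apply().
summarise_margin <- function(x, margin = 2) {
  stopifnot(is.matrix(x), is.numeric(x), margin %in% c(1, 2))
  stats <- apply(x, margin, function(v) {
    c(n = length(v), mean = mean(v), sd = sd(v), median = median(v),
      min = min(v), max = max(v))
  })
  # apply() returns one column per summarised row/column of x; transpose so
  # that each summarised row/column becomes one row of the result.
  t(stats)
}

by_col <- summarise_margin(WorldPhones)             # one row per region
by_row <- summarise_margin(WorldPhones, margin = 1) # one row per year
round(by_col, 2)
```

Defaulting margin to 2 mirrors what summary(WorldPhones) does, while margin = 1 would give per-year summaries instead.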
How could we do it?
Implement report.matrix() and maybe report.array(), leveraging existing report.character(), report.numeric(), etc. I would be willing to contribute an initial implementation.
Sure, this can basically amount to:
x <- as.data.frame(x)
report(x)
Thanks, @11rchitwood, for the idea, and for offering to add an initial implementation.
Like @bwiernik mentioned, this should be as simple as first converting the data structure to a data frame in the respective S3 method, and then calling report() on it. Would you like to make a PR?
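As a rough sketch of what such an S3 method might look like (this is an assumption about the shape of the method, not the code that ended up in the package), a report.matrix() could simply delegate to the existing data frame method. The usage part is guarded so that the snippet still runs where the report package is not installed.

```r
# Sketch only: a possible report.matrix() method, assuming the data frame
# method of report() is the desired backend. Not the package's actual code.
report.matrix <- function(x, ...) {
  # Coerce to a data frame so the existing data frame method does the work.
  report::report(as.data.frame(x), ...)
}

# Try it only if the report package is available.
if (requireNamespace("report", quietly = TRUE)) {
  library(report)
  print(report(WorldPhones))
}
```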
library(report)
# matrix -----------------------
m <- WorldPhones
m <- as.data.frame(m)
report(m)
#> The data contains 7 observations of the following 7 variables:
#>
#> - N.Amer: n = 7, Mean = 66747.57, SD = 11277.46, Median = 68484.00, MAD =
#> 11196.60, range: [45939, 79831], Skewness = -0.99, Kurtosis = 1.13, 0 missing
#> - Europe: n = 7, Mean = 34343.43, SD = 7195.62, Median = 35218.00, MAD =
#> 7595.36, range: [21574, 43173], Skewness = -0.77, Kurtosis = 0.60, 0 missing
#> - Asia: n = 7, Mean = 6229.29, SD = 2124.21, Median = 6662.00, MAD = 2309.89,
#> range: [2876, 9053], Skewness = -0.28, Kurtosis = -0.53, 0 missing
#> - S.Amer: n = 7, Mean = 2772.29, SD = 496.69, Median = 2845.00, MAD = 410.68,
#> range: [1815, 3338], Skewness = -1.22, Kurtosis = 2.01, 0 missing
#> - Oceania: n = 7, Mean = 2625.00, SD = 523.06, Median = 2691.00, MAD = 481.84,
#> range: [1646, 3224], Skewness = -1.06, Kurtosis = 1.39, 0 missing
#> - Africa: n = 7, Mean = 1484.00, SD = 647.71, Median = 1663.00, MAD = 358.79,
#> range: [89, 2005], Skewness = -2.12, Kurtosis = 4.94, 0 missing
#> - Mid.Amer: n = 7, Mean = 841.71, SD = 176.12, Median = 836.00, MAD = 152.71,
#> range: [555, 1076], Skewness = -0.32, Kurtosis = -0.20, 0 missing
# array -----------------------
a <- as.array(letters)
a <- as.data.frame(a)
report(a)
#> x: 26 entries, such as a (n = 1); b (n = 1); c (n = 1) and 23 others (0 missing)
Created on 2022-08-22 with reprex v2.0.2
I don't know why I didn't think of coercing using as.data.frame(). Opened PR #274.
From @DominiqueMakowski: I'm not entirely convinced about that: matrices are conceptually quite different from data frames, in that they contain values of a single type. So I'm not sure it's appropriate to report them like a data frame, column by column.
Take a correlation matrix, or a pairwise distance matrix... you could be interested in the average correlation/distance, the range and its global distribution, but it doesn't make much sense to describe them column by column.
I'd tend to say: matrices should be kept as matrices and reported globally by their type (i.e. as numeric if they are numeric), and if the matrix is really a data frame in disguise (since some R functions return matrices), then we could leave it to the user to convert it explicitly, since it would be the right thing to do anyway.
What do you think?
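To illustrate the correlation-matrix case with base R only: the global summary below looks at each pair of variables once and leaves out the diagonal, whereas a column-by-column mean mixes in the diagonal 1s. The choice of mtcars columns and of statistics is purely illustrative.

```r
# Correlation matrix of a few built-in mtcars columns (illustrative data only).
r <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Keep each pair once and drop the diagonal, which is always 1.
pairs <- r[upper.tri(r)]

# Global description of the correlations, treating the matrix as one object.
global <- c(n_pairs = length(pairs), mean = mean(pairs),
            min = min(pairs), max = max(pairs))
round(global, 2)

# Column-by-column means, by contrast, include the diagonal 1s.
round(colMeans(r), 2)
```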
A basic readout could be:
A numeric matrix with 20 rows and 5 columns.
A numeric array of dimension 20 x 5 x 3
We could have flags for some common types, like correlation, covariance, distance, transformation, and posterior draws, where we process them further or report additional details, such as average correlations, average correlations by column, or discriminants and eigenvalues.
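The basic readout could be produced along these lines. describe_shape() is a hypothetical name used only for this sketch, and using mode() to label the type is an assumption (it reports both integer and double storage as "numeric").

```r
# Hypothetical helper: one-line readout of an array's type and shape.
describe_shape <- function(x) {
  stopifnot(is.array(x))
  d <- dim(x)
  if (length(d) == 2) {
    sprintf("A %s matrix with %d rows and %d columns.", mode(x), d[1], d[2])
  } else {
    sprintf("A %s array of dimension %s.", mode(x), paste(d, collapse = " x "))
  }
}

describe_shape(matrix(rnorm(100), nrow = 20))
describe_shape(array(0, dim = c(20, 5, 3)))
```

Flags for special matrix types (correlation, distance, and so on) could then append further details to this one-line description.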
I'm gonna close this one because, at least for my use case, the best answer is:
library(report)
report::report(as.data.frame(WorldPhones))