Report matrices and arrays

Open 11rchitwood opened this issue 2 years ago • 4 comments

Current behavior

library(report)
report(WorldPhones)
#> x: n = 49, n = 49, n = 49, n = 49, n = 49, n = 49, n = 49, Mean = 16434.76, SD
#> = 24026.33, Median = 3000.00, MAD = 3097.15, range: [89, 79831], Skewness =
#> -0.99, Kurtosis = 1.13, 0 missing

Created on 2022-08-22 by the reprex package (v2.0.1)

Describe the solution you'd like

I'd like to see an argument to specify rows/columns (or maybe margin as in apply()?). This could default to columns to yield column summaries as in summary(WorldPhones).

How could we do it?

Implement report.matrix() and maybe report.array() leveraging existing report.character(), report.numeric(), etc. I would be willing to contribute an initial implementation.
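
As a rough sketch of how the rows/columns choice could work, here is a base R helper that picks which margin becomes the variables before anything is passed to report(). The name `as_report_frame()` and its `margin` argument are hypothetical and not part of the report package; `margin` just mirrors the 1 = rows, 2 = columns convention of apply().

```r
# Hypothetical helper (not part of report): choose which margin of a
# matrix becomes the variables, then coerce to a data frame.
as_report_frame <- function(x, margin = 2) {
  stopifnot(is.matrix(x), margin %in% c(1, 2))
  # margin = 1 means "summarise rows", so rows must become columns first
  if (margin == 1) {
    x <- t(x)
  }
  as.data.frame(x)
}

by_col <- as_report_frame(WorldPhones)             # one variable per region
by_row <- as_report_frame(WorldPhones, margin = 1) # one variable per year

names(by_col)
names(by_row)

# Either result could then be handed to report(), e.g.:
# report::report(by_row)
```

Defaulting to `margin = 2` keeps the behaviour in line with summary(WorldPhones), which also summarises by column.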

11rchitwood, Aug 22 '22 14:08

Sure, this can basically amount to:

x <- as.data.frame(x)
report(x)

bwiernik, Aug 22 '22 15:08

Thanks, @11rchitwood, for the idea and for offering to add an initial implementation.

As @bwiernik mentioned, this should be as simple as converting the data structure to a data frame in the respective S3 method and then calling report() on it. Would you like to make a PR?

library(report)

# matrix -----------------------

m <- WorldPhones

m <- as.data.frame(m)
report(m)
#> The data contains 7 observations of the following 7 variables:
#> 
#>   - N.Amer: n = 7, Mean = 66747.57, SD = 11277.46, Median = 68484.00, MAD =
#> 11196.60, range: [45939, 79831], Skewness = -0.99, Kurtosis = 1.13, 0 missing
#>   - Europe: n = 7, Mean = 34343.43, SD = 7195.62, Median = 35218.00, MAD =
#> 7595.36, range: [21574, 43173], Skewness = -0.77, Kurtosis = 0.60, 0 missing
#>   - Asia: n = 7, Mean = 6229.29, SD = 2124.21, Median = 6662.00, MAD = 2309.89,
#> range: [2876, 9053], Skewness = -0.28, Kurtosis = -0.53, 0 missing
#>   - S.Amer: n = 7, Mean = 2772.29, SD = 496.69, Median = 2845.00, MAD = 410.68,
#> range: [1815, 3338], Skewness = -1.22, Kurtosis = 2.01, 0 missing
#>   - Oceania: n = 7, Mean = 2625.00, SD = 523.06, Median = 2691.00, MAD = 481.84,
#> range: [1646, 3224], Skewness = -1.06, Kurtosis = 1.39, 0 missing
#>   - Africa: n = 7, Mean = 1484.00, SD = 647.71, Median = 1663.00, MAD = 358.79,
#> range: [89, 2005], Skewness = -2.12, Kurtosis = 4.94, 0 missing
#>   - Mid.Amer: n = 7, Mean = 841.71, SD = 176.12, Median = 836.00, MAD = 152.71,
#> range: [555, 1076], Skewness = -0.32, Kurtosis = -0.20, 0 missing

# array -----------------------

a <- as.array(letters)

a <- as.data.frame(a)
report(a)
#> x: 26 entries, such as a (n = 1); b (n = 1); c (n = 1) and 23 others (0 missing)

Created on 2022-08-22 with reprex v2.0.2
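
Wrapped into S3 methods, the coercion shown above might look roughly like the following. This is only a sketch under the assumption that the methods do nothing beyond coercing and delegating; it is not the code from the eventual PR, and the method bodies are illustrative.

```r
# Sketch only: hypothetical S3 methods that coerce and delegate.
# They assume the report package is installed when they are called.
report.matrix <- function(x, ...) {
  # Each matrix column becomes a data frame variable
  report::report(as.data.frame(x), ...)
}

report.array <- function(x, ...) {
  # as.data.frame() flattens higher-dimensional arrays into columns
  report::report(as.data.frame(x), ...)
}

# With these defined at the top level, a call such as
# report::report(WorldPhones)
# should dispatch to report.matrix() instead of the numeric method.
```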

IndrajeetPatil, Aug 22 '22 16:08

I don't know why I didn't think of coercing using as.data.frame(). Opened PR #274.

11rchitwood, Aug 22 '22 19:08

From @DominiqueMakowski: I'm not entirely convinced about that. Matrices are conceptually quite different from data frames, in that they contain values of a single type, so I'm not sure it's appropriate to report them like a data frame, column by column.

Take a correlation matrix, or a pairwise distance matrix... you could be interested in the average correlation/distance, the range and its global distribution, but it doesn't make much sense to describe them column by column.

I'd tend to say: matrices should be kept as matrices and reported globally by their type (i.e. as numeric if they are numeric). If the matrix is really a data frame in disguise (some R functions return matrices), we could leave it to the user to convert it explicitly, since that would be the right thing to do anyway.

What do you think?

A basic readout could be:

A numeric matrix with 20 rows and 5 columns.

A numeric array of dimension 20 x 5 x 3
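
A minimal sketch of how such a readout could be produced in base R. The function name `describe_shape()` is hypothetical, and the wording simply follows the two example sentences above.

```r
# Hypothetical helper: one-line description of a matrix or array.
describe_shape <- function(x) {
  stopifnot(is.array(x))
  d <- dim(x)
  # Report integer and double storage alike as "numeric"
  type <- if (is.numeric(x)) "numeric" else typeof(x)
  if (length(d) == 2) {
    sprintf("A %s matrix with %d rows and %d columns.", type, d[1], d[2])
  } else {
    sprintf("A %s array of dimension %s.", type, paste(d, collapse = " x "))
  }
}

describe_shape(matrix(0, nrow = 20, ncol = 5))
describe_shape(array(0, dim = c(20, 5, 3)))
```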

We could have flags for some common types, like correlation, covariance, distance, transformation, and posterior draws, where we process them further or report additional details, such as average correlations, average correlations by column, or discriminants and eigenvalues.
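
For the correlation case, the extra details could be computed along these lines. The function name `summarise_cor()` and the shape of its return value are assumptions for illustration only; the sketch also assumes a square, symmetric matrix and ignores the diagonal.

```r
# Hypothetical helper: global and per-column summaries of a
# square, symmetric correlation matrix, excluding the diagonal.
summarise_cor <- function(r) {
  stopifnot(is.matrix(r), is.numeric(r), nrow(r) == ncol(r))
  # Each pair appears once in the upper triangle
  off_diag <- r[upper.tri(r)]
  list(
    mean_r = mean(off_diag),
    range_r = range(off_diag),
    # Average correlation of each variable with all the others
    mean_r_by_column = (colSums(r) - diag(r)) / (ncol(r) - 1)
  )
}

r <- cor(mtcars[, c("mpg", "wt", "hp")])
summarise_cor(r)
```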

bwiernik, Aug 23 '22 18:08

I'm gonna close this one because, at least for my use case, the best answer is:

library(report)
report::report(as.data.frame(WorldPhones))

11rchitwood, Oct 25 '22 14:10