GH-38916: [R] Simplify dataset and table print output

Open thisisnic opened this issue 2 years ago • 6 comments

Rationale for this change

When printing objects with many columns, the output is long and unwieldy.

What changes are included in this PR?

  • Truncates long schema print output for datasets, tables, and record batches.
  • Adds the number of columns to dataset print output, so the total is clear even when the schema is truncated.
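
The truncation behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `format_truncated_schema()` and its `max_fields` argument are hypothetical names, the input is a plain named character vector rather than an Arrow schema, and the cutoff of 20 fields is an assumption inferred from the "After" output below.

```r
# Hypothetical helper for illustration only; not the arrow package's implementation.
# `types` is a named character vector: names are column names, values are type names.
format_truncated_schema <- function(types, max_fields = 20L) {
  n <- length(types)

  # Keep only the first `max_fields` fields for display.
  shown <- utils::head(types, max_fields)
  lines <- sprintf("\"%s\": %s", names(shown), unname(shown))

  # When fields were dropped, add an ellipsis and a hint on how to see them all.
  if (n > max_fields) {
    lines <- c(lines, "...", "Use `schema()` to see entire schema")
  }

  # Lead with the total column count so it stays visible after truncation.
  c(sprintf("%d columns", n), lines)
}

# 26 string columns named "a" to "z", mirroring the example in this PR.
types <- stats::setNames(rep("string", 26), letters)
out <- format_truncated_schema(types)
cat(out, sep = "\n")
```

Putting the column count on the first line means the total stays visible even when most of the fields are hidden.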

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Before:

library(arrow)
x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> "u": string
#> "v": string
#> "w": string
#> "x": string
#> "y": string
#> "z": string
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>

After:

library(arrow)

x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> 26 columns 
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> ...
#> Use `schema()` to see entire schema
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema
  • Closes: #38916

thisisnic avatar Nov 28 '23 10:11 thisisnic

:warning: GitHub issue #38916 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Nov 28 '23 10:11 github-actions[bot]

I see this still has some tests failing...just making sure you weren't waiting on any of us for a review!

paleolimbot avatar Dec 05 '23 14:12 paleolimbot

> I see this still has some tests failing...just making sure you weren't waiting on any of us for a review!

Nah, life has just been busy, but thanks for checking in on it. I'll ping you once I get time to take a look at it again!

thisisnic avatar Dec 07 '23 17:12 thisisnic

@paleolimbot This is now ready for review; mind giving this a look over when you get the chance?

thisisnic avatar Jan 10 '24 16:01 thisisnic

Thanks @thisisnic. I see this as a nice improvement and don't see any issues with it. I left a few notes.

amoeba avatar Mar 08 '24 21:03 amoeba

Thanks @jonkeane, I agree we should take a look at #32110 soon, but merge this as an interim thing in the meantime.

thisisnic avatar Mar 10 '24 15:03 thisisnic

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit ac1708ce65e15a87fadec39e761731b3d916fb19.

There was 1 benchmark result with an error:

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 19 possible false positives for unstable benchmarks that are known to sometimes produce them.