DataFrames.jl icon indicating copy to clipboard operation
DataFrames.jl copied to clipboard

Improved REPL printing for GroupedDataFrames

Open Jollywatt opened this issue 1 year ago • 12 comments

This is a small enhancement to make GroupedDataFrames display so that they fit in one screen, like regular DataFrames with many rows.

Before, the first and last groups of a GroupedDataFrame would be displayed at full size; now, the two groups are made smaller (using the :displaysize parameter in IOContext) so that together they fill the required line height.

If the first group is smaller than half the available REPL height, and the last group is larger, or vice versa, then the smaller group is shown in full and the larger one is squashed (displayed with missing rows). This means that the group with the most rows takes up the most room visually, even after some of its rows are skipped.

See the "GroupedDataFrame displaysize test" test set for example output.

Jollywatt avatar Jul 28 '22 12:07 Jollywatt

Makes sense. I will wait for @ronisbr to review first as he is maintaining all display stuff 😄.

bkamins avatar Jul 28 '22 12:07 bkamins

@ronisbr - could you please review this PR (especially in terms of how it interacts with PrettyTables.jl kwargs) while we are waiting for HTML PR to be merged. Then I will review it. Thank you!

bkamins avatar Sep 16 '22 16:09 bkamins

Sorry! I completely missed this one.

ronisbr avatar Sep 16 '22 16:09 ronisbr

Hi @ronisbr, thanks for the comment! I made sure the printing obeys “print squashed if h is small but positive, but print everything if h is invalid (zero or negative)” and added tests. (This seems to be how ungrouped data frames display.) Previously, h = 1 and h = 2 were edge cases where one group would be printed in full but not the other — good catch!

Jollywatt avatar Sep 16 '22 22:09 Jollywatt

@Jollywatt - unfortunately due to Tables.jl 1.8 release we needed to make some changes in the internals of DataFrames.jl. Can you please rebase (or merge if rebasing is problematic) this PR against main branch as otherwise CI will fail for this PR.

bkamins avatar Sep 18 '22 08:09 bkamins

@bkamins I hope I did that right — I synced my fork using the Github UI, which made a merge, not a rebase.

Jollywatt avatar Sep 18 '22 08:09 Jollywatt

merge should be OK.

bkamins avatar Sep 18 '22 08:09 bkamins

In summary - the PR mostly looks good (it works OK under standard settings). However, please carefully consider :limit setting of IOContext and possible kwargs that can be passed to PrettyTables.jl so make sure we do not hit some problem in such cases.

bkamins avatar Sep 18 '22 12:09 bkamins

Also documentation needs to be updated to reflect the changes rules of printing of GroupedDataFrame.

bkamins avatar Sep 20 '22 07:09 bkamins

From my side, I do not see problems related to kwargs in PrettyTables. The arguments allrows and allcols are converted correctly to the one in PrettyTables that limit the number of rows and columns to be printed. Of course, the user can break it badly by passing strange arguments to show, but this is the side effect of the power to customize the output.

ronisbr avatar Sep 20 '22 16:09 ronisbr

OK - so we can drop !allrows check then. @Jollywatt - can you then please remove it + update documentation examples so that documenter passes correctly?

bkamins avatar Sep 20 '22 18:09 bkamins

@nalimilan - do you have time to have a look at this PR (and as usual spot the tiny details that I usually miss; thank you!)

bkamins avatar Sep 21 '22 09:09 bkamins

@Jollywatt - do you have time to have a look at this PR and finalize it. Thank you!

Soon we will make a feature freeze for DataFrames.jl 1.4 release (after this - the PR would still be merged but would be released later).

bkamins avatar Sep 24 '22 21:09 bkamins

@bkamins Ok, done (I think)! Let me know if there’s anything else to do:)

Jollywatt avatar Sep 24 '22 22:09 Jollywatt

tests are failing

I’ve been trying to debug those failures but haven’t gotten to the bottom of them. It seems like PrettyTables got updated?

Jollywatt avatar Sep 26 '22 00:09 Jollywatt

docs build is failing.

bkamins avatar Sep 26 '22 06:09 bkamins

Regarding failing CI:

@ronisbr - this is a bug in PrettyTables.jl. You use @inbounds in several places in the code, so this does not get detected in normal usage, but tests catch it (as they force bounds checking).

The offending operation is:

num_lines_in_row[1:num_header_rows]

where in the test num_header_rows is 2, while num_lines_in_row is 1.

This is in _count_header_lines function.

The issue is most likely some non standard table printing heights that get passed in this PR to PrettyTables.jl.

So the fix should have two parts:

  • personally - I would remove @inbounds here as it seems your code does not ensure that @inbounds is safe (so it should not be used; also - most likely it does not save much time - but you will know better); (in general please consider in whole PrettyTables.jl if you indeed need to use @inbounds in places where you use them)
  • probably some more checking of passed arguments needs to be done in PrettyTables.jl and they should be corrected if they are e.g. too restrictive

bkamins avatar Sep 26 '22 07:09 bkamins

Oops, sorry about that! I always assumed that the header is always rendered. However, given the huge rewriting of PrettyTables in v2, I forgot to add some checks. In this case, the number of rendered lines was set to the number of display lines (1), which is lower than the number of header lines (2). This must never happen. I fixed this bug in master and will release a new minor version as soon as the test finishes.

ronisbr avatar Sep 26 '22 14:09 ronisbr

@Jollywatt by the way, the output will change. Notice that the header will always be printed, no matter how small the display is.

ronisbr avatar Sep 26 '22 14:09 ronisbr

I think it is OK that the header is always printed.

bkamins avatar Sep 26 '22 14:09 bkamins

I think it is OK that the header is always printed.

It is the solution that leads to the least amount of problems. Otherwise, the omitted cell summary would be "wrong" since we were also omitting header cells. Anyway, it only affects when you are using a very small terminal, which should be a very corner case.

ronisbr avatar Sep 26 '22 14:09 ronisbr

Done! PrettyTables v2.1.1 is released. Can you please test it again?

ronisbr avatar Sep 26 '22 14:09 ronisbr

Testing now (probably docstrings still need to be fixed, but at least we will see if the problem you addressed is fixed)

bkamins avatar Sep 26 '22 14:09 bkamins

@Jollywatt - after PrettyTables.jl changes docstrings + testset outputs need to be updated. Thank you! After this I think all is good to merge.

bkamins avatar Sep 26 '22 17:09 bkamins

@bkamins I’ve fixed the display logic and tests to work for PrettyTables 2.1.1, and fixed the doctests. Hopefully I haven’t missed anything:) (Apologies for the highly nonlinear git history…!)

Jollywatt avatar Sep 29 '22 07:09 Jollywatt

Thank you! I need to review the PR before merging from scratch anyway so it is not a problem that history is complex.

bkamins avatar Sep 29 '22 07:09 bkamins

Thank you!

bkamins avatar Sep 29 '22 10:09 bkamins