DataFrames.jl
DataFrames.jl copied to clipboard
Improved REPL printing for GroupedDataFrames
This is a small enhancement to make GroupedDataFrame
s display so that they fit in one screen, like regular DataFrame
s with many rows.
Before, the first and last groups of a GroupedDataFrame
would be displayed at full size; now, the two groups are made smaller (using the :displaysize
parameter in IOContext
) so that together they fill the required line height.
If the first group is smaller than half the available REPL height, and the last group is larger, or vice versa, then the smaller group is shown in full and the larger one is squashed (displayed with missing rows). This means that the group with the most rows takes up the most room visually, even after some of its rows are skipped.
See the "GroupedDataFrame displaysize test"
test set for example output.
Makes sense. I will wait for @ronisbr to review first as he is maintaining all display stuff 😄.
@ronisbr - could you please review this PR (especially in terms of how it interacts with PrettyTables.jl kwargs) while we are waiting for HTML PR to be merged. Then I will review it. Thank you!
Sorry! I completely missed this one.
Hi @ronisbr, thanks for the comment!
I made sure the printing obeys “print squashed if h
is small but positive, but print everything if h
is invalid (zero or negative)” and added tests. (This seems to be how ungrouped data frames display.) Previously, h = 1
and h = 2
were edge cases where one group would be printed in full but not the other — good catch!
@Jollywatt - unfortunately due to Tables.jl 1.8 release we needed to make some changes in the internals of DataFrames.jl. Can you please rebase (or merge if rebasing is problematic) this PR against main
branch as otherwise CI will fail for this PR.
@bkamins I hope I did that right — I synced my fork using the Github UI, which made a merge, not a rebase.
merge should be OK.
In summary - the PR mostly looks good (it works OK under standard settings). However, please carefully consider :limit
setting of IOContext
and possible kwargs that can be passed to PrettyTables.jl so make sure we do not hit some problem in such cases.
Also documentation needs to be updated to reflect the changes rules of printing of GroupedDataFrame
.
From my side, I do not see problems related to kwargs in PrettyTables. The arguments allrows
and allcols
are converted correctly to the one in PrettyTables that limit the number of rows and columns to be printed. Of course, the user can break it badly by passing strange arguments to show
, but this is the side effect of the power to customize the output.
OK - so we can drop !allrows
check then. @Jollywatt - can you then please remove it + update documentation examples so that documenter passes correctly?
@nalimilan - do you have time to have a look at this PR (and as usual spot the tiny details that I usually miss; thank you!)
@Jollywatt - do you have time to have a look at this PR and finalize it. Thank you!
Soon we will make a feature freeze for DataFrames.jl 1.4 release (after this - the PR would still be merged but would be released later).
@bkamins Ok, done (I think)! Let me know if there’s anything else to do:)
tests are failing
I’ve been trying to debug those failures but haven’t gotten to the bottom of them. It seems like PrettyTables
got updated?
docs build is failing.
Regarding failing CI:
@ronisbr - this is a bug in PrettyTables.jl. You use @inbounds
in several places in the code, so this does not get detected in normal usage, but tests catch it (as they force bounds checking).
The offending operation is:
num_lines_in_row[1:num_header_rows]
where in the test num_header_rows
is 2, while num_lines_in_row
is 1.
This is in _count_header_lines
function.
The issue is most likely some non standard table printing heights that get passed in this PR to PrettyTables.jl.
So the fix should have two parts:
- personally - I would remove
@inbounds
here as it seems your code does not ensure that@inbounds
is safe (so it should not be used; also - most likely it does not save much time - but you will know better); (in general please consider in whole PrettyTables.jl if you indeed need to use@inbounds
in places where you use them) - probably some more checking of passed arguments needs to be done in PrettyTables.jl and they should be corrected if they are e.g. too restrictive
Oops, sorry about that! I always assumed that the header is always rendered. However, given the huge rewriting of PrettyTables in v2, I forgot to add some checks. In this case, the number of rendered lines was set to the number of display lines (1), which is lower than the number of header lines (2). This must never happen. I fixed this bug in master
and will release a new minor version as soon as the test finishes.
@Jollywatt by the way, the output will change. Notice that the header will always be printed, no matter how small the display is.
I think it is OK that the header is always printed.
I think it is OK that the header is always printed.
It is the solution that leads to the least amount of problems. Otherwise, the omitted cell summary would be "wrong" since we were also omitting header cells. Anyway, it only affects when you are using a very small terminal, which should be a very corner case.
Done! PrettyTables v2.1.1 is released. Can you please test it again?
Testing now (probably docstrings still need to be fixed, but at least we will see if the problem you addressed is fixed)
@Jollywatt - after PrettyTables.jl changes docstrings + testset outputs need to be updated. Thank you! After this I think all is good to merge.
@bkamins I’ve fixed the display logic and tests to work for PrettyTables 2.1.1, and fixed the doctests. Hopefully I haven’t missed anything:) (Apologies for the highly nonlinear git history…!)
Thank you! I need to review the PR before merging from scratch anyway so it is not a problem that history is complex.
Thank you!