DataFrames.jl

printing does not respect :compact option

Open Roger-luo opened this issue 4 years ago • 16 comments

Currently, compact printing only removes the last line break:

julia> using DataFrames

julia> df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
4×2 DataFrame
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     1  M
   2 │     2  F
   3 │     3  F
   4 │     4  M

julia> show(IOContext(stdout, :compact=>true), df)
4×2 DataFrame
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     1  M
   2 │     2  F
   3 │     3  F
   4 │     4  M

However, according to the IOContext docstring, the convention should be:

IOContext(io::IO, KV::Pair...)

  Create an IOContext that wraps a given stream, adding the specified key=>value pairs to the
  properties of that stream (note that io can itself be an IOContext).

    •  use (key => value) in io to see if this particular combination is in the properties set

    •  use get(io, key, default) to retrieve the most recent value for a particular key

  The following properties are in common use:

    •  :compact: Boolean specifying that values should be printed more compactly, e.g. that
       numbers should be printed with fewer digits. This is set when printing array elements.
       :compact output should not contain line breaks.
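
As a small illustration of the properties described in the docstring above, the following sketch queries :compact with get and in, and shows its effect on a Float64. It uses only Base; the variable names are arbitrary and chosen for this example.

```julia
# Wrap a plain buffer in an IOContext that carries :compact => true.
buf = IOBuffer()
io = IOContext(buf, :compact => true)

# Query the property in the two ways the docstring mentions.
is_compact = get(io, :compact, false)     # true
has_pair = (:compact => true) in io       # true

# A plain IO has no properties, so the default is returned.
default_used = get(buf, :compact, false)  # false

# Effect on a number: fewer digits when :compact is set.
short = sprint(show, float(pi); context = :compact => true)  # "3.14159"
full = sprint(show, float(pi))                               # "3.141592653589793"
```

A show method for a custom type can use the same get(io, :compact, false) check to decide how much to print.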

Roger-luo avatar Jun 23 '21 21:06 Roger-luo

Could you please comment what output you would expect instead in your case? I assume that you mean this part:

:compact output should not contain line breaks.

I think that @ronisbr does not support it as it is hard to imagine how output without line breaks could be useful and, in general, what it should look like, but maybe I am missing something here.

bkamins avatar Jun 23 '21 22:06 bkamins

As an additional comment: note that you can use MIME"text/csv" and MIME"text/tab-separated-values" for a more compact display (although the output would still contain line breaks).

bkamins avatar Jun 23 '21 22:06 bkamins

Could you please comment what output you would expect instead in your case? I assume that you mean this part:

Yes, sorry, I should have mentioned that. I think the :compact property in the Julia ecosystem usually has the nice property that output is printed on one line and can be copy-pasted, while the multi-line version does the richest printing.

In my case, when I define a custom struct that has a DataFrame as a field, the printed output is no longer parsable in compact mode (but with limit=false), e.g.:

julia> struct Foo
       x
       end

julia> Foo(df)
Foo(4×2 DataFrame
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     1  M
   2 │     2  F
   3 │     3  F
   4 │     4  M)

Thus I mainly hope that the data frame compact mode can:

  1. contain no line breaks, which is consistent with other types
  2. be parsable, so that one can easily copy-paste it (e.g. after manipulating someone else's data frame)

Edit: at the very least, the current compact mode does not seem to differ from normal printing, so perhaps it makes sense to have a more distinct print style.

Roger-luo avatar Jun 23 '21 23:06 Roger-luo

@ronisbr - it makes sense. Also see that:

julia> df = DataFrame(a=[1,2], b=[3,4])
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      3
   2 │     2      4

julia> repr(df)
"2×2 DataFrame\n Row │ a      b\n     │ Int64  Int64\n─────┼──────────────\n   1 │     1      3\n   2 │     2      4"

which violates the objective of repr, as it should be:

"DataFrame(\"a\" => [1, 2], \"b\" => [3, 4])"

Given the flexibility of the Pair constructor we have, it should be a fully adequate solution (assuming that the underlying vectors support repr properly). What do you think? Can something like this be added to PrettyTables.jl? (And do you think it makes sense to add this?)

bkamins avatar Jun 24 '21 06:06 bkamins

Hi @bkamins

What do you think? Can something like this be added to PrettyTables.jl? (And do you think it makes sense to add this?)

No, this feature is not something that can be implemented in PrettyTables.jl since it highly depends on the type itself. The option :compact is used to print the data more compactly, like:

julia> df = DataFrame(a = float(pi), b = exp(1));

julia> show(IOContext(stdout, :compact=>true), df)
1×2 DataFrame
 Row │ a        b
     │ Float64  Float64
─────┼──────────────────
   1 │ 3.14159  2.71828

julia> show(IOContext(stdout, :compact=>false), df)
1×2 DataFrame
 Row │ a                  b
     │ Float64            Float64
─────┼──────────────────────────────────────
   1 │ 3.141592653589793  2.718281828459045

Notice that this mimics Base:

julia> show(IOContext(stdout, :compact=>true), [float(pi), exp(1)])
[3.14159, 2.71828]

julia> show(IOContext(stdout, :compact=>false), [float(pi), exp(1)])
[3.141592653589793, 2.718281828459045]

To implement this feature, DataFrames.jl would have to check if compact printing is needed and then print the representation in one line instead of calling PrettyTables.jl to render the table itself.

I do this in my types by overloading two show methods:

function show(io::IO, tle::TLE)

function show(io::IO, mime::MIME"text/plain", tle::TLE)
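
A minimal sketch of this two-method pattern, assuming a hypothetical Point type in place of TLE (whose definition is not shown in this thread). The two-argument method produces one line; the three-argument text/plain method produces the richer multi-line form.

```julia
# `Point` is a hypothetical type used only for illustration.
struct Point
    x::Float64
    y::Float64
end

# Two-argument show: single-line output, used by `repr` and `print`.
Base.show(io::IO, p::Point) = print(io, "Point(", p.x, ", ", p.y, ")")

# Three-argument show: rich multi-line output for text/plain display.
function Base.show(io::IO, ::MIME"text/plain", p::Point)
    println(io, "Point:")
    println(io, "  x = ", p.x)
    print(io, "  y = ", p.y)
end

one_line = repr(Point(1.0, 2.0))                  # "Point(1.0, 2.0)"
rich = repr(MIME("text/plain"), Point(1.0, 2.0))  # three lines of text
```

With this split, the one-line form is also valid constructor syntax, so it can be pasted back into the REPL.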

ronisbr avatar Jun 24 '21 13:06 ronisbr

OK - then at some point we will handle it in DataFrames.jl internally. Thank you!

bkamins avatar Jun 24 '21 15:06 bkamins

Yes, it will be better. I can do this if we define how the compact output can be created.

ronisbr avatar Jun 24 '21 15:06 ronisbr

I can do this if we define how the compact output can be created.

Oh - that would be great. It should be created the way I have shown above:

"DataFrame(\"a\" => [1, 2], \"b\" => [3, 4])"

so you:

  1. Print DataFrame(
  2. then for each column
    • print column name in double quotes (properly escaped)
    • print =>
    • print compact representation of the vector contained in the column
    • print ,
  3. then print )
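
The steps above can be sketched as follows. This is only an illustration, not DataFrames.jl code: show_oneline is a hypothetical helper, and it takes a vector of column name => vector pairs instead of a real data frame so that it runs with Base alone. It prints the comma as a separator between columns rather than after each one, so that the result matches the target string shown above.

```julia
# Hypothetical helper, not part of DataFrames.jl: prints a one-line
# representation from column name => vector pairs.
function show_oneline(io::IO, columns)
    print(io, "DataFrame(")
    first_col = true
    for (name, col) in columns
        first_col || print(io, ", ")   # separator between columns
        show(io, string(name))         # column name, quoted and escaped
        print(io, " => ")
        show(IOContext(io, :compact => true), col)  # compact vector
        first_col = false
    end
    print(io, ")")
end

cols = ["a" => [1, 2], "b" => [3, 4]]
s = sprint(show_oneline, cols)  # DataFrame("a" => [1, 2], "b" => [3, 4])
```

Note that with :compact => true floating-point columns are printed with fewer digits, so for such columns the output would parse but would not reproduce the original values exactly.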

bkamins avatar Jun 24 '21 15:06 bkamins

OK, but when should it trigger this "compact" printing? Should I use the approach that overloads two show methods? Notice that using the :compact option does not seem right. The following text is from the documentation of IOContext:

    •  :compact: Boolean specifying that values should be printed more
       compactly, e.g. that numbers should be printed with fewer digits.
       This is set when printing array elements. :compact output should
       not contain line breaks.

ronisbr avatar Jun 24 '21 16:06 ronisbr

My understanding was that :compact preferably does

:compact output should not contain line breaks.

per your quote, and this is what my proposal does. Where do you see a problem? (BTW: I am not an expert in IOContext issues, so I am just commenting based on what I read in docstrings.)

bkamins avatar Jun 24 '21 18:06 bkamins

What if the user wants an output with fewer digits but printed as a table?

ronisbr avatar Jun 24 '21 18:06 ronisbr

I am not sure 😞. As commented - I am not an expert here. But aren't we already doing :compact on float columns by default? Given the docstring:

This is set when printing array elements.

so it seems it should be the default in this case anyway.

bkamins avatar Jun 24 '21 18:06 bkamins

What if the user wants an output with fewer digits but printed as a table?

I'm not sure either... It seems :compact controls two things: fewer digits and no line breaks. I sometimes feel there should be a third option, :inline, to indicate that the user only wants printing without line breaks. But since the default printing is currently already very similar to the printing with :compact enabled, perhaps it makes sense to make the compact printing more "compact"?

Roger-luo avatar Jun 24 '21 19:06 Roger-luo

The mention that :compact output should not include line breaks was added recently by https://github.com/JuliaLang/julia/pull/36076. See the discussion at https://github.com/JuliaLang/julia/issues/36072. TBH the printing system is quite messy. The summary given by Jeff on the PR is this: "All you really need to know is 'MIME type = human readable, 2-arg = parseable, compact = keep it short'." See also https://github.com/JuliaLang/julia/issues/40030.

Given that it's almost impossible to print the contents of a data frame inside a container, I wonder whether we should just print R×C DataFrame when :compact => true is passed. We already use that special printing for DataFrames inside DataFrames (for technical reasons IIRC).
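
A rough sketch of what printing only the R×C summary under :compact => true could look like. MiniTable is a hypothetical stand-in for a data frame, defined here so that the example runs with Base alone; it is not how DataFrames.jl implements show.

```julia
# Hypothetical stand-in for a data frame, for illustration only.
struct MiniTable
    names::Vector{String}
    columns::Vector{Vector{Int}}
end

nrows(t::MiniTable) = isempty(t.columns) ? 0 : length(first(t.columns))

function Base.show(io::IO, t::MiniTable)
    # The size summary is always printed.
    print(io, nrows(t), "×", length(t.names), " MiniTable")
    # In compact mode stop here: one line, no line breaks.
    get(io, :compact, false) && return
    # Otherwise print one line per row.
    for i in 1:nrows(t)
        print(io, "\n ", join((string(col[i]) for col in t.columns), "  "))
    end
end

t = MiniTable(["a", "b"], [[1, 2], [3, 4]])
compact = sprint(show, t; context = :compact => true)  # "2×2 MiniTable"
full = sprint(show, t)                                 # summary plus rows
```

The trade-off is that the compact form is short but not parseable, which is the tension between "compact = keep it short" and "2-arg = parseable" discussed above.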

Regarding repr, AFAIK the standard solution is to define the two-argument show. repr should never be overloaded. If we want to follow the recommendation that repr and two-argument show give parseable output, we should change the latter. But then the downside is that things like show(df, allrows=true) won't give the most commonly expected output, and people will have to write show(stdout, MIME("text/plain"), df, allrows=true) (or show(stdout, "text/plain", df, allrows=true) if we add a convenience method).

nalimilan avatar Jul 07 '21 09:07 nalimilan

for technical reasons IIRC

To handle circular references.

Regarding repr - I think we should just follow the convention, as @nalimilan suggests (as I have commented, I am not an expert here).

bkamins avatar Jul 07 '21 11:07 bkamins