DataFrames.jl icon indicating copy to clipboard operation
DataFrames.jl copied to clipboard

Use PrettyTables.jl as HTML backend

Open ronisbr opened this issue 2 years ago • 95 comments

Hi!

I am submitting the initial version of the code that makes DataFrames.jl uses PrettyTables.jl as HTML backend.

I need features that are not released in PrettyTables.jl yet. Hence, I propose the following workflow:

  1. We discuss and adjust everything we need in this PR.
  2. Once we done, I tag the new version of PrettyTables.jl that already has breaking changes.
  3. I update this PR to make DataFrames use the new version of PrettyTables.jl.
  4. Only here we can merge this work.

ronisbr avatar Jul 02 '22 22:07 ronisbr

I will post some screenshots obtained in Jupiter. It would be nice if someone can pass interesting cases to check if I missed something:

Captura de Tela 2022-07-02 às 19 24 21 Captura de Tela 2022-07-02 às 19 25 48

The alignment follows the same behavior of text output without the . alignment with numbers since it is not possible:

  1. Numbers are aligned to the right.
  2. Everything else (including the titles) are aligned to the left.
Captura de Tela 2022-07-02 às 19 31 38

The only part I am really not happy with is the header cell title. DataFrames uses the title option of HTML table to output the type without cropping. I needed to modify PrettyTables so that it can be add this option. However, I am not liking the solution. I will think if this can be improved.

Another thing we must discuss is how we will select the maximum number of rows and columns that will be rendered. @bkamins mentioned about using a env variable. In this PR, I just copied what we had before.

ronisbr avatar Jul 02 '22 22:07 ronisbr

@ronisbr - to test this I understand we should:

  1. checkout master of PrettyTables.jl
  2. checkout ronisbr:html_backend of DataFrames.jl

?

bkamins avatar Jul 03 '22 16:07 bkamins

Hi @bkamins !

Yes! Everything should work after those two steps.

ronisbr avatar Jul 03 '22 16:07 ronisbr

By the way, I was thinking in changing the maximum number of rows and columns to get the values in ENV["DATAFRAMES_COLUMNS"] and ENV["DATAFRAMES_ROWS"]. What do you think?

ronisbr avatar Jul 03 '22 16:07 ronisbr

I was thinking in changing the maximum number of rows and columns to get the values in ENV["DATAFRAMES_COLUMNS"] and ENV["DATAFRAMES_ROWS"]. What do you think?

This is exactly my thinking. And I would set some reasonable defaults when they are not present, but let us wait for @nalimilan to comment on this.

bkamins avatar Jul 03 '22 17:07 bkamins

Ok! I can do this in this PR right now just to make it easier to test the cropping (it is somewhat different from what we have today and it was what lead to the most problems when adding features to PrettyTables :D). If we decided a different solution, I can remove it.

ronisbr avatar Jul 03 '22 17:07 ronisbr

OK

bkamins avatar Jul 03 '22 17:07 bkamins

This is related to discussion at https://github.com/JuliaLang/IJulia.jl/pull/1040. Currently we use displaysize to get the maximum size: any particular reason to change this? IJulia sets the COLUMNS environment variable, which is used by displaysize as a fallback, so if we use something else we won't respect that setting, right?

nalimilan avatar Jul 03 '22 21:07 nalimilan

so if we use something else we won't respect that setting, right?

Right. The reason is as follows:

  1. default COLUMNS is too small for DataFrames.jl.
  2. increasing default COLUMNS is too large for other objects (matrices are one of the important problematic cases).

So, since DataFrames.jl is lower in precedence than Base Julia we cannot change the default COLUMNS but need to use something else. Note that currently DataFrames.jl users almost always change COLUMNS in Jupyter Notebook, because the default is not useful, but this destroys display of other objects.

bkamins avatar Jul 03 '22 21:07 bkamins

One important comment: I could not test in Pluto! It seems that when we add data-frame class to the table and div it changes completely the output. However, the good side is that the new system seems to work without breaking :)

ronisbr avatar Jul 03 '22 23:07 ronisbr

I wondered why I was taking almost 4 minutes to show the first DataFrame. However, it turns out the be a problem in Jupyter. It also happens with the old version. The interesting point is that this code:

df = DataFrame(rand(10, 10), :auto);
display(df)

takes almost 4s to run the first time. Whereas this code:

df = DataFrame(rand(10, 10), :auto);
show(stdout, MIME("text/html"), df)

takes less than 2s.

Given this, I am happy with the current state inside PrettyTables.jll, and I think I can remove the draft from this PR!

Of course I need to update the tests, but we need to decide somethings before that.

ronisbr avatar Jul 04 '22 12:07 ronisbr

What feels weird to me is that data frames and matrices are very similar objects when it comes to printing. In theory you could also scroll horizontally if the matrix doesn't fit the screen, and it would arguably be more useful than plain truncation. This probably doesn't apply to all objects, but it sounds like a more general issue than just tables. Maybe it should be called something like MAX_SCROLL_COLUMNS or MAX_COLUMNS?

nalimilan avatar Jul 05 '22 07:07 nalimilan

What feels weird to me is that data frames and matrices are very similar objects when it comes to printing.

They conceptually are, but they are not in implementation, as matrices do not support "text/html" MIME.

Regarding the name then indeed something like SCROLL_COLUMNS would make sense (I think MAX can be dropped as COLUMNS does not have this prefix).

bkamins avatar Jul 05 '22 09:07 bkamins

I could not test in Pluto.jl

You mean that this PR fails under Pluto.jl? I understand this is a general PrettyTables.jl issue - right?

bkamins avatar Jul 05 '22 09:07 bkamins

They conceptually are, but they are not in implementation, as matrices do not support "text/html" MIME.

Yeah, but shouldn't such a method exist ideally?

nalimilan avatar Jul 05 '22 09:07 nalimilan

But it should be in Base Julia (same for LaTeX)

bkamins avatar Jul 05 '22 09:07 bkamins

You mean that this PR fails under Pluto.jl? I understand this is a general PrettyTables.jl issue - right?

Actually no. Pluto.jl does some customization if the class in div and table is data-frame. Some of those customization overrides what we selected in this PR. It is not a bug, just a design choice from Pluto.jl's side.

ronisbr avatar Jul 05 '22 16:07 ronisbr

@fonsp - could you please comment how you see this (I it would be good to keep this PR and Pluto.jl in sync). Thank you!

bkamins avatar Jul 05 '22 16:07 bkamins

@ronisbr - https://github.com/fonsp/Pluto.jl/issues/2206 was tagged, but I get no response from Pluto.jl maintainers what we should do. Do you think that the current design is "robust enough" that we can go ahead (i.e. the display in Pluto will be different, but it still will be nice and readable) or should we wait (i.e. things in Pluto.jl will not be nice so it is better to synchronize them before releasing)?

CC @fonsp

bkamins avatar Jul 18 '22 06:07 bkamins

Hi @bkamins !

In my tests, I saw no problem. However, if you want, we can change the class name to something like data-frame-v2, leading to no customization at Pluto side.

ronisbr avatar Jul 18 '22 14:07 ronisbr

Maybe you have a print screen what are the consequences of the current customization in Pluto?

bkamins avatar Jul 18 '22 14:07 bkamins

Sure! I will post it tonight.

ronisbr avatar Jul 20 '22 18:07 ronisbr

This is the default view (notice that this cropping is not added by PrettyTables.jl):

Captura de Tela 2022-07-21 às 09 29 13

I think Pluto sets :limit to false. Hence, if we keeping pressing more it eventually shows all the DataFrame.

Another thing is that the column alignment is not the same as selected by PrettyTables.jl. We are aligning numbers to the right (to keep consistency with text backend), but here the are aligned to the left.

ronisbr avatar Jul 21 '22 12:07 ronisbr

So it seems Pluto.jl just has a custom way of showing things end-2-end, so probably we can just let them do what they want. I understand current settings (i.e. using the same class name as they use) achieves this?

bkamins avatar Jul 21 '22 12:07 bkamins

I understand current settings (i.e. using the same class name as they use) achieves this?

Yes, it seems it overrides everything. However, I will keep testing as we modified this PR to check if some new setting breaks Pluto.

ronisbr avatar Jul 21 '22 12:07 ronisbr

I am dropping here a lot of screenshots in Jupyter to make the review easier :)


Here is a simple test from the DataFrames' test set:

Captura de Tela 2022-07-23 às 10 35 57

One important thing is that the output (HTML code) is now much more verbose than before. Before we have:

<div class="data-frame">
  <p>2 rows × 2 columns</p>
  <table class="data-frame">
    <thead>
      <tr>
        <th></th>
        <th>Fish</th>
        <th>Mass</th>
      </tr>
      <tr>
        <th></th>
        <th title="String">String</th>
        <th title="Union{Missing, Float64}">Float64?</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th>1</th>
        <td>Suzy</td>
        <td>1.5</td>
      </tr>
      <tr>
        <th>2</th>
        <td>Amir</td>
        <td><em>missing</em></td>
      </tr>
    </tbody>
  </table>
</div>

Now we have:

<div>
  <div style = "float: left;">
    <span>2×2 DataFrame</span>
  </div>
  <div style = "clear: both;">
  </div>
</div>
<div class = "data-frame" style = "overflow-x: scroll;">
  <table class = "data-frame">
    <thead>
      <tr class = "header">
        <th class = "rowNumber" style = "font-weight: bold; text-align: right;">Row</th>
        <th style = "text-align: left;">Fish</th>
        <th style = "text-align: left;">Mass</th>
      </tr>
      <tr class = "subheader headerLastRow">
        <th class = "rowNumber" style = "font-weight: bold; text-align: right;"></th>
        <th title = "String" style = "text-align: left;">String</th>
        <th title = "Union{Missing, Float64}" style = "text-align: left;">Float64?</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td class = "rowNumber" style = "font-weight: bold; text-align: right;">1</td>
        <td style = "text-align: left;">Suzy</td>
        <td style = "text-align: right;">1.5</td>
      </tr>
      <tr>
        <td class = "rowNumber" style = "font-weight: bold; text-align: right;">2</td>
        <td style = "text-align: left;">Amir</td>
        <td style = "font-style: italic; text-align: right;">missing</td>
      </tr>
    </tbody>
  </table>
</div>

Of course most of them are related to the new features.


Captura de Tela 2022-07-23 às 10 44 16
Captura de Tela 2022-07-23 às 10 44 46
Captura de Tela 2022-07-23 às 10 45 15
Captura de Tela 2022-07-23 às 10 45 48
Captura de Tela 2022-07-23 às 10 47 39
Captura de Tela 2022-07-23 às 10 49 08
Captura de Tela 2022-07-23 às 11 06 01
Captura de Tela 2022-07-23 às 11 06 59

ronisbr avatar Jul 23 '22 14:07 ronisbr

All looks good. Also the following cases would be interesting (to be sure that we have what we want expect):

  1. Bool display in Bool column and in Any column
  2. Symbol display
  3. Char display
  4. Vector (too wide to display)
  5. Matrix (too wide to display)
  6. Tuple
  7. NamedTuple
  8. wide string (and does it play well with column width option?)
  9. hard case:
df = DataFrame(x = Any[missing])
df[1, 1] = [df]
df

Also note that the last case requires fixing for text output:

julia> df = DataFrame(x = Any[missing])
1×1 DataFrame
 Row │ x
     │ Any
─────┼─────────
   1 │ missing

julia> df[1, 1] = [df]
1-element Vector{DataFrame}:
Error showing value of type Vector{DataFrame}:
ERROR: StackOverflowError:

bkamins avatar Jul 23 '22 16:07 bkamins

Hi @bkamins !

I will post each case here:

  1. Bool display in Bool column and in Any column
Captura de Tela 2022-07-23 às 13 57 04
  1. Symbol display
Captura de Tela 2022-07-23 às 13 59 22
  1. Char display
Captura de Tela 2022-07-23 às 13 57 48
  1. Vector (too wide to display)
Captura de Tela 2022-07-23 às 14 02 17
  1. Matrix (too wide to display)
Captura de Tela 2022-07-23 às 14 01 48
  1. Tuple
Captura de Tela 2022-07-23 às 14 03 03
  1. NamedTuple
Captura de Tela 2022-07-23 às 14 04 33

wide string (and does it play well with column width option?)

We do not have a column width option as in text backend. Should I implement an option called maximum_columns_width that will crop the column if the number of characters is larger than a threshold?

hard case:

Here we need to discuss. The only problem is when the element inside the vector is the same DataFrame. The usual solution is to call summary if the element is a AbstractArray{T, N} where {T <: DataFrame, N}. However it will change somethings. Today, we have:

julia> df1 = DataFrame(a = 1);

julia> df2 = DataFrame(a = 2);

julia> df3 = DataFrame(a = 3);

julia> df = DataFrame(a = [[df1, df2, df3]])
1×1 DataFrame
 Row │ a
     │ Array…
─────┼───────────────────────────────────
   1 │ DataFrame[\e[1m1×1 DataFrame\e[0…

If we change to what I suggest, we will have:

julia> df1 = DataFrame(a = 1);

julia> df2 = DataFrame(a = 2);

julia> df3 = DataFrame(a = 3);

julia> df = DataFrame(a = [[df1, df2, df3]])
1×1 DataFrame
 Row │ a
     │ Array…
─────┼─────────────────────────────
   1 │ 3-element Vector{DataFrame}
julia> df4 = DataFrame(a = 4);

julia> df = DataFrame(a = [[df1 df2; df3 df4]])
1×1 DataFrame
 Row │ a
     │ Array…
─────┼───────────────────────
   1 │ 2×2 Matrix{DataFrame}
julia> df = DataFrame(x = Any[missing])
1×1 DataFrame
 Row │ x
     │ Any
─────┼─────────
   1 │ missing

julia> df[1,1] = [df]
1-element Vector{DataFrame}:
 1×1 DataFrame
 Row │ x
     │ Any
─────┼─────────────────────────────
   1 │ 1-element Vector{DataFrame}

julia> df
1×1 DataFrame
 Row │ x
     │ Any
─────┼─────────────────────────────
   1 │ 1-element Vector{DataFrame}

I really prefer the latter. If you are ok with this, I can add a commit in this PR that will fix it in both text and HTML backend.

ronisbr avatar Jul 23 '22 17:07 ronisbr

Vector (too wide to display)

I meant this:

julia> DataFrame(a = [rand(20) for i in 1:3])
3×1 DataFrame
 Row │ a
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.919621, 0.538415, 0.550435, 0…
   2 │ [0.068138, 0.291698, 0.333948, 0…
   3 │ [0.141876, 0.963646, 0.0408556, …

Matrix (too wide to display)

I meant this:

julia> DataFrame(a = [rand(20,20) for i in 1:3])
3×1 DataFrame
 Row │ a
     │ Array…
─────┼───────────────────────────────────
   1 │ [0.538992 0.61017 … 0.0729434 0.…
   2 │ [0.517444 0.489785 … 0.528586 0.…
   3 │ [0.0616212 0.767445 … 0.819681 0…

We do not have a column width option as in text backend.

So what do we do in the following case?

julia> DataFrame(id=1:3, str="a"^1000)
3×2 DataFrame
 Row │ id     str
     │ Int64  String
─────┼──────────────────────────────────────────
   1 │     1  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…
   2 │     2  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…
   3 │     3  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa…

if we do not have a limit this might be problematic. In text/plain we have truncate kwarg. Why wouldn't we use the same kwarg here (but maybe with a larger default)

Here we need to discuss. The only problem is when the element inside the vector is the same DataFrame.

Yes. This is the only problem. In general I prefer the latter, i.e. something like:

julia> df
1×1 DataFrame
 Row │ x
     │ Any
─────┼─────────────────────────────
   1 │ 1-element Vector{DataFrame}

as you have shown, but the question is if we can handle it correctly the following case:

df = DataFrame(x = Any[missing])
df[1, 1] = Any[df]
df

or the case:

df = DataFrame(x = Any[missing])
df[1, 1] = SomeFancyTypeWrappingDataFrame(df)
df

Base Julia handles this by cycle detection:

julia> x = Any[]
Any[]

julia> push!(x, x)
1-element Vector{Any}:
 1-element Vector{Any}:#= circular reference @-1 =#

However, it might be too hard to implement (maybe not?). A half-baked solution would be to catch StackOverflowException (apart from handling the simpler cases directly) and show some "generic message" informing that the value is too complex to display? (sorry for raising such cases, but I hope you - like me - have fun with solving such puzzles 😄)

bkamins avatar Jul 23 '22 17:07 bkamins

Vector (too wide to display) Matrix (too wide to display) So what do we do in the following case?

Oh sorry! :D In this case, I think using truncate as the number of printable characters is not good in HTML. This approach can lead to many problems. For example, in Markdown, I just cannot naively crop the string, since it can omit closing tags or, even worst, cut a tag in the middle. I would need to implement a way to count printable characters in HTML, which can be extremely hard.

To overcome this, I add a feature in PrettyTables (keyword: maximum_columns_width) that adds a set of styling options to the cells so that the HTML crops the string if the cell is larger than a predefined width. In our case, the truncate here will have the maximum size of the columns in pixels. Is this ok?

Take a look at the output:

Captura de Tela 2022-07-24 às 10 28 18 Captura de Tela 2022-07-24 às 10 29 24 Captura de Tela 2022-07-24 às 10 29 55

Regarding the circular reference problem, I need to think more. Wrapping the same DataFrame inside an Any[] will indeed be problematic. I still have no ideia how to solve this besides adding a try..catch. My concern is if this try..catch would not hurt a lot the performance.

(sorry for raising such cases, but I hope you - like me - have fun with solving such puzzles 😄)

There is no problem at all! :D I really like solving those problems.

ronisbr avatar Jul 24 '22 13:07 ronisbr

After thinking a lot, the only way I saw to solve the circular reference inside PrettyTables is:

  1. Create a global table that store the hash of every object that is printed.
  2. Before printing, add the hash to the table.
  3. If the hash exists, just print "CIRCULAR REFERENCE" or something.
  4. Remove the hash from the table when the printing ends.

However, I have never worked with global tables with hashes. I am a little afraid to go through that path :D Can we address this bug after the HTML is complete? The solution will work for both backends.

ronisbr avatar Jul 26 '22 00:07 ronisbr

The solution you propose is exactly how circular reference is detected in Base Julia. You do not need to use a global table and hashing (as hashing is not 100% reliable). Normally it is done via IOContext, have a look at:

  • https://github.com/JuliaLang/julia/blob/master/base/show.jl#L403 (circularity detection)
  • https://github.com/JuliaLang/julia/blob/master/base/show.jl#L450 (an example how it is used in the default implementation of show)

where you can see how it is handled (the idea is that IOContext keeps track of what was already displayed and you use === for checking the exact value you display; the === works because only mutable structs can be self referrential, i.e. even if you displayed 1 already you will not hit this path as 1 cannot have a reference to itself internally).

This approach automatically handles the fact that you need to update IOContext recursively. I hope it is clear - if not please comment.

Of course this can be handled separately later as it is a corner case.

bkamins avatar Jul 26 '22 06:07 bkamins

Looks really cool. Sorry for being greedy: do you think there would also be a way to align numbers on the decimal dot?

nalimilan avatar Jul 27 '22 11:07 nalimilan

Hi @bkamins and @nalimilan!

Sorry for the delay. For some reason, Github stopped sending me notifications to my email.

You do not need to use a global table and hashing (as hashing is not 100% reliable). Normally it is done via IOContext, have a look at:

Very interesting! Using IOContext to do this seems very plausible and easy inside PrettyTables. Let's see if I do not encounter some problem.

Looks really cool. Sorry for being greedy: do you think there would also be a way to align numbers on the decimal dot?

Thanks! Unfortunately I have not found any solid way to align numbers in HTML without modifying the font. If we force the numbers to be rendered in monospace fonts, then we can use the same algorithm in text backend. What do you think?

By the way, I am not sure if I like the suggestion to change the font to monospace in HTML.

ronisbr avatar Aug 01 '22 17:08 ronisbr

I am not sure if I like the suggestion to change the font to monospace in HTML.

I do not think we should use monospace font.

Using IOContext to do this seems very plausible and easy

I think it should be relatively easy to follow the pattern that Base Julia uses 😄 (hopefully).

bkamins avatar Aug 01 '22 18:08 bkamins

I agree that using a monospace font wouldn't be a good solution. I was basically wondering whether HTML supports aligning on the decimal separator. If that doesn't exist then better keep the current behavior.

nalimilan avatar Aug 01 '22 19:08 nalimilan

OK! So, what are the next steps? Should I update the tests?

ronisbr avatar Aug 03 '22 18:08 ronisbr

I would say so. After the tests are updated I will do a detailed review (as it is easier with tests present). I understand we leave circular reference case as to-do for now.

bkamins avatar Aug 03 '22 20:08 bkamins

Perfect! I will update the tests this weekend!

ronisbr avatar Aug 10 '22 12:08 ronisbr

Done! Sorry for the delay! I faced a very strange bug related with middle row cropping in text backend due to a modification I did during the adaptation of html backend in PrettyTables :D

Now, I updated all the tests.

PS: The tests here will fail because I need to tag a new version of PrettyTables.jl. However, I need to wait the discussion here since it might require a new feature or breaking change.

ronisbr avatar Aug 20 '22 22:08 ronisbr

some error is thrown - should be simple to fix

bkamins avatar Aug 21 '22 09:08 bkamins

Hi @bkamins !

Do you mean here? It will not pass because I need to tag a new version of PrettyTables. Is everything OK with this PR? Can I already release a new version?

ronisbr avatar Aug 21 '22 13:08 ronisbr

Ah - right. I have left some minor comments to this PR. But in general all looks good. Probably it would be good if @nalimilan had a quick look at this PR. Thank you!

bkamins avatar Aug 21 '22 15:08 bkamins

Done @bkamins ! Thanks for the review.

I also forgot to add the keyword truncate as we agreed before. Now, if it is passed with a value grater than 0, the cells are cropped if they need more than that value in pixels. The minor problem is that the HTML code is quite verbose here (see the tests), but it is transparent for the user. The result is:

Captura de Tela 2022-08-21 às 18 35 28

In this case, if the user hover the mouse over the cell, they can see the original value.

I propose the following strategy new:

  1. Wait the for comments of other reviewers.
  2. Tag the new version of PrettyTables.
  3. Update this commit to add support in DataFrames for the new version of PrettyTables (minor changes in text backend as well).

What do you think?

ronisbr avatar Aug 21 '22 21:08 ronisbr

Excellent. Let us do it the way you propose.

bkamins avatar Aug 21 '22 21:08 bkamins

Hi @bkamins !

From my side, all the changes I would like to implement to tag PrettyTables.jl v2.0 are ready. I think I will also be able to add support to OffsetArrays (this will change nothing from DataFrames side, but PrettyTables users will be able to print OffsetArrays). I think I will commit those changes until Saturday.

I pushed some commits here to use the new API in PrettyTables v2.0.

ronisbr avatar Aug 31 '22 16:08 ronisbr

OK - thank you!

Maybe you could add a sentence or two to the documentation in this PR how to do pretty-printing of data frames in Jupyter Notebook (the issue we recently discussed in Discourse) - as this is something that is frequently requested. Thank you!

bkamins avatar Aug 31 '22 18:08 bkamins

Hi @bkamins ,

I am having some problems finding the best place to add information about how the output in Jupyter can be customized. Do you have any idea?

ronisbr avatar Sep 04 '22 17:09 ronisbr

It depends on how much time you are willing to invest into it:

  • low effort: Just please add this information to the note that you already change about specifying number of columns and rows in Jupyter Notebook;
  • bigger effort: remove the note and add it to a new section in "Getting Started" (as a last section) additionally explaining that for text/plain and HTML DataFrames.jl uses PrettyTables.jl and give some examples both for text/plain and HTML (and later we will add LaTeX to it).

This "bigger effort" option can be done later. I can do it after you merge this PR if you prefer (in that case can you please do the "low effort" option?).

bkamins avatar Sep 04 '22 17:09 bkamins

@bkamins Done! I did the "low effort". The section with more details I can do after we merge everything :)

ronisbr avatar Sep 04 '22 17:09 ronisbr

I just rebased everything with the new commits to DataFrames. From my side, I am ready to tag v2.0 of PrettyTables.

ronisbr avatar Sep 04 '22 17:09 ronisbr

PrettyTables v2.0.0 was released! Let's see if the tests pass now.

ronisbr avatar Sep 08 '22 01:09 ronisbr

Hi @bkamins !

I think I cannot support Julia 1.0 anymore. Is it a problem? That’s why the tests are failing in 1.0.

ronisbr avatar Sep 08 '22 04:09 ronisbr

Dropping support for Julia 1.0 is a significant change. Let us discuss it.

There are the following topics:

  1. Can you please explain why PrettyTables.jl cannot support Julia 1.0?
  2. PrettyTables.jl claims that it supports Julia 1.0 in its Project.toml https://github.com/ronisbr/PrettyTables.jl/blob/master/Project.toml#L23. So the information is incorrect there. It should be fixed and a proper lower bound set.

After this is settled, and depending on a correct lower bound:

  1. Pros:
    • User base of Julia 1.0 is limited
    • We will finally be able to use some more advanced features of Julia that we could not because of Julia 1.0 support
    • DataFrames.jl 1.3 is fully functional (i.e. it has a lot of features so users that need to use Julia 1.0 can live with DataFrames.jl 1.3 I think)
  2. Cons:
    • we drop some users probably.

CC @nalimilan @pdeffebach

bkamins avatar Sep 08 '22 07:09 bkamins

Hi @bkamins !

The problem is not in PrettyTables, but in https://github.com/ronisbr/StringManipulation.jl

The new version of PrettyTables.jl uses this package to perform some string-related manipulation. I cannot remember correctly, but there was a problem that did not allow me to support Julia 1.0 (I am almost sure it was something related with @kwdef).

I can try to revisit this if necessary, but I think trying to support anything older than the current LTS does not seem a correct decision. Julia is a relatively new language and bugs in 1.0 will not be fixed anymore in a new 1.0 release.

ronisbr avatar Sep 08 '22 11:09 ronisbr

I can try to revisit this if necessary,

We discussed it briefly with @nalimilan. Unfortunately there are important Julia users that are stuck with older Julia versions (mostly corporate users). Therefore, the question is how problematic it is to keep 1.0 support in StringManipulation.jl (in DataFrames.jl we were able to ensure this backward compatibility as you can see :)).

Of course if keeping Julia 1.0 support is super problematic then this is a different issue (note, however, that even Compat.jl keeps 4.x and 3.x branches in parallel to ensure Julia 1.0 support).

bkamins avatar Sep 08 '22 11:09 bkamins

Of course if keeping Julia 1.0 support is super problematic then this is a different issue (note, however, that even Compat.jl keeps 4.x and 3.x branches in parallel to ensure Julia 1.0 support).

I will try to check this today. There are two possibilites: 1) I did something wrong related with @kwdef, 2) I reached a bug that was fixed in a new Julia version. In case 1, I can manage the backward compatibility. In case 2, I need to rewrite a lot of things to avoid using @kwdef.

Off-topic: Btw, just wondering, is there really a significant number of Julia users using 1.0 even knowing that it is not supported anymore?

ronisbr avatar Sep 08 '22 12:09 ronisbr

is there really a significant number of Julia users using 1.0 even knowing that it is not supported anymore?

In corporate environments certification of any new version of a software is a long process. I at least know several customers that are stuck at some older Julia version (not all on 1.0, but e.g. 1.2). But I agree that we should prompt them to use at least Julia 1.6 😄, that it why I have written that if it is super hard to support 1.0 I think we can drop it.

bkamins avatar Sep 08 '22 13:09 bkamins

Dropping support for Julia 1.0 is a significant change. Let us discuss it.

1.0 (1.3) is no longer the last LTS, I doubt it's a significant change

Moelf avatar Sep 08 '22 18:09 Moelf

Well, I think I can re-add the support to Julia 1.0 in PrettyTables (more or less easy) and in StringManipulation (not that easy). However, in my tests, the performance using Julia 1.0 and the current state is BAD. It took almost 4s to print the first table here. The precompilation statements were the problem. For some reason, the precompilation does not work properly in 1.0. Isn't this a bad thing for DataFrames and PrettyTables?

ronisbr avatar Sep 08 '22 19:09 ronisbr

1.0 (1.3) is no longer the last LTS, I doubt it's a significant change

This is a fair point, but not all Julia users share it. If the cost of keeping support of Julia 1.0 is not high the plan is to keep this support.

I think it is quite likely that we will move to 1.6 as a lowest supported version in DataFrames.jl but we do not want to make this decision without consideration. Let us wait for the feedback in https://discourse.julialang.org/t/can-we-require-at-least-julia-1-6-in-dataframes-jl/86959.

(as a side note - if we drop 1.0 support then we should re-factor internals to use Julia 1.6 features that we do not use now)

bkamins avatar Sep 08 '22 19:09 bkamins

Isn't this a bad thing for DataFrames and PrettyTables?

It is, but the issue is that if something stops working for someone it is even worse. Let us wait a bit for the feedback in https://discourse.julialang.org/t/can-we-require-at-least-julia-1-6-in-dataframes-jl/86959.

I am leaning towards requiring Julia 1.6 in DataFrames.jl but I do not want to make a hasty decision here.

bkamins avatar Sep 08 '22 19:09 bkamins

Well, I think I can re-add the support to Julia 1.0 in PrettyTables

I don't personally need older than Julia 1.6 for any package, but if it's easy(er) to support Julia 1.3, then it might be helpful for some for Cxx.jl... That's the only package I can think of needing older than 1.6. [Julia snap no longer installs 1.0, now at least LTS.]

PallHaraldsson avatar Sep 08 '22 23:09 PallHaraldsson

Just a note: Pluto uses a completely different mechanism to show DataFrames. When a DataFrame is returned, it does not call the usual show, but rather a custom one. I thought Pluto was just heavily customizing the table using CSS, but no. That's why nothing in this PR can affect how DataFrames is displayed in Pluto.

On the other hand, if I use something like:

begin
    buf = IOBuffer()
    show(buf, MIME("text/html"), df)
    HTML(String(take!(buf))
end

Everything works and the output seems very similar to the one in Jupyter.

P.S.: I really do not like this output overload. In my opinion, it can confuse users reading tutorials written in Jupyter, for example. Furthermore, the output customization using show is a little bit more complicated.

ronisbr avatar Sep 09 '22 13:09 ronisbr

Output in Pluto:

Captura de Tela 2022-09-09 às 10 16 33

ronisbr avatar Sep 09 '22 13:09 ronisbr

Conclusion of the Julia lower bound discussion:

  1. We can require Julia 1.6 in DataFrames.jl 1.4
  2. Users of older Julia versions can use DataFrames.jl 1.3.x; I will do patch releases to this branch if there would be such requests in the future from users. This should be acceptable for old-Julia users as most likely they are using old Julia with old versions of packages.
  3. Before DataFrames.jl 1.4 release I will make a PR to DataFrames.jl that will clean up source code from hacks allowing for support of old Julia versions and bump the required Julia version to 1.6 in Project.toml.

bkamins avatar Sep 10 '22 13:09 bkamins

Thanks for the update @bkamins! In this case, I do not have any more comments and modifications to make in this PR. I will wait for the review process.

ronisbr avatar Sep 10 '22 13:09 ronisbr

@nalimilan - could you please have a quick look at it before we merge. Thank you!

bkamins avatar Sep 12 '22 08:09 bkamins