DataFrames.jl

Use PrettyTables.jl as HTML backend

ronisbr opened this issue 2 years ago • 10 comments

Hi @bkamins and @nalimilan,

I started to look at which features I need to implement in PrettyTables.jl so that we can use it as the HTML backend in DataFrames.jl. I will use the HTML tests in DataFrames.jl as a starting point. I decided to open this issue so that we can discuss how to do this integration.

My first question is: is there anything you want to change in the way DataFrames.jl currently renders to HTML?

ronisbr, Aug 08 '21 19:08

Thank you for working on this.

I think two things can be considered to be changed:

  • the default width/height of the output; I am not sure if it is PrettyTables.jl related (probably not), but people keep complaining that 80 columns in width is not enough;
  • more consistent with text/plain printing of summary information above the printed table;

bkamins, Aug 08 '21 21:08

Hi @bkamins

  • the default width/height of the output; I am not sure if it is PrettyTables.jl related (probably not), but people keep complaining that 80 columns in width is not enough;

I think I did not understand what you meant about this. I just looked at the HTML output of DataFrames.jl for a DataFrame with 3 rows and 100 columns, and all columns were printed. There is no annotation or styling related to the width. Hence, this depends entirely on the system used to render the HTML code.

  • more consistent with text/plain printing of summary information above the printed table;

Here you mean the title, right? Like the "3 rows × 100 columns" that DataFrames.jl prints on top of the table.

In this case, PrettyTables.jl took another approach. DataFrames.jl wraps the table in a div and prints this information in a paragraph above it, whereas PrettyTables.jl uses the table caption. For example:

DataFrames.jl

<div class="data-frame"><p>1 rows × 2 columns</p><table class="data-frame"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th title="Float64">Float64</th><th title="Float64">Float64</th></tr></thead><tbody><tr><th>1</th><td>3.14159</td><td>2.71828</td></tr></tbody></table></div>

PrettyTables.jl

<table><caption style = "text-align: left;">1 rows × 2 columns</caption><thead><tr class = "header"><th>a</th><th>b</th></tr><tr class = "subheader headerLastRow"><th>Float64</th><th>Float64</th></tr></thead><tbody><tr><td>3.14159</td><td>2.71828</td></tr></tbody></table>

Any preference here?

ronisbr, Aug 08 '21 23:08

I think I did not understand what you meant about this.

Try rendering a 3 rows x 100 columns DataFrame in a Jupyter Notebook, and you will see the problem that is discussed here: https://dataframes.juliadata.org/stable/man/getting_started/#Installation in the first Note.

How would PrettyTables.jl handle this?

Any preference here?

I am not sure what is best. However I would like captions to be consistent with this style:

julia> using DataFrames

julia> df = DataFrame(a=1)
1×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> @view df[1:1, 1:1]
1×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> @view df[1, 1:1]
DataFrameRow
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> groupby(df, :a)
GroupedDataFrame with 1 group based on key: a
First Group (1 row): a = 1
 Row │ a
     │ Int64
─────┼───────
   1 │     1

Also, if rows/columns are omitted, this information should be printed somewhere (preferably at the top, since in HTML things displayed at the bottom are often not noticed by users).

bkamins, Aug 08 '21 23:08

Try rendering a 3 rows x 100 columns DataFrame in a Jupyter Notebook, and you will see the problem that is discussed here: https://dataframes.juliadata.org/stable/man/getting_started/#Installation in the first Note.

How would PrettyTables.jl handle this?

Right now, PrettyTables.jl just prints the entire table, since HTML can handle large tables using scroll bars. I know that we need to put a limit on how many rows/cols are rendered to avoid hanging on very large DataFrames.

[Screenshot 2021-08-09 at 09.50.07]

Thus, I have my first task: add an option to limit the number of rows or columns printed in the HTML backend :)

I am not sure what is best. However I would like captions to be consistent with this style:

Yes, this should be easy.

Also, if rows/columns are omitted, this information should be printed somewhere (preferably at the top, since in HTML things displayed at the bottom are often not noticed by users).

Really? I think we can do something at the bottom to keep consistency with the text printing. I will prepare an example and post it here.

ronisbr, Aug 09 '21 12:08

I know that we need to put a limit on how many rows/cols are rendered to avoid hanging on very large DataFrames.

This is OK (i.e. to have a higher limit). Also, this should probably be configurable somehow (so the issue of global state comes back again).

I think we can do something at the bottom to keep consistency with the text printing.

This is what users reported. In the REPL you tend to look at the bottom as you enter the next command. In a Jupyter Notebook you might not even see the end of the output (for large tables you would need to scroll to see it).

bkamins, Aug 09 '21 12:08

This is OK (i.e. to have a higher limit). Also, this should probably be configurable somehow (so the issue of global state comes back again).

Yes, but I think this is much easier because we just need to store two integers. The problem with the global configuration of PrettyTables.jl is that it involves a lot of types, leading to many problems with precompilation. We can handle this using an ENV variable or a const integer array.
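A minimal sketch of the "two integers in a const array" idea, under stated assumptions: the names `HTML_LIMITS`, `load_html_limits_from_env!`, `render_limits`, and the environment variables `DATAFRAMES_HTML_MAX_ROWS`/`DATAFRAMES_HTML_MAX_COLS` are all hypothetical and are not part of DataFrames.jl or PrettyTables.jl. Because the array has a concrete element type (`Vector{Int}`), reading it stays type-stable, which presumably sidesteps the type-related precompilation issues mentioned above.

```julia
# Hypothetical global limits for the HTML backend; these names are illustrative
# and are not part of the DataFrames.jl or PrettyTables.jl API.
# A `const` Vector{Int} has a concrete type, so reading it is type-stable.
const HTML_LIMITS = [1000, 100]  # [max rows rendered, max columns rendered]

# Override the defaults from (hypothetical) environment variables. In a package,
# this would be called from `__init__()`, because a `const` initializer runs at
# precompile time and would freeze whatever ENV contained then.
function load_html_limits_from_env!()
    HTML_LIMITS[1] = parse(Int, get(ENV, "DATAFRAMES_HTML_MAX_ROWS", string(HTML_LIMITS[1])))
    HTML_LIMITS[2] = parse(Int, get(ENV, "DATAFRAMES_HTML_MAX_COLS", string(HTML_LIMITS[2])))
    return HTML_LIMITS
end

# Number of rows/columns to render and number omitted, given the table size.
function render_limits(nrows::Integer, ncols::Integer)
    max_rows, max_cols = HTML_LIMITS
    shown_rows = min(nrows, max_rows)
    shown_cols = min(ncols, max_cols)
    return (shown_rows = shown_rows, shown_cols = shown_cols,
            omitted_rows = nrows - shown_rows, omitted_cols = ncols - shown_cols)
end
```

With the defaults, `render_limits(3, 150)` reports 3 rows and 100 columns shown and 50 columns omitted; a user could raise the column limit at runtime with `HTML_LIMITS[2] = 200`.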

This is what users reported. In the REPL you tend to look at the bottom as you enter the next command. In a Jupyter Notebook you might not even see the end of the output (for large tables you would need to scroll to see it).

But I think we can use two divs: one to show the table (which will be scrollable) and one for the text right below it. I think it will work fine :) Anyway, if you don't like it, it is very easy to move it to the top.
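A minimal sketch of the two-div layout, assuming the table's HTML has already been generated elsewhere (for example by PrettyTables.jl) and is passed in as a string. The function `show_html_with_summary` is hypothetical, and the 400px default is only a placeholder height. It relies on the standard CSS `overflow: auto` property, which adds horizontal and vertical scroll bars only when the content does not fit in the box.

```julia
# Hypothetical helper (not a DataFrames.jl or PrettyTables.jl function): wraps
# pre-rendered table HTML in a scrollable div and prints the summary text in a
# second div right below it. `title` is assumed to be already HTML-safe.
function show_html_with_summary(io::IO, title::AbstractString,
                                table_html::AbstractString,
                                omitted_rows::Integer, omitted_cols::Integer;
                                max_height::AbstractString = "400px")
    println(io, "<div>")
    # Object description on top, like the text/plain output (e.g. "3×100 DataFrame").
    println(io, "<p>", title, "</p>")
    # First div: bounded height; `overflow: auto` scrolls in both directions.
    println(io, "<div style=\"max-height: ", max_height, "; overflow: auto;\">")
    println(io, table_html)
    println(io, "</div>")
    # Second div: outside the scrollable area, so it stays visible.
    if omitted_rows > 0 || omitted_cols > 0
        println(io, "<div><p>", omitted_rows, " rows and ", omitted_cols,
                " columns omitted</p></div>")
    end
    println(io, "</div>")
    return nothing
end
```

For example, `sprint(show_html_with_summary, "3×100 DataFrame", "<table></table>", 0, 50)` returns the HTML as a string. Moving the note to the top would only require printing the second div before the scrollable one.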

ronisbr, Aug 09 '21 13:08

Anyway, if you don't like it, it is very easy to move it to the top.

I think it is best to have a working example and ask on Slack. The solution with two divs is smart (assuming it works :) - I am not an HTML/Jupyter Notebook expert).

bkamins, Aug 09 '21 13:08

How about this:

[Screenshot 2021-08-29 at 11.36.04]

The table is limited to a small size and is scrollable (horizontally and vertically). Below it, we add information about how many rows and columns are not displayed in this table, because we need to select a limit to avoid hanging on very large DataFrames.

EDIT: This is only an example; of course, I will add the number of rows and columns that are not displayed.

ronisbr, Aug 29 '21 14:08

I think it is OK, except that on top we should also show what kind of object is displayed (like in text/plain).

The question is how you would:

  • control vertical size of what is displayed (I assume that horizontally all available space would be used up)
  • control actual number of rows/columns rendered (available for scrolling)

Thank you for working on this.

bkamins, Aug 29 '21 20:08

I think it is OK, except that on top we should also show what kind of object is displayed (like in text/plain).

Yes, no problem!

  • control vertical size of what is displayed (I assume that horizontally all available space would be used up)
  • control actual number of rows/columns rendered (available for scrolling)

In this example, I just set a height of 400px for the entire table. I am not sure whether there is a CSS property that makes it use all the available space. I need to do some research.

ronisbr, Aug 29 '21 23:08

Closing this, as it is handled in #3096.

bkamins, Sep 13 '22 16:09