DataFrames.jl
Use PrettyTables.jl as HTML backend
Hi @bkamins and @nalimilan,
I started to see what features I need to implement in PrettyTables.jl so that we can use it as the HTML backend in DataFrames.jl. I will take a look at the HTML tests in DataFrames.jl as a starting point. I decided to open this issue so that we can talk about how we can do this integration.
My first question is: is there anything you want to change in the way DataFrames.jl renders HTML right now?
Thank you for working on this.
I think two things can be considered to be changed:
- the default width/height of the output; I am not sure if it is PrettyTables.jl related (probably not), but people keep complaining that 80 columns in width is not enough;
- more consistent with text/plain printing of summary information above the printed table;
Hi @bkamins
- the default width/height of the output; I am not sure if it is PrettyTables.jl related (probably not), but people keep complaining that 80 columns in width is not enough;
I think I did not understand what you meant about this. I just looked at the DataFrames.jl HTML output for a DataFrame with 3 rows and 100 columns, and all of them were printed. There isn't any annotation or styling related to the width. Hence, this is 100% dependent on the system used to render this HTML code.
- more consistent with text/plain printing of summary information above the printed table;
Here you are talking about the title, right? Like the "3 rows × 100 columns" that DataFrames.jl prints on top of the table.
In this case, PrettyTables.jl took another approach. DataFrames.jl adds a div with this information printed. PrettyTables.jl uses the caption of the table. For example:
DataFrames.jl
<div class="data-frame"><p>1 rows × 2 columns</p><table class="data-frame"><thead><tr><th></th><th>a</th><th>b</th></tr><tr><th></th><th title="Float64">Float64</th><th title="Float64">Float64</th></tr></thead><tbody><tr><th>1</th><td>3.14159</td><td>2.71828</td></tr></tbody></table></div>
PrettyTables.jl
<table><caption style = "text-align: left;">1 rows × 2 columns</caption><thead><tr class = "header"><th>a</th><th>b</th></tr><tr class = "subheader headerLastRow"><th>Float64</th><th>Float64</th></tr></thead><tbody><tr><td>3.14159</td><td>2.71828</td></tr></tbody></table>
Any preference here?
I think I did not understand what you meant about this.
Try rendering a 3 rows x 100 columns DataFrame in a Jupyter Notebook, and you will see the problem that is discussed here: https://dataframes.juliadata.org/stable/man/getting_started/#Installation in the first Note.
How would PrettyTables.jl handle this?
Any preference here?
I am not sure what is best. However I would like captions to be consistent with this style:
julia> using DataFrames

julia> df = DataFrame(a=1)
1×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> @view df[1:1, 1:1]
1×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> @view df[1, 1:1]
DataFrameRow
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> groupby(df, :a)
GroupedDataFrame with 1 group based on key: a
First Group (1 row): a = 1
 Row │ a
     │ Int64
─────┼───────
   1 │     1
Also, if rows/columns are omitted, that information should be printed somewhere (preferably at the top, since in HTML things displayed at the bottom are often not noticed by users).
Try rendering a 3 rows x 100 columns DataFrame in a Jupyter Notebook, and you will see the problem that is discussed here: https://dataframes.juliadata.org/stable/man/getting_started/#Installation in the first Note.
How would PrettyTables.jl handle this?
Right now, PrettyTables.jl just prints the entire table, since HTML can handle large tables by scrolling. I know that we need to put a limit on how many rows/cols are rendered to avoid hanging on very large DataFrames.
Thus, I have my first task: add an option to limit the number of rows or columns printed in HTML backend :)
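The limit described above could be sketched roughly as follows. This is only an illustration in plain Julia, not the PrettyTables.jl implementation: the function name `cropped_html` and the keywords `max_rows` and `max_cols` are hypothetical, and the default values are arbitrary.

```julia
# Hypothetical helper, not part of PrettyTables.jl or DataFrames.jl.
# Renders at most `max_rows` x `max_cols` cells of `data` as an HTML table and
# returns the HTML together with the number of omitted rows and columns.
# Cell contents are printed as-is; a real backend would also escape HTML.
function cropped_html(data::AbstractMatrix, header::AbstractVector;
                      max_rows::Int = 25, max_cols::Int = 100)
    nrows, ncols = size(data)
    shown_rows = min(nrows, max_rows)
    shown_cols = min(ncols, max_cols)

    io = IOBuffer()
    print(io, "<table><thead><tr>")
    for j in 1:shown_cols
        print(io, "<th>", header[j], "</th>")
    end
    print(io, "</tr></thead><tbody>")
    for i in 1:shown_rows
        print(io, "<tr>")
        for j in 1:shown_cols
            print(io, "<td>", data[i, j], "</td>")
        end
        print(io, "</tr>")
    end
    print(io, "</tbody></table>")

    return String(take!(io)), nrows - shown_rows, ncols - shown_cols
end

# A 3 x 100 table cropped to 2 rows and 5 columns.
data = reshape(1:300, 3, 100)
header = ["x$j" for j in 1:100]
html, omitted_rows, omitted_cols =
    cropped_html(data, header; max_rows = 2, max_cols = 5)
```

Returning the omitted counts alongside the HTML leaves the caller free to print them above or below the table.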
I am not sure what is best. However I would like captions to be consistent with this style:
Yes, this should be easy.
Also, if rows/columns are omitted, that information should be printed somewhere (preferably at the top, since in HTML things displayed at the bottom are often not noticed by users).
Really? I think we can do something at the bottom to keep consistency with the text printing. I will put together an example and post it here.
I know that we need to put a limit on how many rows/cols are rendered to avoid hanging on very large DataFrames.
This is OK (i.e. to have a higher limit). Also this could be probably configurable somehow (so the issue of some global state comes back again)
I think we can do something at the bottom to keep consistency with the text printing.
This is what users reported. In the REPL you tend to look at the bottom as you enter the next command. In a Jupyter Notebook you might not even have the end of the output displayed (as for large tables you would need to scroll to see it).
This is OK (i.e. to have a higher limit). Also this could be probably configurable somehow (so the issue of some global state comes back again)
Yes, but I think this is way easier because we just need to store two integers. The problem with the global configuration of pretty tables is that we have a lot of types, leading to many problems in precompilation. We can handle this using an ENV variable or a const integer array.
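A minimal sketch of the two options mentioned above, storing the two integers either in a `const` array or in environment variables. All names here (`HTML_RENDER_LIMITS`, `set_html_limits!`, `html_limits`, `TABLE_HTML_MAX_ROWS`, `TABLE_HTML_MAX_COLS`) and the default values are assumptions made for illustration; none of them exist in PrettyTables.jl or DataFrames.jl.

```julia
# Hypothetical names throughout; this only illustrates the idea.

# Option 1: a `const` integer array. The binding is constant, so its type is
# known to the compiler, but the two stored values can still be changed.
const HTML_RENDER_LIMITS = [25, 100]   # [max rows, max columns]

function set_html_limits!(rows::Integer, cols::Integer)
    HTML_RENDER_LIMITS[1] = rows
    HTML_RENDER_LIMITS[2] = cols
    return HTML_RENDER_LIMITS
end

# Option 2: environment variables. When a variable is unset or is not an
# integer, the value stored in the array is used instead.
function html_limits(env = ENV)
    rows = tryparse(Int, get(env, "TABLE_HTML_MAX_ROWS", ""))
    cols = tryparse(Int, get(env, "TABLE_HTML_MAX_COLS", ""))
    return (something(rows, HTML_RENDER_LIMITS[1]),
            something(cols, HTML_RENDER_LIMITS[2]))
end
```

Passing the environment as an argument, as in `html_limits(Dict("TABLE_HTML_MAX_ROWS" => "10"))`, makes the lookup easy to try without touching the real `ENV`.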
This is what users reported. In the REPL you tend to look at the bottom as you enter the next command. In a Jupyter Notebook you might not even have the end of the output displayed (as for large tables you would need to scroll to see it).
But I think we can use two divs to show the table (which will be scrollable) and the text right below it. I think it will work fine :) Anyway, if you don't like it, it is very easy to move it to the top.
Anyway, if you don't like it, it is very easy to move it to the top.
I think it is best to have a working example and ask on Slack. The solution with two divs is smart (assuming it works :) - I am not an HTML/Jupyter Notebook expert).
How about this:
The table is limited to a small size and is scrollable (horizontally and vertically). Then we add information about how many rows and columns are not displayed within this table, because we need to select a limit to avoid hanging on very large DataFrames.
EDIT: This is only an example, of course I will add the number of rows and columns that are not displayed.
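For readers without the screenshot, the layout described above (a scrollable table followed by a note on what was left out) might look roughly like the sketch below. The helper name `scrollable_table_html`, the exact markup, the inline styles, and the default height are assumptions for illustration and not the markup PrettyTables.jl produces.

```julia
# Hypothetical helper illustrating the two-div layout only.
function scrollable_table_html(table_html::AbstractString,
                               omitted_rows::Integer, omitted_cols::Integer;
                               max_height::AbstractString = "400px")
    io = IOBuffer()
    # First div: the table itself, scrollable horizontally and vertically.
    print(io, "<div style=\"overflow: auto; max-height: ", max_height, ";\">",
          table_html, "</div>")
    # Second div: the note, placed outside the scrollable area so that it is
    # visible without scrolling through the table.
    if omitted_rows > 0 || omitted_cols > 0
        print(io, "<div><p>", omitted_rows, " rows and ", omitted_cols,
              " columns omitted</p></div>")
    end
    return String(take!(io))
end

html = scrollable_table_html("<table><tr><td>1</td></tr></table>", 97, 0)
```

Moving the note to the top would only require printing the second div before the first.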
I think it is OK except that on top we should also display what is the object that is displayed (like in text/plain).
The question is how would you:
- control vertical size of what is displayed (I assume that horizontally all available space would be used up)
- control actual number of rows/columns rendered (available for scrolling)
Thank you for working on this.
I think it is OK except that on top we should also display what is the object that is displayed (like in text/plain).
Yes, no problem!
- control vertical size of what is displayed (I assume that horizontally all available space would be used up)
- control actual number of rows/columns rendered (available for scrolling)
In this example I just selected 400px for the entire table. I am not sure if we can have a CSS parameter to use the entire available space. I need to do some research.
Closing this as this is handled in #3096