
DataFrame & Series HTML repr in notebook (`_repr_html_`)

Open Julian-J-S opened this issue 2 years ago • 2 comments

Problem description

In my opinion, the output of notebook cells should be clear and concise, so that you keep a good overview and can compare the output of different cells. I see some room for improvement here:

  • size / height of a polars DataFrame in the HTML table
  • no HTML repr for Series

Size / height of DataFrame in HTML repr

Problem:

  • a DataFrame takes up almost 40% more vertical space in polars than in pandas
  • every row except the first has some odd padding/margin at the top. I'm not sure whether this is a bug or by design.
  • the default number of table rows (20??) seems much too big: it takes up 70-80% of the screen and makes it impossible to compare with other cells. I know this can be changed with pl.Config.xxx, but 20 honestly does not feel like a sensible default to me.

Possible Solution:

  • remove the top padding/margin?
  • open for other suggestions
  • set default max rows to 10

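As an example, here is a DataFrame of shape 10x10 with random values in polars and pandas:

import numpy as np
import polars as pl
import pandas as pd

r = np.random.rand(10, 10)  # 10x10 array of random floats
pd.DataFrame(r)  # run in its own notebook cell to see the pandas table
pl.DataFrame(r)  # run in a separate cell to see the polars table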

polars_pandas_repr_height

No _repr_html_ for Series

Problem:

  • compared to the nice DataFrame HTML output, the Series output looks kind of ugly :/
  • the output is not as clean, concise and easy to grasp

Possible Solution:

  • implement _repr_html_ for Series
  • maybe adjust the heading to clearly differentiate a Series from a DataFrame with 1 column
  • maybe also add a Config option to display a Series horizontally as a row
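To make the first two points concrete, here is a rough sketch of what such a hook could look like. `SeriesHtmlSketch` is a hypothetical stand-in class, not the polars `Series`, and the caption text and table layout are only assumptions about one possible design. The only fixed part is the method name: Jupyter looks for `_repr_html_` on the object it displays.

```python
from html import escape


class SeriesHtmlSketch:
    """Hypothetical stand-in for a Series, used only to show the hook."""

    def __init__(self, name, dtype, values):
        self.name = name
        self.dtype = dtype
        self.values = list(values)

    def _repr_html_(self) -> str:
        # Jupyter calls this method, if present, to get rich HTML output.
        caption = f"<small>Series: shape ({len(self.values)},)</small>"
        head = (
            f"<tr><th>{escape(self.name)}</th></tr>"
            f"<tr><td>{escape(self.dtype)}</td></tr>"
        )
        body = "".join(
            f"<tr><td>{escape(str(v))}</td></tr>" for v in self.values
        )
        # No newlines are emitted inside the markup, so cells stay one line high.
        return (
            f'{caption}<table class="dataframe">'
            f"<thead>{head}</thead><tbody>{body}</tbody></table>"
        )


s = SeriesHtmlSketch("a", "i64", [1, 2, 3])
series_html = s._repr_html_()
print(series_html)
```

The caption is where a Series could be told apart from a one-column DataFrame.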

polars_pandas_repr_series_html

What do you think?? =)

Julian-J-S avatar Nov 27 '22 14:11 Julian-J-S

I have noticed the same issue. JupyterLab on the left, VS Code on the right:

polars_html

All rows are double-height. The root cause is this CSS style:

.dataframe td {
    white-space: pre;
}

which causes the embedded newlines (\n) in the output to be rendered as extra lines inside each cell:

       "<tr>\n",
       "<td>\n",
       "2022-12-20 11:00:00.834\n",
       "</td>\n",
       "<td>\n",
       "16779.509766\n",
       "</td>\n",
       "<td>\n",
       "15\n",
       "</td>\n",
       "</tr>\n",

Compare with the equivalent pandas output, which is much more compact:

       "    <tr>\n",
       "      <th>2022-12-20 11:00:00.834</th>\n",
       "      <td>16779.509766</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",

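The difference can be checked without a browser. The sketch below uses only the standard library to count the newline characters that end up inside each cell's text, since those are what `white-space: pre` preserves. The class and function names are made up for illustration, and the two rows are the snippets quoted above.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the raw text of every <td>/<th> cell."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data


def newlines_per_cell(markup: str) -> list:
    """Number of newline characters inside each cell's text."""
    parser = CellText()
    parser.feed(markup)
    parser.close()
    return [text.count("\n") for text in parser.cells]


polars_row = (
    "<tr>\n<td>\n2022-12-20 11:00:00.834\n</td>\n"
    "<td>\n16779.509766\n</td>\n<td>\n15\n</td>\n</tr>\n"
)
pandas_row = (
    "    <tr>\n      <th>2022-12-20 11:00:00.834</th>\n"
    "      <td>16779.509766</td>\n      <td>15</td>\n    </tr>\n"
)

polars_counts = newlines_per_cell(polars_row)
pandas_counts = newlines_per_cell(pandas_row)
print(polars_counts)  # [2, 2, 2]
print(pandas_counts)  # [0, 0, 0]
```

In the pandas markup the newlines sit between tags rather than inside cells, so they never reach the cell text.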
2-5 avatar Jan 03 '23 21:01 2-5

Similar issue in JetBrains DataSpell: the extra newlines are rendered in the output:

image

jamesbeilby avatar Jan 05 '23 20:01 jamesbeilby