polars
DataFrame & Series HTML repr in notebook (`_repr_html_`)
Problem description
In my opinion, the output of notebook cells should be clear and concise, so that it is easy to keep an overview and to compare the output of different cells. I can see some improvements here:
- size / height of the polars DataFrame in the HTML table
- no html repr for Series
Size / height of DataFrame in html repr
Problem:
- DataFrame takes up almost 40% more vertical space in polars compared to pandas
- all rows except first have some weird padding/margin at the top. Not even sure if this is a bug or design?
- the default number of table rows (20??) seems much too big and takes up 70-80% of the screen, making it impossible to compare with other cells. I know this can be changed with pl.Config.xxx, but 20 honestly does not feel like a sensible default in my opinion.
Possible Solution:
- remove the top padding/margin?
- open for other suggestions
- set default max rows to 10
As an example, a DataFrame of shape 10x10 with random values in polars and pandas:
import numpy as np
import polars as pl
import pandas as pd
r = np.random.rand(10, 10)
pd.DataFrame(r)
pl.DataFrame(r)
No _repr_html_ for Series
Problem:
- compared to the nice DataFrame HTML output, the Series output looks kind of ugly :/
- output is not as clean, concise and easy to grasp
Possible Solution:
- implement _repr_html_ for Series
- maybe adjust the heading to clearly differentiate a Series from a DataFrame with 1 column
- maybe also add Config option to display Series horizontally as a row
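To make the suggestions above concrete, here is a minimal sketch of what such a repr could look like. `SeriesView`, its `horizontal` flag, and `max_items` are hypothetical names invented for illustration; none of this is polars API. It only relies on the fact that Jupyter calls `_repr_html_` on an object when one is defined.

```python
from html import escape


class SeriesView:
    """Hypothetical wrapper (not part of polars) that renders a named sequence as HTML."""

    def __init__(self, name, values, horizontal=False, max_items=10):
        self.name = name
        self.values = list(values)
        self.horizontal = horizontal  # assumed option: lay the values out as one row
        self.max_items = max_items    # assumed option: truncate long output

    def _repr_html_(self):
        # Jupyter picks this method up automatically when displaying the object.
        shown = self.values[: self.max_items]
        cells = [f"<td>{escape(str(v))}</td>" for v in shown]
        if len(self.values) > self.max_items:
            cells.append("<td>...</td>")  # marker for truncated values
        # The "Series:" prefix distinguishes this from a one-column DataFrame.
        header = f"<th>Series: {escape(self.name)}</th>"
        if self.horizontal:
            rows = f"<tr>{header}{''.join(cells)}</tr>"
        else:
            rows = f"<tr>{header}</tr>" + "".join(f"<tr>{c}</tr>" for c in cells)
        return f'<table class="dataframe">{rows}</table>'


print(SeriesView("a", [1, 2, 3], horizontal=True)._repr_html_())
```

Note that no newlines are emitted inside the cells, so the output stays compact regardless of how whitespace is styled.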
What do you think?? =)
I have noticed the same issue. JupyterLab on the left, VS Code on the right:
All rows are double-height. The root cause is this CSS style:
.dataframe td {
    white-space: pre;
}
which causes the embedded newlines (\n) inside each cell to be rendered as separate lines, doubling the row height:
"<tr>\n",
"<td>\n",
"2022-12-20 11:00:00.834\n",
"</td>\n",
"<td>\n",
"16779.509766\n",
"</td>\n",
"<td>\n",
"15\n",
"</td>\n",
"</tr>\n",
Compare with the equivalent pandas output, which is much more compact:
" <tr>\n",
" <th>2022-12-20 11:00:00.834</th>\n",
" <td>16779.509766</td>\n",
" <td>15</td>\n",
" </tr>\n",
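As a rough illustration of the difference between the two outputs, the sketch below strips the whitespace that sits directly inside `<td>`/`<th>` cells, so that `white-space: pre` has no newline left to render. `compact_cells` is a hypothetical helper written for this comment, not something polars provides, and a regex is only adequate here because the markup is this simple.

```python
import re


def compact_cells(html: str) -> str:
    """Remove whitespace just inside <td>/<th> tags; whitespace between tags is kept."""
    # Drop whitespace right after an opening cell tag, e.g. "<td>\n15" -> "<td>15".
    html = re.sub(r"(<t[dh]\b[^>]*>)\s+", r"\1", html)
    # Drop whitespace right before a closing cell tag, e.g. "15\n</td>" -> "15</td>".
    return re.sub(r"\s+(</t[dh]>)", r"\1", html)


polars_like = "<tr>\n<td>\n16779.509766\n</td>\n<td>\n15\n</td>\n</tr>\n"
print(compact_cells(polars_like))
```

The newlines between tags remain, but those are harmless: they are outside the cells and so are not affected by the `td` style.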
There is a similar issue in JetBrains DataSpell, where the extra newlines are also rendered in the output: