polars
Remove or suppress surrounding quotes from text in <td> elements of _repr_html_() output
Problem description
I first posted this issue in StackOverflow as suggested. It reads...
I have a Python Flask app that recently switched from using Pandas to Polars for some dataframe handling. The pertinent code is shown here:
import polars as pl

# `results` is computed elsewhere in the Flask app
data = {
    'Text': ['Virginia Woolf, Mrs. Dalloway', 'College website corpus', 'Presidential inaugural speeches', 'Federalist Papers', 'British Novels', 'YOUR TEXT'],
    'To Be Frequency': [28.3, 16.7, 31.8, 39.8, 31.4, results[1]],
}
df = pl.from_dict(data)
# textresult = (df.sort_values(by=['To Be Frequency'], ascending=False)).style # old Pandas code
# See https://pola-rs.github.io/polars/py-polars/html/reference/config.html for complete list of Polars.Config settings
pl.Config.set_tbl_hide_column_data_types(True)
pl.Config.set_tbl_hide_dataframe_shape(True)
pl.Config.set_fmt_str_lengths(40)
pl.Config.set_tbl_width_chars(200)
textresult = df.sort('To Be Frequency')._repr_html_()  # convert the result to HTML because a simple string won't do
The _repr_html_() function appears to surround my dataframe strings with double quotes, rendered as escaped &quot; characters, so that I get HTML output in my Python Flask templates like the attached screen capture.
I would prefer to see the same but without the quotes surrounding my "Text" names.
I've applied a quick fix to my code by adding htmlresult = textresult.replace("&quot;", "") before htmlresult is rendered by my template. However, it would be nice to be more selective, with an option to suppress the creation of additional quotes in the _repr_html_() function.
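In the meantime, a slightly more selective version of that workaround could strip only the quotes that wrap an entire cell, leaving any quotes inside the text alone. This is a rough sketch in plain Python, not anything Polars provides: the helper name strip_cell_quotes is made up, and it assumes each string cell is emitted as <td>&quot;...&quot;</td> with the cell content HTML-escaped (so it contains no literal < characters).

```python
import re

# Assumption: a quoted string cell looks like <td>&quot;some text&quot;</td>,
# and the cell content is HTML-escaped, so it never contains a literal "<".
_QUOTED_CELL = re.compile(r"(<td[^>]*>)&quot;([^<]*)&quot;(</td>)")


def strip_cell_quotes(html: str) -> str:
    """Drop the &quot; pair that wraps a whole <td> cell; keep inner quotes."""
    return _QUOTED_CELL.sub(r"\1\2\3", html)
```

It would be applied the same way as the quick fix, e.g. htmlresult = strip_cell_quotes(textresult). Unlike a blanket replace, quotes that are part of the data survive.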
I've already forked and cloned the Polars codebase, where I added the following test to my local py-polars/tests/unit/test_df.py:
def test_to_html_quoted(df: pl.DataFrame) -> None:
    # check it does not panic/error, and returns a table with NO quotes around the <td> text
    data = {'Text with NO "quotes" please': [' Quotes? '],
            'Just a Number': [420]}
    df = pl.from_dict(data)
    html = df._repr_html_()
    # quotes appear HTML-escaped in the output, so look for the escaped form
    assert '&quot; Quotes? ' not in html
I also tested several simple modifications to py-polars/polars/_html.py around lines 101 through 172, but I don't know HTMLFormatter or Polars well enough yet to come up with an elegant solution that works.
This is caused by HTMLFormatter.write_body, which calls PySeries::get_fmt.
Coming from Pandas, I also find the quotes in Jupyter very noisy (and inconsistent with the plain text display). It'd be nice to have a Config option to omit this.
Related: #10646, #10648
Hi! I would like to contribute to this one. The quotes personally bug me a bit and after reviewing the references by @gusutabopb I think I have enough understanding to resolve it.
My plan is basically:
- add a boolean configuration called POLARS_FMT_STR_QUOTES around the same areas where the POLARS_FMT_STR_LEN configuration is, defaulting to false
- edit HTMLFormatter.write_body to read in the config and pass it on to PySeries::get_fmt
- edit PySeries::get_fmt
Did I miss anything?
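To make the read side of that plan concrete, here is a rough sketch in plain Python. It is not Polars code: POLARS_FMT_STR_QUOTES is only the name proposed above, format_cell is a made-up stand-in for whatever write_body and get_fmt end up doing, and the "0"/"1" encoding is an assumption.

```python
import html
import os

# Proposed (not yet existing) setting name from the plan above.
ENV_VAR = "POLARS_FMT_STR_QUOTES"


def quotes_enabled() -> bool:
    # Assumption: unset or "0" means no quotes, matching the proposed default.
    return os.environ.get(ENV_VAR, "0") == "1"


def format_cell(value: object) -> str:
    """Hypothetical cell formatter: quote strings only when the flag is on."""
    if isinstance(value, str):
        text = f'"{value}"' if quotes_enabled() else value
    else:
        text = str(value)
    # Escape for HTML; html.escape turns '"' into '&quot;' by default.
    return html.escape(text)
```

With the variable unset, format_cell("Federalist Papers") returns the bare text; with it set to "1", the text comes back wrapped in &quot;, which is what the current output looks like.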
I know that POLARS_VERBOSE goes by "0" or "1" not "true" or "false" so that's a precedent to follow.
Yep, that was exactly where I was working now!
I'm basing it off of set_tbl_column_data_type_inline in the config.py file. The Python API accepts a bool and eventually turns it into a 0/1 as it sets the environment variable.
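For anyone following along, the bool-to-0/1 pattern described here looks roughly like this. This is a standalone sketch, not the actual config.py code: set_fmt_str_quotes and POLARS_FMT_STR_QUOTES are the proposed names, and treating None as "unset the variable" is my assumption.

```python
import os
from typing import Optional

# Proposed (not yet existing) environment variable name.
ENV_VAR = "POLARS_FMT_STR_QUOTES"


def set_fmt_str_quotes(active: Optional[bool] = True) -> None:
    """Hypothetical setter: store the flag as "1"/"0", or unset it on None."""
    if active is None:
        # Assumption: None clears the setting so the default applies again.
        os.environ.pop(ENV_VAR, None)
    else:
        # int(True) == 1 and int(False) == 0, so this yields "1" or "0".
        os.environ[ENV_VAR] = str(int(active))
```

The formatting side then only has to compare the variable against "1", the same way POLARS_VERBOSE is handled.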